PetScan

From Meta, a Wikimedia project coordination wiki
2022 Coolest Tool Award winner in the category Reusable

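PetScan is a powerful querying tool. A query is prepared in the PetScan submission form. See also the rationale behind this tool.

Introduction

PetScan can generate lists of Wikipedia (and related project) pages or Wikidata items that match certain criteria, such as all pages in a certain category or all items with a certain property. PetScan can also combine several temporary lists (called "sources" here) in various ways to create a new list. Sources include:

Pages on Wikipedia/Wikimedia projects

These are defined in the Categories, Page properties, and Templates and links tabs. You can ask for pages in a category tree, pages with specific templates, or links from/to specific pages; you can restrict results to certain namespaces, bot/human edits, recent edits/page creations, and so on. These three tabs represent the former CatScan2 functionality. Their query result is later referred to as the "categories source".

Other sources

In this tab, you can add further sources, such as Wikidata SPARQL (WDQS) queries or PagePile lists. You can also define how multiple sources are combined; by default, the final result is the subset (that is, only pages that occur in all sources). You can also specify which wiki the list should point to, for example when you combine Wikipedia and Wikidata.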
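
As a rough mental model (not PetScan's actual implementation), each source can be treated as a set of page titles, with the default "subset" combination acting as a set intersection. The source names and page titles below are invented for illustration.

```python
from functools import reduce

# Hypothetical results from three sources; titles are invented for illustration.
sources = {
    "categories": {"Berlin", "Hamburg", "Munich"},
    "sparql": {"Berlin", "Munich", "Vienna"},
    "pagepile": {"Munich", "Berlin", "Zurich"},
}

def subset(source_sets):
    """Pages that occur in every source (the default combination)."""
    return reduce(set.intersection, source_sets)

def union(source_sets):
    """Pages that occur in at least one source."""
    return reduce(set.union, source_sets)

in_all = subset(list(sources.values()))
in_any = union(list(sources.values()))
print(sorted(in_all))
print(sorted(in_any))
```

With these sample sets, only the titles present in all three sources survive the default combination.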

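Wikidata

In this tab, you can further annotate or "filter" your results, for example to return only Wikidata items that have no statements. Using any of these filters converts your list to Wikidata.

Output

Here, you can specify options for your list, such as the format (web page, wiki, PagePile, etc.). You can also filter your results further, for example with a regular expression on page titles/item labels. You can also replace your result list with a ranked list of missing topics ("red links").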
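
The regular-expression filter can be pictured with a small local sketch. This only illustrates the idea of filtering titles by pattern; the regex dialect PetScan uses, and whether it anchors the match, is not stated here, and the titles are made up.

```python
import re

# Hypothetical result titles; not taken from a real PetScan run.
titles = [
    "List of rivers of Canada",
    "Ottawa River",
    "List of lakes of Canada",
    "Lake Louise",
]

def filter_titles(titles, pattern):
    """Keep only titles in which the regular expression matches somewhere."""
    compiled = re.compile(pattern)
    return [t for t in titles if compiled.search(t)]

lists_only = filter_titles(titles, r"^List of ")
print(lists_only)
```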

Defining your own request

The fields that can be set in the query form are as follows:

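Language
  Meaning: Project language code, e.g. "en" for English or "de" for German. Select "commons" for Wikimedia Commons.
  Default: "en"

Project
  Meaning: The Wikimedia project to search (e.g. Wikipedia, Wiktionary, Wikiversity).
  Default: "wikipedia"

Depth
  Meaning: Depth of the category trees to search. 0 means subcategories are not used.
  Default: "0"

Categories
  Meaning: List of categories, one per line, without the "Category:" prefix.
  Note: Appending '|' and a number sets the depth for that category tree, overriding the value chosen in the Depth field.

Negative categories
  Meaning: A list of categories, as above. Only articles that are not in these categories are accepted.

Combination
  Meaning: How to use the categories above:
    • Category list: list subcategories
    • Subset: all pages that are in all category trees
    • Union: all pages that are in at least one category tree
    • Difference: all pages that are in exactly one category tree
    • At least (N): all pages that are in at least N category trees
  Currently the available options are "Subset" and "Union".
  Default: "subset"

Namespaces
  Meaning: Namespaces to use for potential pages.
  Default: Article

Redirects
  Default: Either

Templates
  Meaning: Use only pages that
    • Box 1: contain all of the given templates
    • Box 2: contain one of the given templates
    • Box 3: contain none of the given templates
  Enter one template per line, without the "Template:" prefix. Each box may be qualified by selecting "Use talk page instead".
  Note: This option appears to work only with templates defined in the Template namespace. It cannot be used with templates defined in the User namespace, or with the "Creator:" or "Institution:" namespaces on Wikimedia Commons.

Linked from

Last edit
  Meaning: Show pages whose last edit was made by a bot, was made by an anonymous user, or was flagged.
  Default: Either, either, either

Last change
  Meaning: Date or period of the page's last change, in the format YYYYMMDDHHMMSS (shorter forms are allowed).
  Note: "Only pages created during the above time window" lets you search by the first change instead.

Size
  Meaning: File size or size range (bytes).
  Note: Allows selecting articles larger than one cutoff and/or smaller than another.

Links
  Meaning: Number or range of internal links on a page.
  Note: Allows selecting articles that contain many or few links.

Redlinks

Top categories
  Meaning: Feature which is not yet available.

Sort
  Meaning: Feature which is not yet available; it would set sorting criteria for the output.

Manual list
  Meaning: Allows providing a list of (namespace-prefixed) page names or Wikidata items from a specified project.
  The tricky part is specifying the project; the correct codes are:
    • English Wikipedia: enwiki
    • German Wikisource: dewikisource or dewikisourcewiki
    • Greek Wiktionary: elwiktionarywiki
    • English Wikinews: enwikinews
    • Wikidata: wikidatawiki

Wikidata
  Meaning: Get Wikidata, if available.

Format
  Meaning: Output format of the search results:
    • HTML: web page
    • CSV: values in quotation marks, separated by commas
    • TSV: tab-separated values
    • WIKI: as a wikitable
    • PHP: as a PHP file
    • XML: as an XML file

Do it!
  Meaning: Hit this to run the submission you have defined.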
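
Two of the field formats described above lend themselves to a small sketch: the per-category depth suffix ('|' plus a number) in the Categories field, and the YYYYMMDDHHMMSS timestamp in the Last change field. The helper names and the category names are hypothetical; the code only prepares text that could be pasted into the form.

```python
from datetime import datetime

def categories_field(categories):
    """Build the multi-line Categories value.

    `categories` is a list of (name, depth) pairs. A depth of None leaves
    the category to the form's Depth field; a number appends '|depth'.
    """
    lines = []
    for name, depth in categories:
        lines.append(name if depth is None else f"{name}|{depth}")
    return "\n".join(lines)

def last_change_value(moment):
    """Format a datetime as YYYYMMDDHHMMSS for the Last change field."""
    return moment.strftime("%Y%m%d%H%M%S")

# Hypothetical category names, one with a depth override of 3.
cats = categories_field([("Rivers of Canada", None), ("Lakes of Canada", 3)])
stamp = last_change_value(datetime(2016, 3, 15, 12, 30, 0))
print(cats)
print(stamp)
```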

Know-how

PetScan ID (PSID)

As of 2016-04-04, every query that gets run in PetScan is recorded (anonymously!) and assigned a unique, stable, numeric identifier called PSID. You can use the PSID to

  • run this PetScan query as an input in tools that support PSID (such as WD-FIST)
  • fill in a "short URL": https://petscan.wmflabs.org/?psid=PSID will run the query with PSID, with all its settings
  • expand programmatically on a previous query, by "overwriting" parameters: https://petscan.wmflabs.org/?format=wiki&psid=PSID will run the same query as before, but the output format will be wiki (instead of default HTML, or whatever was chosen originally).
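
The two URL patterns above can be generated with a few lines of Python. This is a minimal sketch: the function name is made up, 12345 is a placeholder rather than a real PSID, and only the `psid` and `format` parameters shown above are used.

```python
from urllib.parse import urlencode

PETSCAN_URL = "https://petscan.wmflabs.org/"

def psid_url(psid, **overrides):
    """Return a URL that reruns a stored query, optionally overriding parameters."""
    params = dict(overrides)
    params["psid"] = psid
    return PETSCAN_URL + "?" + urlencode(params)

# 12345 is a placeholder PSID, not a real stored query.
short_url = psid_url(12345)
wiki_url = psid_url(12345, format="wiki")
print(short_url)
print(wiki_url)
```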

Notes:

  • Only the query will be stored, not its results!
  • Large queries (e.g. with many manual items) will not be stored. In that case, no PSID will be shown.
  • Results with an empty checkbox have possible matches within the Wikidata set.
  • The interwiki link petscan: can be used to generate shortcuts for permanent queries, e.g. [[petscan:PSID]]
  • Recorded queries are not deduplicated, so a new PSID is generated each time unless an existing PSID is called without modification.

Create Wikidata items for Wikipedia articles that don't have one yet (Creator functionality)

  • Set up a query that returns a list of Wikipedia (or other, non-Wikidata project) pages, or paste a list into "Other sources/Manual list"
  • Under the "Page properties" tab, select "Redirects=No". (This is now done automatically; you can change it back if you really want redirects in your list.)
  • Under the "Wikidata" tab, select "Only pages without item" for the "Wikidata" option
  • Run query
  • Your results will have additional elements next to the "results" header (unless you are not logged into WiDaR, in which case you will see an appropriate link instead)
  • All pages for which there is no exact match in any label or alias on Wikidata are checked by default.
  • You can check/uncheck boxes manually now, if required.
  • You can add default statements in the statements box; they will be added to all your new items. So, if you only create items for people, add P31:Q5. You can add multiple statements this way (one per line). Note that the P and Q must be upper case; otherwise the statement fails quietly.
  • You can add default descriptions to new items, such as Dde:"some description" for a German description.
  • Click the green "Start QS" button. This will open a new page.
  • You can click "Run" to run a batch in your browser, or "Run in background" to run them from a Wikimedia server. See Help:QuickStatements for more information.
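
Because lower-case P/Q entries fail quietly, it can help to check the statements box text before pasting it. The sketch below is a local pre-check only, not part of PetScan; it covers just the property:item form shown above (such as P31:Q5) and would wrongly flag any other value types, which this manual does not describe.

```python
import re

# Matches the property:item form shown above, e.g. "P31:Q5" (upper case only).
STATEMENT_RE = re.compile(r"P\d+:Q\d+")

def check_statements(text):
    """Split the statements box text into (valid, invalid) lists of lines."""
    valid, invalid = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if STATEMENT_RE.fullmatch(line):
            valid.append(line)
        else:
            invalid.append(line)
    return valid, invalid

valid, invalid = check_statements("P31:Q5\np21:q6581097\nP106:Q2519376")
print("ok:", valid)
print("check these:", invalid)
```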


Add/remove statements for Wikidata items

It is possible to add or remove statements for Wikidata items with PetScan. For this it is crucial that you choose "Wikidata" in "Other sources -> Use Wiki". Then you will see the command box next to the number and can continue as described in the previous section.

Referrer

(V2 only) If you open PetScan from another tool to let the user create a query, you can pass the referrer_url and referrer_name (defaults to referrer_url) parameters. referrer_url should have a {PSID} string which will be replaced with the PSID the user sees. Once a query was run, a box at the top of the page will prompt the user to return to the original tool, using the PSID-modified referrer_url.
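
A calling tool might build its PetScan link roughly as follows. This is a sketch under assumptions: example.org and the tool name are hypothetical, and `return_url` merely imitates the {PSID} substitution described above rather than reproducing PetScan's own code.

```python
from urllib.parse import urlencode

PETSCAN_URL = "https://petscan.wmflabs.org/"

def petscan_link(referrer_url, referrer_name=None):
    """Build a link that opens PetScan and tells it where to send the user back."""
    if "{PSID}" not in referrer_url:
        raise ValueError("referrer_url should contain the {PSID} placeholder")
    params = {"referrer_url": referrer_url}
    if referrer_name is not None:
        params["referrer_name"] = referrer_name
    return PETSCAN_URL + "?" + urlencode(params)

def return_url(referrer_url, psid):
    """Imitate the substitution described above: {PSID} becomes the actual PSID."""
    return referrer_url.replace("{PSID}", str(psid))

# example.org and "My tool" are hypothetical.
template = "https://example.org/mytool?psid={PSID}"
link = petscan_link(template, "My tool")
back = return_url(template, 12345)
print(link)
print(back)
```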

Examples

Articles in a WikiProject

A request on the Talk page of this Manual: Find all mainspace articles within "WikiProject UK geography". Starting with a default PetScan submission form, just add "WikiProject UK geography" to the first box of the Templates row, and, just below, select "Use talk pages instead". Here is the query filled out. Hit "Do it!" at bottom. When run on 16 August 2015, the query required 1.5 seconds to run, and yielded a list of 21,408 articles. The list appears BELOW the submission form (which remains on your screen), so you have to scroll down to see the results.

Dablinks within a WikiProject

Editors working on disambiguation seek to enlist members of a content area WikiProject, specifically WikiProject Canada, to help. A PetScan report is designed to find all articles having ambiguous links that are within the given WikiProject. Criteria applied:

  1. Articles having ambiguous links are within "Category:All articles with links needing disambiguation", so paste "All articles with links needing disambiguation" into the PetScan Categories field.
  2. Depth is set arbitrarily to 9, meaning that articles as far as 9 subcategories down from the "needing disambiguation" parent category will be found. (Searching to that depth is not necessary in this case but doesn't hurt.)
  3. Articles within WikiProject Canada have "Template:WikiProject Canada" on their talk pages, so paste "WikiProject Canada" into PetScan's "Has any of these templates" field, and just below select "Use talk pages instead" as a qualifier.
  4. Only regular articles, not disambiguation pages, are wanted, and disambiguation pages are distinguished by having template:disambiguation, so paste "Disambiguation" into PetScan's "Has none of these templates" field, and make sure "Use talk pages instead" is not selected.
  • These criteria are implemented by this PetScan submission form, filled out. To submit the query, select "Do it!" at the bottom.
  • When submitted on 16 August 2015, the query took 31 seconds to run, and results were a list of 255 articles. The results show BELOW the PetScan submission form, which remains in place, so you may see no change on your screen. You have to know to scroll down to find the results! That request was run with default Output format "HTML".
  • To obtain the results in a Wikitable, in order to share them at a subpage of the WikiProject, the request could be revised to select Format "WIKI". This time the results, in wikitable markup, replace the PetScan submission form on your screen.
  • To make a more useful list for disambiguators, set up so that DabSolver opens on any item clicked, a multi-step process can be followed. Here the results were saved in tab-separated format instead and imported into Excel, where a new column was composed by concatenating simple text strings with the results. That column was then copied and pasted to the English Wikipedia page w:Wikipedia:Canadian Wikipedians' notice board/ArticlesNeedingDisambiguation2015-08-17 and was also posted within a scrolling window in a discussion at the WikiProject Canada talk page. --Doncram (talk) 19:50, 24 August 2015 (UTC) Link adjusted. DexDor (talk) 06:58, 29 March 2016 (UTC)
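
The spreadsheet step (concatenating fixed text with each result) can also be scripted. The sketch below turns a tab-separated export into a plain wikitext list; the column name "title" and the sample rows are assumptions, and it does not build DabSolver links because their URL pattern is not given here.

```python
import csv
import io

def tsv_to_wikilist(tsv_text, title_column="title"):
    """Turn a tab-separated export into one wikitext bullet per page title."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = []
    for row in reader:
        # Show underscores as spaces, as in displayed page titles.
        title = row[title_column].replace("_", " ")
        lines.append(f"* [[{title}]]")
    return "\n".join(lines)

# Made-up sample; the column name "title" is an assumption about the export.
sample = "number\ttitle\n1\tOttawa_River\n2\tLake_Louise\n"
wikitext = tsv_to_wikilist(sample)
print(wikitext)
```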

Detecting pages that have an anomalous combination of namespace and category/ies

PetScan can be used to find pages that are in a category (or combination of categories) that is not appropriate for pages in a particular namespace - e.g. Wikipedia administration pages that are in a category that should only contain encyclopedic articles. This can then be fixed (e.g. by moving an article to the correct namespace or by editing a discussion to insert a missing ":" where a category is being referred to). The first step in this process is to identify (using PetScan) categories that cause incorrect categorization (e.g. Wikipedia administration categories that are in article categories).

Find uncategorized photo contributions in Commons in a given language

(Based on Grants:Learning patterns/Treasures or landmines: detecting uncategorized, language-specific uploads in Commons. See the motivation and full explanation there. Thanks to User:Spiritia and the other contributors and commenters there for contributing this.)

Run a query using PetScan with the following settings:

Language = commons
Project = wikimedia
Depth = 1
Categories = Media needing categories
Combination = ☑ Subset
Namespaces = ☑ File
Templates : Has all of these templates = <your language code> 
Format:  ☑ Extended data for files     ☑ File usage data

The English language code is "en"; the Romanian language code is "ro". To find uncategorized photos uploaded by users using Romanian language, a version of the query (with html output, and without autorun) is:

https://petscan.wmflabs.org/?language=commons&project=wikimedia&depth=1&categories=Media+needing+categories&ns%5B6%5D=1&templates_yes=ro&ext_image_data=1&file_usage_data=1

As of 15 March 2016, after hitting "run" the query requires about 105 seconds to finish, and yields 1748 uncategorized photos.
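
The same URL can be assembled programmatically for any language code. This is a minimal sketch using only the parameters visible in the URL above; the function name is made up.

```python
from urllib.parse import urlencode

def commons_uncategorized_url(language_code):
    """Build the query URL for uncategorized Commons files tagged with a language template."""
    params = {
        "language": "commons",
        "project": "wikimedia",
        "depth": 1,
        "categories": "Media needing categories",
        "ns[6]": 1,  # namespace 6 is File
        "templates_yes": language_code,  # case-sensitive: "ro", not "RO"
        "ext_image_data": 1,
        "file_usage_data": 1,
    }
    return "https://petscan.wmflabs.org/?" + urlencode(params)

url = commons_uncategorized_url("ro")
print(url)
```

For "ro" this reproduces the URL shown above; swapping in another language code changes only the `templates_yes` value.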

Notes:

  1. The "Language =" field is not used to select the desired language; the desired language code is set in the "Template" field instead.
  2. The language code is case-sensitive in the query! So for example use "ro" not "RO".
  3. To generate the results there, Format: ☑ Wiki was chosen, instead of the default output of Html.

Enjoy! Thanks again to User:Spiritia especially!

Items with no statements

The option "Has no statements" can be used to find:

Steps to import the template, some with PetScan.

Get the sitelinks for a certain project from a SPARQL query

  • Indicate the project on the 'Categories' tab. E.g. de for Language and wikipedia in Project to use the German language edition of Wikipedia.
  • In Other sources enter your SPARQL query
  • Make sure to select From categories from the Use wiki options
  • Press Do it

This could be useful to get the pageviews of a specific set of pages, based on a SPARQL query. You can save this to a Pagepile (check the Output tab), then enter that Pagepile ID in Massviews Analysis (select 'Page Pile' from the Source dropdown).

Get a list of Wikidata items with exclusions based on a SPARQL query

Let's say you have a list of people with Wikidata IDs (QIDs) to which you want to add an occupation (P106) of 'jewellery designer' (Q2519376), maybe with a tool like QuickStatements. However, you don't want to add this occupation to items that already have it. Here's how to do that with PetScan:

  • Have your list of QIDs in a text file, with each QID on a new line
  • In the tab 'Other sources', paste this text into the field called 'Manual list'
  • In the field 'Wiki' enter the string wikidatawiki
  • In the field 'SPARQL' enter your SPARQL query. In this example, this query will give all humans with an occupation of 'jewellery designer':
  • select ?item where { ?item wdt:P31 wd:Q5; wdt:P106 wd:Q2519376. }
  • Finally, you want to make an exclusion, so in the field 'Combination' add the string manual NOT sparql to get all the QIDs from the 'manual list', but without the items from the SPARQL query.
  • Hit 'Do it!'
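
The combination manual NOT sparql behaves like a set difference. The sketch below mimics it locally with invented QIDs; it is an illustration of the logic, not of how PetScan evaluates the expression internally.

```python
# Hypothetical QIDs for illustration; none of them come from the text above.
manual_list = {"Q100001", "Q100002", "Q100003"}

# Items the SPARQL query returned, i.e. those that already have the occupation.
sparql_result = {"Q100002", "Q999999"}

# "manual NOT sparql": keep manual items that the SPARQL query did not return.
to_edit = manual_list - sparql_result
print(sorted(to_edit))
```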


Bug reports, feature requests, code base

See also