CatScan2/en


Manual for CatScan rewrite

Now with examples! Now with information presented in a table!

CatScan3 is a powerful querying tool. A query is prepared in the CatScan3 submission form. See this CatScan3 new query form, with defaults set.

Defining your query

The fields that can be set in the query form are as follows:

Field | Meaning | Default | Note
Language | Project language code, e.g. "en" for English or "de" for German. Select "commons" for Wikimedia Commons. | "en" |
Project | Wikimedia project to be searched (wikipedia, wiktionary, wikiversity, etc.). | "wikipedia" |
Depth | Depth of the category trees to search. 0 means subcategories are not searched. | "0" |
Categories | List of categories, one per line, without the "category:" prefix. | Empty | Appending a vertical bar (pipe character) and a number sets the depth for that category tree, overriding the Depth field.
Negative Categories | List of categories, as above. Only pages that are not in these categories are accepted. | Empty |
Combination | How the above categories are combined. Category list: lists subcategories. Subset: all pages that are in all category trees. Union: all pages that are in at least one category tree. Difference: all pages in only one of the category trees. At least (N): all pages that are in at least N category trees. | "subset" | The options currently available are "subset" and "union".
Namespaces | The namespaces from which candidate pages are drawn. | Articles |
Redirects | | Either |
Templates | Use only pages that contain all of the given templates (box 1), contain one of the given templates (box 2), or contain none of the given templates (box 3). Enter one template per line, without the "template:" prefix. Each box may be qualified by selecting "Use talk page instead". | Empty | This option seems to work only with templates defined in the "template:" namespace. It cannot be used with templates defined in the "User:" namespace, or in the "Creator:" or "Institution:" namespaces used at Wikimedia Commons.
Linked from | | |
Last edit | Show pages whose last edit was or was not made by a bot, by an anonymous user, | Either, either, either |
Last change | Date or time period of the last change to the page, in the format YYYYMMDDHHMMSS (shorter forms allowed). | | "Only pages created during the above time window" lets you look for the first change instead.
Size | File size or size range, in bytes. | Empty | Selects articles whose size is greater than one cutoff and/or less than another.
Links | Number or range of internal links on the page. | Empty | Selects articles with many or few links.
Redlinks | | |
Top categories | Available feature. | |
Sort | Available feature; sets the sorting criteria for the output. | |
Wikidata | Get Wikidata, if available. | |
Format | Output format of the search results. HTML: web page. CSV: values in quotation marks, separated by commas. TSV: tab-separated values. WIKI: wikitable. PHP: PHP file. XML: XML file. | |
Do it! | Click this to run the query you have defined. | |
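
The Categories and Combination rows can be illustrated with a small sketch. The Python below is not CatScan's own code: the helper names `parse_category_lines` and `combine`, and the mode strings, are hypothetical and chosen only for illustration. It assumes each category tree has already been resolved to a set of page titles.

```python
from collections import Counter


def parse_category_lines(text, default_depth=0):
    """Split a Categories field into (name, depth) pairs.

    A line such as "Lakes|3" overrides the default depth for that tree;
    a line without a numeric suffix uses default_depth.
    (Hypothetical helper, not part of CatScan.)
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, sep, depth = line.rpartition("|")
        if sep and depth.strip().isdigit():
            pairs.append((name.strip(), int(depth)))
        else:
            pairs.append((line, default_depth))
    return pairs


def combine(trees, mode, n=1):
    """Combine per-tree page sets according to a Combination mode.

    trees is a list of sets of page titles, one set per category tree.
    (Hypothetical helper; mode names mirror the table above.)
    """
    # Count in how many trees each page occurs.
    counts = Counter(page for tree in trees for page in set(tree))
    if mode == "subset":        # in all category trees
        keep = lambda c: c == len(trees)
    elif mode == "union":       # in at least one tree
        keep = lambda c: c >= 1
    elif mode == "difference":  # in only one of the trees
        keep = lambda c: c == 1
    elif mode == "at_least":    # in at least n trees
        keep = lambda c: c >= n
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(page for page, c in counts.items() if keep(c))


if __name__ == "__main__":
    print(parse_category_lines("Rivers of Canada|3\nLakes", default_depth=1))
    trees = [{"A", "B", "C"}, {"B", "C", "D"}, {"C", "E"}]
    for mode in ("subset", "union", "difference"):
        print(mode, combine(trees, mode))
    print("at_least 2", combine(trees, "at_least", n=2))
```

With the three sample trees, "subset" keeps only the page present in every tree, while "difference" keeps the pages that occur in exactly one tree.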

Examples

Articles in a WikiProject

A request on the Talk page of this Manual: find all mainspace articles within "WikiProject UK geography". Starting with a default CatScan3 submission form, add "WikiProject UK geography" to the first box of the Categories row and, just below, select "Use talk pages instead". Here is the query filled out. Hit "Do it!" at the bottom. When run on 16 August 2015, the query took 1.5 seconds and returned a list of 21,408 articles. The list appears BELOW the submission form (which remains on your screen), so you have to scroll down to see the results.

Dablinks within a WikiProject

Editors working on disambiguation seek to enlist members of a content area WikiProject, specifically WikiProject Canada, to help. A CatScan3 report is designed to find all articles having ambiguous links that are within the given WikiProject. Criteria applied:

  1. Articles having ambiguous links are within "Category:All articles with links needing disambiguation", so paste "All articles with links needing disambiguation" into the CatScan3 Categories field.
  2. Depth is set arbitrarily to 9, meaning that articles as far as 9 subcategories down from the "needing disambiguation" parent category will be found. (Searching to that depth is not necessary in this case but doesn't hurt.)
  3. Articles within WikiProject Canada have "Template:WikiProject Canada" on their talk pages, so paste "WikiProject Canada" into CatScan3's "Has any of these templates" field, and just below select "Use talk pages instead" as a qualifier.
  4. Only regular articles, not disambiguation pages, are wanted, and disambiguation pages are distinguished by having template:disambiguation, so paste "Disambiguation" into CatScan3's "Has none of these templates" field, and make sure "Use talk pages instead" is not selected.
  • These criteria are implemented by this CatScan3 submission form, filled out. To submit the query, select "Do it!" at the bottom.
  • When submitted on 16 August 2015, the query took 31 seconds to run and returned a list of 255 articles. The results appear BELOW the CatScan3 submission form, which remains in place, so you may see no change on your screen. You have to know to scroll down to find the results! That request was run with the default output format, HTML.
  • To obtain the results in a Wikitable, in order to share them at a subpage of the WikiProject, the request could be revised to select Format "WIKI". This time the results, in wikitable markup, replace the CatScan3 submission form on your screen.
  • To make a more useful list for disambiguators, set up so that DabSolver opens on any item clicked, a several-step process can be followed. Here the results were saved in tab-separated format instead and brought into Excel; a column was then composed that concatenated simple text strings with the results, and that column was copied and pasted. The results were pasted to the English Wikipedia page w:Wikipedia:Canadian Wikipedians' notice board/ArticlesNeedingDisambiguation2015-08-17 and were also posted within a scrolling window in discussion at the wt:Canada talk page. --Doncram (talk) 19:50, 24 August 2015 (UTC)
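
The spreadsheet step described above (concatenating fixed text strings around each result) could also be done with a short script. This is only a sketch under stated assumptions: the helper name `tsv_to_wikilist` is hypothetical, the page title is assumed to sit in the first TSV column with no header row, and the `prefix` and `suffix` strings are placeholders to be replaced with whatever link text is actually wanted (the DabSolver link format is not given here, so it is not reproduced).

```python
import csv
import io


def tsv_to_wikilist(tsv_text, prefix="* [[", suffix="]]", title_column=0):
    """Wrap each title from tab-separated output in fixed strings.

    Assumptions (not confirmed by the manual): one result per row,
    no header row, title in column `title_column`.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    lines = []
    for row in reader:
        # Skip blank rows and rows that lack the title column.
        if len(row) <= title_column or not row[title_column].strip():
            continue
        # Show underscores as spaces, as wiki titles are displayed.
        title = row[title_column].strip().replace("_", " ")
        lines.append(f"{prefix}{title}{suffix}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "Ottawa\t123\nLake_Louise\t456\n"
    print(tsv_to_wikilist(sample))
```

The resulting text can then be pasted into a wiki page in the same way as the spreadsheet column.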

Detecting pages that have an anomalous combination of namespace and category/ies

Catscan can be used to find pages that are in a category (or combination of categories) that is not appropriate for pages in a particular namespace - e.g. Wikipedia administration pages that are in a category that should only contain encyclopedic articles. This can then be fixed (e.g. by moving an article to the correct namespace or by editing a discussion to insert a missing ":" where a category is being referred to). The first step in this process is to identify (using Catscan) categories that cause incorrect categorization (e.g. Wikipedia administration categories that are in article categories).


Find uncategorized photo contributions in Commons in a given language

(Based on Grants:Learning patterns/Treasures or landmines: detecting uncategorized, language-specific uploads in Commons. See the motivation and full explanation there! Thank you to User:Spiritia and the other contributors and commenters there for contributing this!)

Run a query using CatScan2 with all of the following settings:

Language = commons
Project = wikimedia
Depth = all
Categories = Media needing categories
Combination = ☑ Subset
Namespaces = ☑ File
Templates : Has all of these templates = <your language code> 
Format:  ☑ Extended data for files     ☑ File usage data

The English language code is "en"; the Romanian language code is "ro". To find uncategorized photos uploaded by users who use Romanian, a version of the query (with HTML output, and without autorun) is:

http://tools.wmflabs.org/catscan3/catscan2.php?language=commons&project=wikimedia&depth=1&categories=Media+needing+categories&ns%5B6%5D=1&templates_yes=ro&ext_image_data=all&file_usage_data=all

As of 15 March 2016, after hitting "run" the query requires about 105 seconds to finish, and yields 1748 uncategorized photos.
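
To adapt the query URL above for another language code, the query string can be assembled in a script rather than edited by hand. The sketch below uses only the parameter names visible in that URL. The function name `build_query_url` is hypothetical, and nothing here is taken from CatScan's documentation beyond the example URL itself.

```python
from urllib.parse import urlencode

# Base address copied from the example URL in this section.
BASE_URL = "http://tools.wmflabs.org/catscan3/catscan2.php"


def build_query_url(language_template, depth=1):
    """Rebuild the example query for a given language-code template.

    Hypothetical helper; parameter names are those seen in the example URL.
    The language code is passed as a template name and is case-sensitive,
    so it is used exactly as given.
    """
    params = [
        ("language", "commons"),
        ("project", "wikimedia"),
        ("depth", str(depth)),
        ("categories", "Media needing categories"),
        ("ns[6]", "1"),  # namespace 6 is the File namespace
        ("templates_yes", language_template),
        ("ext_image_data", "all"),
        ("file_usage_data", "all"),
    ]
    # urlencode turns spaces into "+" and brackets into %5B / %5D.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url("ro"))
```

Calling `build_query_url("ro")` reproduces the Romanian query; passing another code changes only the `templates_yes` value.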

Notes:

  1. The "Language =" field is not used to select the desired language; the desired language code is set in the "Template" field instead.
  2. The language code is case-sensitive in the query! So for example use "ro" not "RO".
  3. To generate the results there, Format: ☑ Wiki was chosen, instead of the default HTML output.

Enjoy! Thanks again to User:Spiritia especially!

Add your example here...