Research:Quarry

From Meta, a Wikimedia project coordination wiki
Jump to: navigation, search

Quarry-logo.svg

A screenshot of the Quarry SQL writing interface after execution completed.  The results set appears directly beneath the query writing pane.
Query execution. A screenshot of the Quarry SQL writing interface after execution completed. The results set appears directly beneath the query writing pane.

Quarry is a public querying interface for Labs DB, a set of live replica SQL databases of public Wikimedia Wikis. Quarry is designed to make running queries against Labs DB easy. Quarry also provides a means for researchers to share and review each other's queries. Users of Quarry are required to agree to Labs Terms of use.

quarry.wmflabs.org

Writing & executing SQL queries[edit]

Choosing a database[edit]

By default Quarry queries the public database of English Wikipedia ("enwiki_p"). You can use the database of any other wikisite by prefixing your query with USE <dbname>_p;. For example, the following SQL returns the count of revisions saved by "EpochFail" in metawiki.

USE metawiki_p;
SELECT COUNT(*) 
FROM revision_userindex 
WHERE 
    rev_user_text = "EpochFail" AND 
    rev_timestamp >= "20140101";

The name of the database is usually the language code concatenated with the wiki family (wiki for Wikipedia, wikibooks etc.). For multi-language wikis it's the name of the wiki and wiki, e.g. commonswiki, wikidatawiki. See https://quarry.wmflabs.org/query/278 for the full list.

Available tables and columns[edit]

For an overview of the available tables and columns see MediaWiki database layout. Some extensions add additional tables which can be queried, too, these should be documented on the extension's documentation page. Note that not all tables are available. The following data are missing:

  • The table text containing the actual content of the revisions is missing.
  • Private data is suppressed.

Note that this means that the revision and logging tables do not have indexes on user columns. You can use revision_userindex and logging_userindex instead (see wikitech:Help:Tool_Labs/Database#Tables for revision or logging queries involving user names and IDs)

To get a list of all tables you can run the query (replace enwiki_p with the respective wiki):

SHOW TABLES FROM enwiki_p;

To get the schema of a table, you can use DESCRIBE (e.g. https://quarry.wmflabs.org/query/585)

Downloading a resultset[edit]

Quarry screenshot.download data.png
Download data button. 

Quarry lets you download the result of a query. A data download button appears right before the resultset to get query results in CSV, TSV, JSON or wikitable formats. You can also request resultsets programmatically from wiki tools.

Format
https://quarry.wmflabs.org/run/query_run_ID/output/run_number/<csv|json|json-lines|tsv|wikitable>
Example
https://quarry.wmflabs.org/run/1534/output/0/json
Response
{"headers": ["COUNT(*)"], "rows": [[88]]}

Note that query_run_ID (e.g. /run/10140) it's not the same as query number (e.g. /query/153) which is visible in URL. The query run ID is given in the HTML source code of the query page as "qrun_id".

Wikitable resultset[edit]

Since 2016 Quarry has an option which allows downloading directly query results in wikitable format.

As an alternative you can doawnload query results in CSV or TSV format and then to convert to a table in wikitext markup using the Tool Labs CSV → Wikitable tool, by specifying your resultset in its file query string parameter: ?file=run/query_ID/output/run_number/csv

Example: Top 20 enwiki articles by edits & editors in past 7 days

Replace run id 8473 in the url with the desired run id. The run id is the number in a quarry 'Download data' url. There is also an option to wikilink one or two columns.

Interwiki link[edit]

The interwiki map link quarry: exists to provide an alternate linking means to queries. eg.

See also[edit]