Research:Quarry
Quarry is a public querying interface for Wiki Replicas, a set of live replica SQL databases of public Wikimedia Wikis. Quarry is designed to make running queries against Wiki Replicas easy. Quarry also provides a means for researchers to share and review each other's queries. Users of Quarry are required to agree to Cloud Services Terms of use.
Writing & executing SQL queries[edit]
Choosing a database[edit]
By default Quarry queries the public database of English Wikipedia ("enwiki_p"). You can use the database of any other wikisite by prefixing your query with USE <dbname>_p;
. For example, the following SQL returns the count of revisions saved by "EpochFail" in metawiki.
USE metawiki_p;
SELECT COUNT(*)
FROM revision_userindex
WHERE
rev_actor = (SELECT actor_id
FROM actor
WHERE actor_name = "EpochFail") AND
rev_timestamp >= "20140101";
The name of the database is usually the language code concatenated with the wiki family (wiki
for Wikipedia, wikibooks
etc.). For multi-language wikis it's the name of the wiki and wiki
, e.g. commonswiki
, wikidatawiki
. See https://quarry.wmflabs.org/query/278 for the full list.
Available tables and columns[edit]
For an overview of the available tables and columns see MediaWiki database layout. Some extensions add additional tables which can be queried, too, these should be documented on the extension's documentation page. The Toolforge SQL Optimizer has a database browser that helps to find the needed table and understand its format. Note that not all tables are available. The following data are missing:
- The table
text
containing the actual content of the revisions is missing. - Private data is suppressed.
Note that this means that the revision
and logging
tables do not have indexes on user columns. You can use revision_userindex
and logging_userindex
instead (see wikitech:Help:Toolforge/Database#Tables_for_revision_or_logging_queries_involving_user_names_and_IDs)
To get a list of all tables you can run the query (replace enwiki_p
with the respective wiki):
SHOW TABLES FROM enwiki_p;
To get the schema of a table, you can use DESCRIBE
(e.g. https://quarry.wmflabs.org/query/585)
Downloading a resultset[edit]
Quarry lets you download the result of a query. A data download button appears right before the resultset to get query results in CSV, TSV, JSON or wikitable formats. You can also request resultsets programmatically from wiki tools.
- Format
https://quarry.wmflabs.org/run/query_run_ID/output/run_number/<format>
- Example
https://quarry.wmflabs.org/run/1534/output/0/json
- Response
{"headers": ["COUNT(*)"], "rows": [[88]]}
Note that query_run_ID (e.g. /run/10140) it's not the same as query number (e.g. /query/153) which is visible in URL. The query run ID is given in the HTML source code of the query page as "qrun_id".
Example queries[edit]
- Published Quarry queries are a good source of examples (try using Google site search for specific topics)
- The Toolforge documentation contains various examples
Interwiki link[edit]
The interwiki map link quarry:
exists to provide an alternate linking means to queries. eg.
See also[edit]
- MediaWiki database layout
- Query library -- for ideas of queries to run
- Privacy policy
- Quarry source code
- Quarry bug tracker
- Cloud Services Terms of use
- Toolforge database access documentation
- Grants:Evaluation/Wikiresearch webinars - a series of tutorials on using Quarry and other tools
- Toolforge SQL Optimizer - explains a query execution plan