User:Halfak (WMF)/Wikimedia data
Datasources
Primary
These datasources are official, well-defined, maintained, and kept up to date.
| Content & contributors | Reading behavior |
|---|---|
Secondary
- WikiStats (more info)
- A collection of reports generated about Wikimedia Projects (active editors, monthly pageviews, etc.)
- DBPedia (homepage)
- A database of structured data extracted from Wikipedias
- RDF, N-Triples, SPARQL endpoint, Linked Data
- Wikimedia @ DataHub.io (homepage)
- A collection of one-off and partially maintained datasets produced by the Wikimedia Research team.
- Teahouse corpus, Clickstream, Wikipedia citations, etc.
- ORES (more info)
- Machine learning as a RESTful service
- Scores revisions by the probability that they are damaging and predicts article quality.
- WikiBrainAPI (homepage)
- Powerful algorithmic processing for Wikipedia
- Semantic relatedness, page rank calculations, etc.
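Of the services above, ORES is the most directly scriptable: its v3 API returns model scores for a revision of a given wiki. A minimal stdlib sketch of building such a request and reading a response follows; the revision ID and the sample response are illustrative, not real output.

```python
# Sketch of an ORES v3 scoring request for one revision and two models.
# The revision ID (123456) and the sample response are illustrative.
from urllib.parse import urlencode


def ores_url(context, rev_id, models):
    """Build an ORES v3 scoring URL for a single revision."""
    base = "https://ores.wikimedia.org/v3/scores"
    return f"{base}/{context}/{rev_id}?" + urlencode({"models": "|".join(models)})


url = ores_url("enwiki", 123456, ["damaging", "articlequality"])

# Responses nest as {context: {"scores": {rev_id: {model: {"score": ...}}}}};
# this sample dict stands in for a fetched JSON body.
sample = {
    "enwiki": {"scores": {"123456": {"damaging": {
        "score": {"prediction": False,
                  "probability": {"false": 0.97, "true": 0.03}}}}}}
}
p_damaging = (sample["enwiki"]["scores"]["123456"]
              ["damaging"]["score"]["probability"]["true"])
```

Fetching `url` with any HTTP client and decoding the JSON body yields a structure like `sample`, from which per-model predictions and probabilities can be pulled as shown.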
Data processing libraries
- pywikibot (Monolith)
- mediawiki-utilities (Unix-style)
- Primary datasources: mwapi, mwdb, mwxml, mwtypes
- Auth: mwoauth
- Data processing: mwreverts, mwsessions, mwpersistence, mwparserfromhell, mwmetrics, mwevents, etc.
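To give a flavor of what these libraries do: mwreverts detects "identity reverts", where a revision restores a page to a byte-identical earlier state, undoing everything in between. The stdlib-only sketch below illustrates the checksum-matching idea; the function name and the radius default are assumptions for illustration, not the library's actual API.

```python
# Stdlib sketch of identity-revert detection (the idea behind mwreverts):
# a revision reverts when its text checksum matches an earlier revision's
# within a bounded window, undoing the revisions in between.
import hashlib


def detect_reverts(texts, radius=15):
    """Return (reverting_index, reverted_to_index, intermediate_indices) tuples."""
    seen = {}  # checksum -> most recent index with that page text
    reverts = []
    for i, text in enumerate(texts):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen:
            j = seen[digest]
            # Require at least one intermediate revision (skip null edits)
            # and stay within the search radius.
            if 1 < i - j <= radius:
                reverts.append((i, j, list(range(j + 1, i))))
        seen[digest] = i
    return reverts
```

For a history like `["start", "vandalism", "start"]`, the third revision is detected as reverting the second, back to the first.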