Research:MediaWiki events: a generalized public event datasource
This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.
Wiki-tool builders and researchers rely on various sources of information about what has happened and what is currently happening in Wikipedia. These data sources tend to be structured differently and to contain incomplete or poorly structured information. Some datasources are queryable but make it complicated to "listen" to ongoing events, while others are intended only for "listening" to current events. In this project, we'll describe a common structure for public events in MediaWiki that mimics recentchanges but also contains historical information. We'll also explore means of implementing this functionality on top of existing datasources and propose changes to infrastructure that would allow us to improve the efficiency and completeness of the data.
Events
Available datasources
- API
  - list=recentchanges -- Gathers a joined set of revision/logging and does some event metadata parsing
- MySQL db
  - recentchanges -- Sequences both revision and logging events.
  - revision -- Revision and page creation events.
  - logging -- All non-revision and page creation events.
- RCStream -- see https://wikitech.wikimedia.org/wiki/RCStream
- IRC Stream -- see Research:Data#IRC_Feeds
- EventLogging -- see mw:Extension:EventLogging
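As a rough illustration of the API datasource, the sketch below builds the parameters for one list=recentchanges request and pulls the change rows out of a decoded response. The helper names (`recentchanges_params`, `rows_from_response`) and the sample response are made up for illustration; the parameter names are the action API's as we understand them and should be checked against the API documentation.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"  # example endpoint


def recentchanges_params(start, limit=50):
    """Build query parameters for one list=recentchanges request.

    `start` is a MediaWiki timestamp such as "20140729000000".
    """
    return {
        "action": "query",
        "list": "recentchanges",
        "rcstart": start,
        "rcdir": "newer",  # oldest first, so events arrive in order
        "rcprop": "ids|title|timestamp|user|comment|loginfo",
        "rclimit": str(limit),
        "format": "json",
    }


def rows_from_response(doc):
    """Return the list of change rows from a decoded JSON response."""
    return doc.get("query", {}).get("recentchanges", [])


url = API_URL + "?" + urlencode(recentchanges_params("20140729000000"))

# Hand-written sample shaped like an API response (not real data).
sample = {"query": {"recentchanges": [
    {"type": "edit", "rcid": 1, "revid": 10, "title": "Example"},
    {"type": "log", "rcid": 2, "logtype": "delete", "logaction": "delete"},
]}}
rows = rows_from_response(sample)
```

No request is sent here; fetching `url` and following continuation is left to the caller.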
Relevant events
- RevisionSaved
- RevisionsDeleted
- PageCreated
- PageMoved
- PageDeleted
- PageRestored
- PageProtectionModified
- UserRegistered
- UserRenamed
- UserRightsModified
- UserBlocked
- UserUnblocked
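Most of these event types would have to be recovered from (log type, log action) pairs in the logging table or in list=recentchanges log rows. A minimal sketch of such a mapping follows. The pairs reflect our understanding of MediaWiki's logging conventions, are not exhaustive, and may differ between versions; the function name `event_name` is hypothetical.

```python
# (log_type, log_action) -> event name. Not exhaustive; the pairs are our
# best understanding of MediaWiki's logging conventions, not a specification.
LOG_EVENTS = {
    ("delete", "delete"): "PageDeleted",
    ("delete", "restore"): "PageRestored",
    ("delete", "revision"): "RevisionsDeleted",
    ("move", "move"): "PageMoved",
    ("move", "move_redir"): "PageMoved",
    ("protect", "protect"): "PageProtectionModified",
    ("protect", "modify"): "PageProtectionModified",
    ("protect", "unprotect"): "PageProtectionModified",
    ("newusers", "create"): "UserRegistered",
    ("renameuser", "renameuser"): "UserRenamed",
    ("rights", "rights"): "UserRightsModified",
    ("block", "block"): "UserBlocked",
    ("block", "reblock"): "UserBlocked",
    ("block", "unblock"): "UserUnblocked",
}


def event_name(log_type, log_action):
    """Return the event name for a log row, or None if it is not tracked."""
    return LOG_EVENTS.get((log_type, log_action))
```

RevisionSaved and PageCreated are absent from the table because, per the datasource list above, they come from revision rather than logging.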
Desired functionality
Listening
    import mw_events  # the library proposed on this page
    from mw_events import RevisionSaved, RevisionsDeleted  # assumed location of the event classes

    for event in mw_events.listen(start="20140729000000"):
        # do thing with event
        if isinstance(event, RevisionSaved):
            revision_saved = event
            # do thing with revision_saved
        elif isinstance(event, RevisionsDeleted):
            revisions_deleted = event
            # do thing with revisions_deleted
        else:
            pass
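One way listening could be layered on a queryable datasource is by polling. The sketch below is only an illustration of that idea, not the proposed implementation: it assumes a caller-supplied `fetch(last_id)` function (hypothetical) that returns rows carrying an increasing integer `rcid`.

```python
import time


def listen(fetch, last_id=0, interval=1.0, max_polls=None):
    """Yield rows from `fetch` forever (or for `max_polls` polls).

    `fetch(last_id)` is a hypothetical callable returning a list of dicts,
    each with an integer "rcid", for changes newer than `last_id`.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        rows = sorted(fetch(last_id), key=lambda row: row["rcid"])
        for row in rows:
            if row["rcid"] > last_id:  # skip anything already seen
                last_id = row["rcid"]
                yield row
        polls += 1
        if not rows:
            time.sleep(interval)  # nothing new; wait before polling again


# Fake datasource for demonstration: three changes, served two at a time.
CHANGES = [{"rcid": 1}, {"rcid": 2}, {"rcid": 3}]


def fake_fetch(last_id):
    return [c for c in CHANGES if c["rcid"] > last_id][:2]


seen = [row["rcid"] for row in listen(fake_fetch, interval=0, max_polls=3)]
```

Tracking the last seen ID keeps the stream ordered and free of duplicates even when successive polls overlap.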
Querying
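    import mw_events  # the library proposed on this page
    from mw_events import RevisionSaved  # assumed location of the event classes

    events = mw_events.query(start="20140729000000", end="20140731000000", types={RevisionSaved})
    for revision_saved in events:
        # do thing with revision_saved
        pass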
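The start and end arguments use MediaWiki's 14-digit YYYYMMDDHHMMSS timestamps. Below is a minimal sketch of the filtering such a query implies, run over an in-memory list. The `Event` class and its subclasses are stand-ins rather than the proposed data structures, and since this page does not say whether `end` is inclusive, the sketch treats it as exclusive.

```python
from datetime import datetime

MW_FORMAT = "%Y%m%d%H%M%S"  # MediaWiki database timestamp layout


class Event:
    """Stand-in base class; the real event fields are not defined here."""

    def __init__(self, timestamp):
        self.timestamp = datetime.strptime(timestamp, MW_FORMAT)


class RevisionSaved(Event):
    pass


class UserRegistered(Event):
    pass


def query(events, start, end, types=None):
    """Yield events with start <= timestamp < end, optionally by type."""
    lower = datetime.strptime(start, MW_FORMAT)
    upper = datetime.strptime(end, MW_FORMAT)
    wanted = tuple(types) if types else (Event,)
    for event in events:
        if lower <= event.timestamp < upper and isinstance(event, wanted):
            yield event


log = [
    RevisionSaved("20140728235959"),   # before the window
    RevisionSaved("20140729120000"),   # inside, matching type
    UserRegistered("20140730000000"),  # inside, other type
    RevisionSaved("20140731000000"),   # at `end`, excluded
]
hits = list(query(log, "20140729000000", "20140731000000", {RevisionSaved}))
```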
Dumps
    from mw_events import MWEventReader, UserRegistered  # assumed location of these names

    mw_event_reader = MWEventReader("event_dump.enwiki.1.json.7z")
    for user_registered in mw_event_reader.filter(types={UserRegistered}):
        # do thing with user_registered
        pass
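The dump format itself is not specified on this page. Assuming, purely for illustration, that a decompressed dump holds one JSON object per line with a `type` key, a reader could look like the sketch below. Both `read_events` and the `type` key are hypothetical, and 7z decompression is left out.

```python
import io
import json


def read_events(lines, types=None):
    """Yield decoded events from an iterable of JSON lines.

    `types` is an optional set of event type names to keep.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        if types is None or event.get("type") in types:
            yield event


# In-memory stand-in for a decompressed dump file (made-up records).
dump = io.StringIO(
    '{"type": "UserRegistered", "user": "Example"}\n'
    '\n'
    '{"type": "RevisionSaved", "rev_id": 10}\n'
)
registered = list(read_events(dump, types={"UserRegistered"}))
```

Reading lazily, one line at a time, keeps memory use flat regardless of dump size.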
Relevant bugs
- T28122 No way to get the ID of a deleted page from deletion logs
- T59084 Store the page_id of the moved page in log_page
- T71005 Add a list=recentchanges result property for title without namespace
Standardization
- MediaWiki events
  - consolidates domain knowledge and wiki archaeology
  - hides complexity -- produces standardized data structures
  - reads from MySQL database and api.php. Extendable to new formats.
  - produces JSON
  - provides a special Unavailable datatype to flag critical data that is not currently available
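The Unavailable datatype can be pictured as a sentinel distinct from None, so that "the datasource cannot tell us" is not confused with "there is no value". A minimal sketch follows; the class layout and the JSON encoding are chosen here for illustration only and are not taken from the library.

```python
import json


class UnavailableType:
    """Singleton marking data that the datasource cannot provide."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "Unavailable"

    def __bool__(self):
        return False


Unavailable = UnavailableType()


def encode(obj):
    """json.dumps hook: write Unavailable as a tagged object (assumed form)."""
    if obj is Unavailable:
        return {"unavailable": True}
    raise TypeError(f"not JSON serializable: {obj!r}")


# A deletion event whose page ID is missing from the log (see T28122).
event = {"type": "PageDeleted", "page_id": Unavailable, "comment": None}
text = json.dumps(event, default=encode, sort_keys=True)
```

In the output, `comment` is written as null while `page_id` keeps an explicit marker, so consumers can tell the two cases apart.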
Support needed
- DBAs at the Wikimedia Foundation to explore means of publishing EventLogging infrastructure
- Developers working in languages other than Python to discuss cross-language API similarities