We need to understand more about article creation on Wikipedia, both in order to inform our own decisions at the Wikimedia Foundation, and to facilitate decisions and discussion by the community.
For this analysis, we will focus on the top 5 Wikipedias by daily number of articles created: English, Spanish, German, French, and Italian.
RQ 1: At what scale are new articles created, and by whom?
- How many new articles are created per day (including articles that are eventually deleted)?
- How many are created by auto-confirmed users?
- How many are created by non-auto-confirmed users?
RQ 2: Of all the articles created by auto-confirmed users, how many survive for 90 days?
- How many of the articles created each day by auto-confirmed users are deleted within 90 days?
- How many of the articles created each day by auto-confirmed users remain after 90 days?
RQ 3: Of all the articles created by non-auto-confirmed users, how many survive for 90 days?
- How many of the articles created each day by non-auto-confirmed users are deleted within 90 days?
- How many of the articles created each day by non-auto-confirmed users remain after 90 days?
In short, we want to create new, live versions of charts that were originally produced in 2011.
Data will be gathered using EventBus.
On Wikipedia, the jargon for the various kinds of wiki pages and page creation processes is complex and nuanced. We will use the following definitions in our analyses:
- A page is any wiki page, in any namespace.
- An article is a page in the main namespace (namespace 0) that is not a redirect.
- Creation refers to initial creation and does not include undeletion or moving a page to a new title.
- Deletion refers to actual page deletion (within 90 days of creation) and does not include a page being moved to another title or namespace or being turned into a redirect.
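The definitions above can be read as two simple predicates. The following is a minimal sketch, not the project's actual implementation: the `PageCreation` record and its field names are hypothetical and do not reflect any EventBus schema, and treating a deletion at exactly 90 days as falling inside the window is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

SURVIVAL_WINDOW = timedelta(days=90)
MAIN_NAMESPACE = 0


@dataclass
class PageCreation:
    # Hypothetical record; field names are illustrative, not an EventBus schema.
    namespace: int
    created_as_redirect: bool
    created_at: datetime
    deleted_at: Optional[datetime] = None  # None if the page was never deleted


def is_article(page: PageCreation) -> bool:
    """An article is a main-namespace page that was not created as a redirect."""
    return page.namespace == MAIN_NAMESPACE and not page.created_as_redirect


def deleted_within_window(page: PageCreation) -> bool:
    """True only for an actual deletion within 90 days of creation.

    Moves and conversions to redirects are assumed not to set deleted_at.
    """
    if page.deleted_at is None:
        return False
    # Assumption: a deletion at exactly 90 days counts as inside the window.
    return page.deleted_at - page.created_at <= SURVIVAL_WINDOW
```

Under this reading, a page created in the user namespace fails `is_article`, and a main-namespace article deleted 91 days after creation counts as having survived.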
Assumptions and notes
- Article creation
  - We will not be counting pages that are initially created in other namespaces and later moved to the main namespace (for the sake of simplicity).
- Article creator
  - We will assume that the user who saved the first revision to a page is the page's creator.
- Exclusion of redirects
  - We will only be excluding articles that are initially created as redirects, not articles that are later turned into redirects.
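To illustrate how these assumptions shape the counts behind the research questions, here is a rough sketch of a per-day tally. Everything about the input is hypothetical: the `Creation` row and its field names are invented for illustration, and the sketch assumes every row is already at least 90 days old, so that "survived" is well defined.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple, Optional


class Creation(NamedTuple):
    # Hypothetical input row; names are illustrative only.
    namespace: int                       # namespace of the first revision
    first_rev_is_redirect: bool          # redirect status at creation time
    first_rev_user_autoconfirmed: bool   # status of the user who saved revision 1
    created_at: datetime
    deleted_at: Optional[datetime]       # None if the page was never deleted


def daily_tallies(rows: Iterable[Creation], window_days: int = 90) -> Counter:
    """Count articles per (creation date, creator group, outcome)."""
    window = timedelta(days=window_days)
    tallies: Counter = Counter()
    for row in rows:
        # Pages created outside the main namespace are skipped,
        # even if they are later moved into it.
        if row.namespace != 0:
            continue
        # Only pages that start out as redirects are excluded.
        if row.first_rev_is_redirect:
            continue
        # The creator is taken to be the user who saved the first revision.
        if row.first_rev_user_autoconfirmed:
            group = "auto-confirmed"
        else:
            group = "non-auto-confirmed"
        deleted = (
            row.deleted_at is not None
            and row.deleted_at - row.created_at <= window
        )
        outcome = "deleted" if deleted else "survived"
        tallies[(row.created_at.date(), group, outcome)] += 1
    return tallies
```

Summing the two outcomes for a given date and group gives the number of articles created that day by that group (RQ 1), while the individual outcome counts correspond to RQ 2 and RQ 3.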