Research talk:Autoconfirmed article creation trial/Work log/2017-07-26


Wednesday, July 26, 2017

Today I'll be working on instrumentation and operationalization of our hypotheses.

H1: Number of accounts registered per day will not be affected.

This can be identified from the logging table, using rows where log_type="newusers". log_action is "autocreate" for accounts that are created automatically (e.g. by CentralAuth) and "create" for normal account creations. There is also "create2" for when a logged-in user creates an account for someone else, and "byemail" for when a new account is created and receives its password by email. It looks like the latter is mainly done through WP:Request an account. These finer-grained distinctions can also allow us to understand to what extent newly registered accounts that go on to create new articles also have accounts on other wikis.
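As a minimal sketch of how the daily counts could be tallied once the log rows have been fetched: the query text, the function name `registrations_per_day`, and the assumption that timestamps arrive as decoded 14-character strings (YYYYMMDDHHMMSS) are illustrative and not taken from the trial's actual code.

```python
from collections import Counter

# Query sketch against the logging table (shown for context, not executed here).
NEWUSERS_QUERY = """
SELECT log_timestamp, log_action
FROM logging
WHERE log_type = 'newusers'
"""

# The four log_action values described above.
CREATION_TYPES = {"create", "create2", "autocreate", "byemail"}


def registrations_per_day(rows):
    """Count account creations per (day, log_action).

    rows: iterable of (log_timestamp, log_action) pairs, where
    log_timestamp is assumed to be a YYYYMMDDHHMMSS string.
    """
    counts = Counter()
    for log_timestamp, log_action in rows:
        if log_action not in CREATION_TYPES:
            continue  # ignore any other action logged under "newusers"
        day = log_timestamp[:8]  # keep only YYYYMMDD
        counts[(day, log_action)] += 1
    return counts
```

Keeping log_action in the key means the per-day totals can later be split into automatic and manual creations without re-reading the log.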

H2: Proportion of newly registered accounts with non-zero edits in the first 30 days is reduced.

The process of identifying the accounts is described under H1 above. We can then use each account's registration timestamp to find the revisions it made during its first 30 days in the revision and archive tables, and count up its edits. Because this hypothesis does not restrict itself to surviving edits, the archive table is necessary to find deleted edits.
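A rough sketch of the counting step, assuming the timestamps of an account's live and deleted revisions have already been pulled from the two tables as YYYYMMDDHHMMSS strings. The function names and the choice to treat the 30-day window as half-open (registration inclusive, day 30 exclusive) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

MW_FORMAT = "%Y%m%d%H%M%S"  # assumed timestamp layout
WINDOW = timedelta(days=30)


def edits_in_first_30_days(registration, revision_ts, archive_ts):
    """Count live (revision) and deleted (archive) edits in the window."""
    start = datetime.strptime(registration, MW_FORMAT)
    end = start + WINDOW
    all_ts = list(revision_ts) + list(archive_ts)
    return sum(
        1 for ts in all_ts
        if start <= datetime.strptime(ts, MW_FORMAT) < end
    )


def proportion_with_edits(accounts):
    """Share of accounts with at least one edit in their first 30 days.

    accounts: list of (registration, revision_ts, archive_ts) tuples.
    """
    if not accounts:
        return 0.0
    active = sum(
        1 for reg, rev, arc in accounts
        if edits_in_first_30_days(reg, rev, arc) > 0
    )
    return active / len(accounts)
```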

H3 and H4: Reaching autoconfirmed status

These two hypotheses require us to identify accounts that reach autoconfirmed status within 30 days of account registration. One requirement is that the account has made at least ten edits, and this count is not restricted to surviving edits. We can therefore use the same approach as for H2 to count edits. The other requirement is that the account is at least four days old. If an account reaches the edit threshold within its first 30 days, we store the later of two timestamps as the time it became autoconfirmed: four days after registration, or the timestamp of its tenth edit.
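The rule above can be sketched as follows. The function name `autoconfirmed_time` is hypothetical, and two details are assumptions rather than something stated in this log: all supplied edits are taken to postdate registration, and a status reached exactly 30 days after registration counts as within the window.

```python
from datetime import datetime, timedelta

MW_FORMAT = "%Y%m%d%H%M%S"  # assumed timestamp layout
MIN_AGE = timedelta(days=4)
MIN_EDITS = 10
WINDOW = timedelta(days=30)


def autoconfirmed_time(registration, edit_timestamps):
    """Return when the account became autoconfirmed, or None.

    edit_timestamps should include deleted edits as well as live ones.
    """
    reg = datetime.strptime(registration, MW_FORMAT)
    edits = sorted(datetime.strptime(ts, MW_FORMAT) for ts in edit_timestamps)
    if len(edits) < MIN_EDITS:
        return None  # never reached the edit threshold
    tenth_edit = edits[MIN_EDITS - 1]
    # Both conditions must hold, so the status starts at the later of the two.
    reached = max(reg + MIN_AGE, tenth_edit)
    if reached > reg + WINDOW:
        return None  # reached the status only after the 30-day window
    return reached
```

An account that makes ten edits on its first day is therefore recorded as autoconfirmed exactly four days after registration.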

H5: The proportion of surviving new editors who make an edit in their fifth week is unchanged.

The description of this hypothesis refers to the definition of a surviving new editor. We propose that a surviving new editor is someone who makes at least one edit during their first week and returns to make at least one edit during their fifth week (so as to cover the thirtieth day of their account's lifespan). As for H2, this can be measured with a combination of the revision and archive tables.
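A small sketch of the proposed definition, with weeks numbered from 1 and measured in seven-day blocks from the registration timestamp, so week five covers days 28 through 34 after registration. The function names are hypothetical, and using accounts with a first-week edit as the denominator of the proportion is one reading of the hypothesis, not something fixed in this log.

```python
from datetime import datetime, timedelta

MW_FORMAT = "%Y%m%d%H%M%S"  # assumed timestamp layout
WEEK = timedelta(days=7)


def week_of_edit(registration, edit_ts):
    """1-based week of account life in which an edit was made."""
    reg = datetime.strptime(registration, MW_FORMAT)
    edit = datetime.strptime(edit_ts, MW_FORMAT)
    return (edit - reg) // WEEK + 1


def is_surviving_new_editor(registration, edit_timestamps):
    """True if the account edited in both its first and fifth week."""
    weeks = {week_of_edit(registration, ts) for ts in edit_timestamps}
    return 1 in weeks and 5 in weeks


def fifth_week_retention(accounts):
    """Share of first-week editors who also edit in week five.

    accounts: list of (registration, edit_timestamps) pairs.
    """
    first_week = [
        (reg, edits) for reg, edits in accounts
        if any(week_of_edit(reg, ts) == 1 for ts in edits)
    ]
    if not first_week:
        return 0.0
    survivors = sum(
        1 for reg, edits in first_week
        if is_surviving_new_editor(reg, edits)
    )
    return survivors / len(first_week)
```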

H6: The diversity of participation done by accounts that reach autoconfirmed status in the first 30 days is unchanged.

We propose to measure this by looking at either the number of pages edited or the number of namespaces contributed to. There has been some research related to this, for example using machine learning to detect what type of edit was made.[1][2] Applying these more advanced types of edit detection is out of scope for the current project, partly because we expect newly registered accounts to be fairly restricted in the types of contributions they make. This means that we are most likely more interested in where they make their contributions, and in related aspects such as "did they start out by creating a new article?"
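Both candidate measures, plus the "started with a new article" question, could be computed along these lines. The input layout is hypothetical: each edit is a (timestamp, page_id, namespace, created_page) tuple, where created_page is an assumed flag marking the first revision of a page, and namespace 0 is taken to be the article namespace.

```python
ARTICLE_NAMESPACE = 0  # main (article) namespace


def participation_diversity(edits):
    """Summarise where an account made its contributions.

    edits: iterable of (timestamp, page_id, namespace, created_page)
    tuples, with timestamps as sortable YYYYMMDDHHMMSS strings.
    """
    edits = sorted(edits)  # chronological, since timestamps sort as strings
    pages = {page_id for _, page_id, _, _ in edits}
    namespaces = {ns for _, _, ns, _ in edits}
    started_with_article = False
    if edits:
        _, _, first_ns, first_created = edits[0]
        started_with_article = first_created and first_ns == ARTICLE_NAMESPACE
    return {
        "pages": len(pages),
        "namespaces": len(namespaces),
        "started_with_article": started_with_article,
    }
```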

References

  1. Fong, P. K.-F., & Biuk-Aghai, R. P. (2010). What did they do? Deriving high-level edit histories in Wikis. In Proceedings of WikiSym.
  2. Yang, D., Halfaker, A., Kraut, R. E., & Hovy, E. H. (2016, March). Who Did What: Editor Role Identification in Wikipedia. In Proceedings of ICWSM (pp. 446-455).