Research talk:Anonymous editor acquisition

From Meta, a Wikimedia project coordination wiki

Relevant research

Explaining Quality in Internet Collective Goods: Zealots and Good Samaritans in the Case of Wikipedia http://web.mit.edu/iandeseminar/Papers/Fall2005/anthony.pdf "We find that, for users who create an online persona through a registered user name, the quality of contributions increases as the number of contributions increase, consistent with the idea of experts motivated by reputation and committed to the Wikipedia community. Unexpectedly, however, we find the highest quality contributions come from the vast numbers of anonymous “Good Samaritans” who contribute infrequently. Our findings that Good Samaritans as well as committed “Zealots” contribute high quality content to Wikipedia suggest that open source production is remarkable as much for its organizational as its technological innovation that enables vast numbers of anonymous one-time contributors to create high quality, essentially public goods" Gabrielm199 (talk) 04:04, 25 September 2013 (UTC)

This paper from Dartmouth is another one. Steven Walling (WMF) • talk 06:58, 26 September 2013 (UTC)

Resources for "Who Writes Wikipedia?" (by Aaron Swartz) [1]: a summary of interesting works covering anonymous editors.--GlimmerPhoenix (talk) 13:14, 26 September 2013 (UTC)

One of the important aspects of Aaron's work (which still applies) is the distinction between content added and number of revisions: "He replicated Wales’ claims about edits, but found that counting characters, the vast majority of major contributors are unregistered and that most have only made a handful of contributions to Wikipedia." Superm401 | Talk 23:43, 26 September 2013 (UTC)
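To make the distinction between counting revisions and counting characters concrete, here is a minimal Python sketch. The revision log, the contributor names and the character counts are all made up for illustration; they are not data from Swartz's study, and real character attribution would need diffs of the article text.

```python
from collections import Counter

# Hypothetical revision log: (contributor, characters_added).
# All values are invented for illustration only.
revisions = [
    ("RegisteredA", 20), ("RegisteredA", 15), ("RegisteredA", 10),
    ("RegisteredA", 5), ("192.0.2.7", 900), ("198.51.100.3", 600),
]

def shares(revisions):
    """Return each contributor's share of edits and of characters added."""
    edits = Counter()
    chars = Counter()
    for who, added in revisions:
        edits[who] += 1
        chars[who] += added
    total_edits = sum(edits.values())
    total_chars = sum(chars.values())
    edit_share = {who: n / total_edits for who, n in edits.items()}
    char_share = {who: n / total_chars for who, n in chars.items()}
    return edit_share, char_share

edit_share, char_share = shares(revisions)
top_by_edits = max(edit_share, key=edit_share.get)
top_by_chars = max(char_share, key=char_share.get)
print("Top contributor by edits:", top_by_edits)
print("Top contributor by characters:", top_by_chars)
```

In this toy log the registered account leads by number of edits while an IP address leads by characters added, which is the kind of reversal the quote describes.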

Some comments on methodology

You have most likely already thought about the following challenges, but I would like to point them out just in case:

  1. IP addresses cannot be matched to individuals in any case. I think this is the main reason why anonymous edits have been discarded from most previous Wikipedia studies. In fact, researchers have been repeatedly instructed to avoid using anonymous revision data in their studies.
  2. Quantifying the anonymous group by counting unique IPs can be quite misleading. NAT devices, proxies and other common networking "tricks" can map many outgoing connections to the same public IP (companies, large organizations, colleges... in general, all variants of campus networks). There are also ISPs that assign IPs to domestic end users dynamically, so there is no guarantee that the IP address used by a given editor one day is still used by the same editor the day after. This may affect the recruitment of respondents for questionnaires used as input for qualitative analyses.

--GlimmerPhoenix (talk) 13:14, 26 September 2013 (UTC)
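The two effects described above (shared addresses and dynamically assigned addresses) can be illustrated with a small Python sketch. The edit log is hypothetical: in real data the "true editor" column does not exist, which is exactly the problem. The addresses come from documentation-only ranges.

```python
# Hypothetical edit log: (true_editor, ip_address_seen_by_the_wiki).
# The true editor is unknown in practice; it is included here only to
# show how far a unique-IP count can drift from the number of people.
edits = [
    ("alice", "203.0.113.10"),  # campus NAT shared by three people
    ("bob",   "203.0.113.10"),
    ("carol", "203.0.113.10"),
    ("dave",  "198.51.100.1"),  # dynamic IP: same person, new address each day
    ("dave",  "198.51.100.2"),
    ("dave",  "198.51.100.3"),
    ("dave",  "198.51.100.4"),
]

def count_unique(edits):
    """Return (number of distinct people, number of distinct IPs)."""
    people = {person for person, _ in edits}
    ips = {ip for _, ip in edits}
    return len(people), len(ips)

n_people, n_ips = count_unique(edits)

# Per-address view: how many distinct people sit behind each IP.
people_per_ip = {}
for person, ip in edits:
    people_per_ip.setdefault(ip, set()).add(person)

print("distinct people:", n_people)
print("distinct IPs:", n_ips)
print("people behind the shared address:", len(people_per_ip["203.0.113.10"]))
```

Here the shared address hides three people behind one IP, while the dynamic addresses turn one person into four IPs, so the unique-IP count is wrong in both directions at once.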

Thanks, these notes are helpful. Your point about the IP <-> individual mapping is important, and definitely one of the reasons this hasn't been done in the past. I was also wondering who instructed researchers not to use anonymous revisions data. Superm401 | Talk 23:46, 26 September 2013 (UTC)
In general, previous studies focused on measuring editorial effort (edits per user, number of editors, etc.), types of contributions, roles, behavioral changes, network analysis, gender imbalance and so forth. The only way to obtain accurate results in those cases is therefore to use information about editors who can be identified unambiguously (logged-in users), and to filter out bots, since you are usually interested only in human editors (except for studies explicitly focused on bots, e.g. by S. Geiger et al.). Of course, for aggregated statistics, information about anonymous edits is useful (e.g. the proportion of anonymous vs. logged-in users' contributions). However, in the former cases (the most frequent), there is an inherent methodological problem if you include anonymous edits in the study.--GlimmerPhoenix (talk) 08:01, 27 September 2013 (UTC)

Hints from previous work

While working on the study about the adoption of flagged revisions in the German, Polish and Russian Wikipedias, I obtained a graph summarizing the evolution of the weekly aggregated number of anonymous edits in different Wikipedia languages (other than English). I know your main focus may be the English Wikipedia, but perhaps information about other languages may be helpful, too.

The vertical scale of each individual graph is automatically adjusted (using the screens option of the zoo library in R) to facilitate the comparison of the shapes and trends. For instance, the peak value for the Russian Wikipedia (top right) is half of the peak value for the German Wikipedia (top left). Thus, the important point here is to compare patterns, not levels.

Figure: Evolution of anonymous edits in 6 Wikipedias

That said, there are some very curious differences. The German and Polish Wikipedias clearly exhibit descending trends in anonymous revisions that, so far, have not been linked to any factor. All other languages show more stable trends. From the graphs in Research:Anonymous_edits, I infer that the English Wikipedia is another case of a decreasing trend.

However, in 2012 the trends of anonymous edits in German and Polish stabilize again, though at a lower mean value. To start with, it would be good to check with more recent data whether this is also the case for the English Wikipedia. Likewise, it could be a good idea to compare the slopes in different languages, trying to discover common patterns. --GlimmerPhoenix (talk) 13:14, 26 September 2013 (UTC)
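One possible way to compare patterns rather than levels, and then compare slopes across languages, is sketched below in Python. This is only an illustration under stated assumptions: the weekly counts are invented, each series is divided by its own peak (a stand-in for the per-panel scaling in the graph, not the zoo/R code used there), and the slope is an ordinary least-squares fit against the week index.

```python
def peak_normalise(series):
    """Scale a series so its maximum is 1.0, making shapes comparable."""
    peak = max(series)
    return [value / peak for value in series]

def ols_slope(series):
    """Least-squares slope of the series against its index (0, 1, 2, ...)."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# Hypothetical weekly anonymous-edit counts; not real Wikipedia data.
weekly = {
    "de": [1000, 900, 800, 700, 600],  # declining
    "ru": [500, 500, 500, 500, 500],   # stable, at half the German peak
}

slopes = {lang: ols_slope(peak_normalise(counts)) for lang, counts in weekly.items()}
for lang, slope in slopes.items():
    print(lang, round(slope, 3))
```

Because each series is normalised by its own peak, the slope is expressed as a fraction of peak activity per week, so a small wiki and a large one can be compared directly.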

Scoping

The lead reads "encouraging current or potential anonymous editors to register accounts" but this looks like a false goal: surely your goal is to increase editing activity? There is no gain for us if the unregistered editor registers and then is only as active as they were before. (Some, by the way, edit without an account because they feel more productive that way.) --Nemo 10:08, 27 September 2013 (UTC)

Yes, that's the end goal. We would never run a test of UX changes that didn't also look at edit rates. But we're also going to be primarily focusing on metrics that can tell us whether we're successful at helping more willing/interested people to sign up. As noted in the research questions, we actually have no idea how aware anonymous editors are that they can sign up, what benefits they get, and so on. We can probably safely assume that some edit without an account based on an explicit and well-considered decision not to, and we don't want to hamper or annoy those people. But we really don't know what the distribution is, yet. Steven Walling (WMF) • talk 21:38, 27 September 2013 (UTC)

New plots for anonymous editor activity.

Hey folks, I just added some plots of anonymous editor activity to Research:Anonymous_editor_acquisition/Volume describing monthly activity of anons and registered accounts in terms of raw revisions and metrics drawn from clustering editing activity into sessions:

  • # of sessions
  • Hours spent editing
  • Mean hours per session
  • Mean revisions per session

My plan is to start filling up sub-pages like this one with analyses based on the bold lead-ins in the research questions section. I'll ping this talk page with a new post as I finish them. --Halfak (WMF) (talk) 21:22, 1 October 2013 (UTC)
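For readers unfamiliar with session-based metrics, the Python sketch below shows one way the four measures listed above could be derived from an editor's revision timestamps. It is an illustration, not the code behind the plots: the one-hour inactivity cutoff is an assumption, the timestamps are invented, and a session's duration is taken simply as last edit minus first edit (so single-edit sessions count as zero hours).

```python
from datetime import datetime, timedelta

CUTOFF = timedelta(hours=1)  # assumed inactivity threshold, not from the source

def sessions(timestamps, cutoff=CUTOFF):
    """Group timestamps into sessions; a gap longer than cutoff starts a new one."""
    groups = []
    for ts in sorted(timestamps):
        if groups and ts - groups[-1][-1] <= cutoff:
            groups[-1].append(ts)
        else:
            groups.append([ts])
    return groups

def session_metrics(timestamps, cutoff=CUTOFF):
    """Compute session count, hours spent, and per-session means."""
    ordered = sorted(timestamps)
    groups = sessions(ordered, cutoff)
    if not groups:
        return {"sessions": 0, "hours_editing": 0.0,
                "mean_hours_per_session": 0.0, "mean_revisions_per_session": 0.0}
    # Simplification: duration is last edit minus first edit in each session.
    hours = sum((g[-1] - g[0]).total_seconds() / 3600 for g in groups)
    return {
        "sessions": len(groups),
        "hours_editing": hours,
        "mean_hours_per_session": hours / len(groups),
        "mean_revisions_per_session": len(ordered) / len(groups),
    }

# Hypothetical revision timestamps for one editor on one day.
day = datetime(2013, 10, 1)
stamps = [day.replace(hour=10, minute=0), day.replace(hour=10, minute=20),
          day.replace(hour=10, minute=50), day.replace(hour=14, minute=0),
          day.replace(hour=14, minute=30)]
metrics = session_metrics(stamps)
print(metrics)
```

For anonymous activity the same grouping would be applied per IP address, with all the caveats about IP-to-person mapping raised earlier on this page.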

Hey, I also linked the term Volume to that subpage from the main research page. We can also possibly move the research questions there and make the links more prominent, when we're ready. Steven Walling (WMF) • talk 05:29, 2 October 2013 (UTC)

Anonymous vs Unregistered

It seems to me that:

"Editor" = anyone who changes any page
"Registered editor" = editor who has one or more accounts
"Unregistered editor" = editor who is not a registered editor
"Anonymous" = "IP" = an edit made by someone who isn't logged-in

Have I got the definition of anonymous right? That is, anonymous edits include both those made by unregistered editors and those by registered editors who aren't logged-in. We can't distinguish them. Nor can we infer the number of unregistered editors from the number of anonymous edits, because:

  1. A single device may change IP address.
  2. A single IP address may be shared by multiple persons, e.g. in a library or institution.
  3. The same person may use multiple devices, e.g. a main device used to make substantial edits plus a secondary or portable device to make ad hoc corrections or reverts.

All three patterns are likely to have changed significantly over the history of Wikipedia, making historical comparisons inherently risky. Aaron's work on "edit sessions" may help, but the bottom line is all data about anonymous editing is ambiguous, and we should not base our reasoning principally on quantitative analysis. When you think about it, the term "anonymous editor" can mean so many different things that it should probably be avoided in serious discussion. But we know from anecdotal evidence that many valuable registered editors made their earliest contributions while unregistered, and the purpose of this research project is to examine ways to encourage currently unregistered editors to register. - Pointillist (talk) 10:03, 26 November 2013 (UTC)

I'm not sure I see how there's actually a difference between unregistered editor and anonymous or IP editor. They both are identified with IP addresses in edit histories etc., and are thus virtually indistinguishable when looking at a wiki overall. Steven Walling (WMF) • talk 19:33, 27 November 2013 (UTC)
As we all know, an "anonymous" IP edit can be made by:
  • an unregistered editor,
  • a registered editor who is blocked or banned,
  • a registered editor who has permanently lost their login details, or
  • a registered editor who doesn't want to log in, e.g. because they are editing from an unsafe environment—I know there are admins who adopt this practice.
Are you saying that if you can't distinguish these different scenarios, that means they can be treated as the same? That's rather a cavalier approach to logic! What's the objection to using unambiguous terminology? For example, in Research:Anonymous_editor_acquisition#Research_questions:
  • Volume. How many unregistered editors are there every month on Wikipedia, and how much do they contribute?
  • Impact. What is the quality of contributions from unregistered editors? Are their revert rates higher or lower than newly-registered users?
  • Motivation. What keeps unregistered editors from registering? Some hypotheses to verify:
    • How aware are unregistered editors of the potential benefits to registering an account?
    • How aware are unregistered editors of the personal information required for creating an account?
    • How many actively choose to stay unregistered? How many have registered an account they choose not to use or cannot access?
  • Experience. Unlike registered users, where we can use edit counts as a proxy for experience, users who haven't logged in (esp. those on shared IPs) may be of any experience level.
    • How many unregistered editors are first-time contributors?
Just by rephrasing the questions, the nature of the research becomes clearer so possible issues in the analysis can be addressed. Do you see what I mean? - Pointillist (talk) 15:40, 2 December 2013 (UTC)
It's not that we don't acknowledge that these scenarios exist. Since there is no way to technically distinguish between these types just by looking at IP edit histories, the distinction is not something useful. Also, these are almost certainly edge cases. Most IP editors are not logged-out admins or banned registered users. Steven Walling (WMF) • talk 18:40, 2 December 2013 (UTC)
Though I accept that you may be doing this in good faith – because you genuinely believe that it makes no difference – you are misleading people every time you use "anonymous" as a synonym for "unregistered". It's quite possible that a significant proportion of valuable IP edits are being made by logged-out registered editors: unless you have access to a major source of unpublished data, you can't possibly say otherwise. Because of your WMF role, you have a conflict of interest. Please don't try to build a case based on statistics that can so easily be questioned. It's borderline dishonest and it isn't necessary. I'm sure almost all of Wikipedia's regular contributors agree that it is a good idea to attract new productive editors anyway. Go for it. - Pointillist (talk) 21:19, 2 December 2013 (UTC)
We're not using them as synonyms. We're specifically measuring anonymous contributions, and I acknowledge this includes all of the use cases you outlined. This is slightly grey, but that's okay. Data not having the fidelity we want with IP editors is just a fact of life, considering the circumstances. Steven Walling (WMF) • talk 00:41, 3 December 2013 (UTC)
So are you and Aaron OK if I correct the terminology at places like Research:Anonymous_editor_acquisition#Research_questions? - Pointillist (talk) 13:59, 3 December 2013 (UTC)