Research talk:New Page Patrol survey

From Meta, a Wikimedia project coordination wiki

Independent research

I have done some independent statistical research into the profiles of the average new page patroller on the English Wikipedia. The purpose of the research was to validate the survey method. In particular, I believe that invalidating all survey results which didn't include a username may have skewed the results so as to make them inaccurate. I intend to test this theory by comparing the profile of the average enwiki new page patroller to the average profile found in the 309 validated survey results. If the survey results track well with the data below, then the survey methods are likely valid. If they deviate by a large margin, then the survey methods are likely invalid. Since I don't personally have access to the survey results, I will rely on the "Key personnel" identified on the research page to compare the results.

When I originally compiled a list of all users who had patrolled articles, it inadvertently included autopatrols along with manual patrols. The data provided below is for manual patrols only; autopatrols have been filtered out completely. In fact, it only includes data from manual patrols performed in the article namespace; patrols in other namespaces have been ignored. Furthermore, these statistics are not filtered to any date range; that is, they span all patrols made since page patrolling was first implemented on Wikipedia.
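The filtering described above can be sketched in Python. This is not the SQL that was actually run on the toolserver. The PatrolEntry record and its fields (user, namespace, auto) are hypothetical stand-ins for whatever the patrol log provides, and the data is invented purely to show the logic: keep only manual patrols in the article namespace, count them per user, then apply the 1-patrol and 10-patrol thresholds.

```python
from collections import Counter
from typing import NamedTuple


class PatrolEntry(NamedTuple):
    """One row of a hypothetical patrol-log extract (field names are illustrative)."""
    user: str
    namespace: int  # 0 = article (main) namespace
    auto: bool      # True if the entry is an autopatrol


def manual_article_patrols(entries):
    """Count manual patrols in the article namespace per user.

    Autopatrols and patrols in other namespaces are dropped; no date filter is applied.
    """
    return Counter(e.user for e in entries if e.namespace == 0 and not e.auto)


def users_with_at_least(counts, threshold):
    """Return the set of users whose patrol count meets the threshold."""
    return {user for user, n in counts.items() if n >= threshold}


# Toy data, not real patrol logs.
log = (
    [PatrolEntry("Alice", 0, False)] * 12
    + [PatrolEntry("Alice", 2, False)] * 5   # user namespace: ignored
    + [PatrolEntry("Bob", 0, True)] * 30     # autopatrols: ignored
    + [PatrolEntry("Bob", 0, False)] * 3
    + [PatrolEntry("Carol", 0, False)] * 10
)

counts = manual_article_patrols(log)
print(len(users_with_at_least(counts, 1)))   # 3
print(len(users_with_at_least(counts, 10)))  # 2
```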

Here's what I found:

  • 3482 users have manually patrolled at least 1 article since page patrolling has been available in Wikipedia.
  • Of those, 1901 have patrolled at least 10 articles. The following statistics are for these 1,901 users who have patrolled at least 10 articles.
  • The average patroller has patrolled a total of 361 articles (maximum=24281, minimum=10).
  • The average patroller has an edit count of 14166 (maximum=903410, minimum=18).
  • For the average patroller, it has been 568 days (1.56 years) since they performed their first patrol (maximum=1418 days (3.88 years), minimum=2 days). This means that the average patroller started patrolling on 17 May 2010.
  • For the average patroller, it has been 1382 days (3.78 years) since they performed their first edit (maximum=3650 days (10 years), minimum=18 days). This means that the average patroller started editing on 23 February 2008.
  • The average patroller started editing 813 days (2.23 years) before they started patrolling (maximum=2930 days (8.03 years), minimum=0 days).
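The calendar dates above can be checked by counting back from the query date. A minimal sketch, assuming the reference date is 6 December 2011 (the UTC date of the post below); that choice reproduces both quoted dates, whereas counting back from 5 December gives dates one day earlier. Subtracting the two averages gives 814 days rather than the quoted 813, presumably a rounding effect in the per-user averages.

```python
from datetime import date, timedelta

# Assumption: "days since" is measured back from 6 December 2011 (UTC).
reference = date(2011, 12, 6)

avg_days_since_first_patrol = 568
avg_days_since_first_edit = 1382

first_patrol = reference - timedelta(days=avg_days_since_first_patrol)
first_edit = reference - timedelta(days=avg_days_since_first_edit)

print(first_patrol)                      # 2010-05-17
print(first_edit)                        # 2008-02-23
print((first_patrol - first_edit).days)  # 814
```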

I have saved the results of my queries, and I am happy to analyze them further to provide additional statistics, if requested. I'm also happy to share the SQL queries I used to obtain these results. I can also provide a sortable table of the full results, if desired. These queries were made on toolserver's SQL server on 5 December 2011. —SW— chatter 00:32, 6 December 2011 (UTC)

This is excellent, and demonstrates that much of the objective research information can be gathered from straight database queries. In order to establish an accurate NPPer profile, it should be possible from this new data to extract more granular breakdowns, in addition to the averages quoted, for operations over the past 2 years. I suggest looking for:
  • Longevity of users' editing habits and when they gave up editing,
  • Last page patrol of user - i.e. how long did they do it for, and when did they give up patrolling
  • % of users who began patrolling during the past 2 years
  • % of patrols over the past 2 years made by new users registered during the past 2 years
  • % of users active over the past 2 years, including new users registered during that period, whose editing pattern leans more towards page patrolling than towards contributions to other parts of Wikipedia
  • Average number of patrollers on duty at any one time (from Snottywong's toolserver tool)
  • Number of patrollers on duty expressed as a graph over the 2 year sample period (from Snottywong's toolserver tool)
  • QUESTION: Have the mathematical extrapolations requested by Kudpung been carried out?
  • QUESTION: Can the statistician's mathematical extrapolations that have been done be published here?
  • NOTE: Not all participants on the MetaWiki research team have been provided with the complete raw data from the survey, or a list of removed returned forms (identifying information not required).

Kudpung 06:04, 7 December 2011 (UTC)


What a mess!

(copying and expanding from a comment at en:Wikipedia talk:New pages patrol/Survey#Research_project)

I am sad to note that this seems to be the work of a group of unskilled amateurs. I am not pointing at the proponent, as it was indeed a good idea, but the implementation looks like a mess. I first came across this when I was asked (with a flashy boxed barnstar'ed message, oh why?!) on my talk page to participate in a survey. I protested (maybe too vehemently...) about it, but eventually participated. It was immediately apparent that the survey was all too easy to skew by responding multiple times (*). Then the results may or may not be published. Now I find out that the target group was badly picked. So someone removes more than half of the likely legitimate contributions, based on an optional field!? What does that leave us with? A serious case of the 5th largest site in the world - and a 'scientific' one! Not YouTube or Facebook, no, an encyclopaedia - apparently being run by incompetent amateurs, with a lack of respect for editors' time.

(*) «the 10 year old from Africa with a PhD»!? I understand a '10 y.o. with a PhD' is not credible. I don't get what Africa is doing there. No chance for a 10 y.o. from Africa in WP? No chance for a PhD from Africa? Which? - Nabla 16:11, 8 December 2011 (UTC)

This is the most revealing result of the survey. The '10 year old from Africa with a PhD' is typical of many of the silly answers that were received from the survey, which alone proves that some of the New Page Patrolling is being done by a group of immature, irresponsible, untrustworthy users. The number of silly statements in the free-text boxes may show that the proportion of these patrollers could be as high as 10%. Kudpung 00:32, 9 December 2011 (UTC)
  • Nabla: it is incredibly unlikely that a 10 year old has a PhD. It is also fairly unlikely for a substantial portion of editors to be from Africa; we have general information on geographic bias. The fact that both were the case simply increases the improbability ;p. Responding multiple times would totally be a problem if the survey didn't note IP addresses, allowing us to remove multiple respondents. And yes, the target group was badly picked - the report we're working on does address that, and points out that we compared both sanitised and unsanitised results and found them very, very similar. There is no reason to believe the sanitisation compromised the data.
    Re Kudpung; I would advise maybe not making pronouncements like that. Let us say that we assume 10 percent of responses in free text boxes were silly (which I don't, for a second, believe). That 10 percent could be "10 percent of users are immature", or it could be "some users are immature, and some didn't understand the question" (which I noted as a recurring problem) or it could be "some users are perfectly willing to have a bit of a laugh when filling out private and anonymised surveys", which hardly translates to "they're willing to actively act immaturely when publicly visible". Okeyes (WMF) 14:24, 9 December 2011 (UTC)
  • "...we compared both sanitised and unsanitised results and found them very, very similar." Then why sanitize the list at all? It only provides fodder to attack the survey and claim that it is invalid because you threw away over 75% of the survey responses. Certainly, some of the results had to be thrown away, like the 10 year old PhD, but throwing away every survey response that didn't fill in an optional field (which includes my own response) seems rather harsh and unnecessary, and only encourages editors not to participate in future WMF surveys. Also, the fact that the sanitized and unsanitized results were very similar suggests that the sanitization was unnecessary; that is, the vast majority of the results you threw away actually contained good information. —SW— comment 15:28, 9 December 2011 (UTC)
    Not...really; let me be clear. The purpose of the sanitisation was to ensure we had no people below the patrolling threshold; in other words, to remove "editors generally" from the survey results, leaving us with just "patrollers". After sanitising the data, I compared the demographic questions (YOB, gender, geographic location, etc) for both the sanitised and unsanitised data. The result was that both are very, very similar; survey respondents generally are not particularly different, demographically, from those we can confirm to be patrollers.
    The problem with saying "well, this means the sanitisation was pointless, then" is that both groups are also very similar, demographically, to editors in general. New Page Patrollers have a similar median age, similar gender balance and so on to other editors. This means that users in the unsanitised group producing similar results to those in the sanitised group does not mean they are definitely new page patrollers - it just means they're definitely editors. Okeyes (WMF) 16:18, 9 December 2011 (UTC)
Got it about Africa, but it really sounds (sounded) bad. No problem. I am no statistician, but I like it, and your reasoning makes sense. Namely, making a 'trustworthy' subgroup was useful as it allowed comparison with the larger sample. I am not entirely sure what the similar patterns show, though. Let's say we have 3 sets of editors: E, all editors (the "universe"); S, the sample responding to the survey (after removing the obvious fakes); P, the patrollers (for sure, the 'sanitized' group). So P ⊂ S ⊂ E. You say demographically P is similar to S. Do you have data on E (editors in general)? The point is, if S is similar to E, then the 3 sets are similar, and thus we know patrollers are 'regular' editors, but we don't know whether S is mostly patrollers or not; OTOH, if S is not similar to E, then it is likely (but not certain) that S is mostly patrollers, given it is similar to P. (In a short and free notation: we know P~S. S~E => P~S~E => S is composed of editors, not necessarily patrollers, thus you must discard that data; S<>E => P~S<>E => S *may* be mostly patrollers, and you may look at that data, albeit with caution.) I hope I was not hopelessly confusing :-)
«Responding multiple times would totally be a problem if the survey didn't note IP addresses, allowing us to remove multiple respondents.» I am not entirely sure, but... Some users, like myself, use a dynamic IP assigned by their ISP each time they log in (to the ISP). My ISP is a large company, so I get assigned one out of 2.82.0.0 - 2.83.255.255 (over 65,000 of them). And the opposite problem certainly applies too.
It is a good effort, trying to study WP and its editors. But all in all it looks like it could be done better and look better (looks matter :-). - Nabla 22:28, 9 December 2011 (UTC)
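The size of the address pool mentioned above can be checked with Python's standard ipaddress module. The range 2.82.0.0 - 2.83.255.255 is a single /15 block of 131,072 addresses, so one person could appear under many different addresses, while (the opposite problem) several people behind one shared address would look like a single respondent. A minimal check:

```python
import ipaddress

first = ipaddress.IPv4Address("2.82.0.0")
last = ipaddress.IPv4Address("2.83.255.255")

# Collapse the range into CIDR blocks and count the addresses it spans.
pool = list(ipaddress.summarize_address_range(first, last))
size = int(last) - int(first) + 1

print(pool)  # [IPv4Network('2.82.0.0/15')]
print(size)  # 131072
```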
Good points, all :). As said, what we found was that demographically, at least, E, S and P are pretty much the same (variations of between 1 and 3 percent, mostly). And yeah, we're using the April 2011 editor survey for cross-comparisons :). I agree that it needs to be done better in the future, and that it needs to look more professional in the future; this is directly addressed by the WMF report on the subject (which has been drafted, reviewed, redrafted and is just awaiting final checks). Ironholds 14:14, 10 December 2011 (UTC)
Thank you - 2.82.165.161 11:46, 11 December 2011 (UTC)
There was no strict patrolling threshold. Kudpung 06:33, 10 December 2011 (UTC)

Whose report?

How did this go from a community initiative which the WMF was going to help to a seriously delayed WMF report that the requesters have not been involved in the compiling of? At what point was it decided that Kudpung would not be one of the drafters of the report, and who decided that? WereSpielChequers 18:59, 11 February 2012 (UTC)

Uhm. It didn't? We're producing our own report, because we're relying on the data in-house. I have repeatedly made clear that Kudpung et al are welcome to exercise their right to write their own report too. Okeyes (WMF) 19:02, 11 February 2012 (UTC)
I'm happy to discuss the situation with you at the meetup tomorrow, btw :). Okeyes (WMF) 19:12, 11 February 2012 (UTC)
I don't understand this question. So because the WMF wrote a report, Kudpung can't? This is not a zero sum game. If I were to choose to (I won't, but let's presuppose that I did) write one, would that automatically mean that nobody else could? No, of course not. I'd be willing to bet that I'd even cross-link theirs to mine. The question is fundamentally flawed at its premise. My understanding is that Kudpung has been given every bit of data that can legally be shared, and that's the same data upon which the WMF's analysis was built. So this question of ownership is a bit of a red herring. Philippe (WMF) (talk) 08:21, 21 February 2012 (UTC)
There is obviously no claim to ownership here, but the WMF issued a report that was possibly based on information that was promised by the WMF, patiently awaited by the community for months, and never received, although it was requested as the principal core element of a project designed and initiated by the community. The next thing we hear is that the WMF has issued a report, and one that is clearly based on information that the project team was not made privy to. There is something fundamentally wrong with this form of collaboration, and WSC's question remains unanswered. Kudpung (talk) 14:55, 21 February 2012 (UTC)
I too (quite unsurprisingly) have some misgivings. With what information we, the community, have available to us, we couldn't possibly come up with a cogent report. I'm also confused as to why, if this was supposed to be a community initiative, the WMF's report on this was published before the community had access to all the same data. Reading over it, I get the sense that Benjamin Disraeli's remarks are somewhat applicable here, and I'm interested to see where the stats are that led to this conclusion, because I certainly couldn't have extrapolated it (or anything else, for that matter) from the data that I've seen. The Blade of the Northern Lights (話して下さい) 03:51, 23 February 2012 (UTC)
Can I ask precisely what data you have been given? I'm just trying to clear up in my head precisely where everyone is, so I can easily compare what we used to what people had available to them. Okeyes (WMF) (talk) 03:52, 23 February 2012 (UTC)

Thanks to all for the work done designing and implementing this survey!
I have a similar question, perhaps because I am not clear about how research projects currently move from concept to realization. Kudpung is the first contact listed on this page, and initiated the idea. This would lead me to assume that "Kudpung et al" referred to the set of all people working on the survey and its analysis. Did the researchers split into separate teams? How much of the data gathered is public, and where is that part posted? What was the non-public part of the data, and why aren't all researchers working on it and writing about it together? Regards, SJ talk   02:05, 28 February 2012 (UTC)

The community team mainly concerned with the issues surrounding new page patrol, and who have been working on this since October 2010, are: The Blade of the Northern Lights, Scottywong, WereSpielChequers, and myself. In order to address the issues, we proposed ACTRIAL, which received community consensus but was rejected by the WMF. On this rejection, the WMF came up with a solution with a working name of NPP Zoom in September 2011. In spite of the disappointment with the rejection of ACTRIAL, we embraced NPP Zoom with enthusiasm and initially collaborated heavily with the WMF on its development. It seemed necessary, based on our empirical findings over the preceding 12 months, to obtain a profile of who, when, and how self-appointed new-page patrollers do their work. In October 2011, I had the idea of conducting a survey among those who have patrolled new pages. I designed the survey, which after some discussion by the team was submitted to the WMF for technical and legal support only. The objectives here were to lend credence to the survey, because occasionally people conduct surveys that do not have WMF sanction, and it was thought that an air of officialdom would encourage responses. In addition, we wanted any legal issues examined regarding the security of the survey software to be used and the confidentiality of any information submitted by respondents. The WMF agreed to these terms, and agreed to provide mathematical extrapolations - which we as a team do not have the ability to do - so that we could draft, as a volunteer team, a report for the community. We have not, to date, been provided with this data. The community researchers have not split into different teams. The next thing the community was aware of is that a WMF report has been published based on the results of the survey.
There is some consensus among the community team that the WMF summary does not agree in some parts with the hard empirical evidence gathered by the team who have been monitoring NPP for over a year. WereSpielChequers's original post at the top of this thread remains unanswered. The community team has not been involved in the preparation of the report, or participated in a pre-publication review of it. Some statements issued by the WMF on the discussion page of the Signpost article do not accurately reflect the extent of my collaboration with the WMF, or their collaboration with the volunteer team. Kudpung (talk) 03:15, 28 February 2012 (UTC)
I already provided you with the data. I already offered to provide you with more data, seven times. SJ, I'll address your other question via email. Okeyes (WMF) (talk) 11:18, 28 February 2012 (UTC)
What might be useful, Kudpung, is if you and yours actually told me when you thought there was a specific problem. You say you don't think some bits line up with other empirical data. Well, how about telling me this, and pointing out what, precisely, doesn't line up? Because all I seem to hear is vague requests for data (which are never followed up on) or vague accusations of impropriety. SJ, there is absolutely nothing stopping Kudpung and his team writing a report with the data, which we have given him. The reason he wasn't involved in drafting the Foundation report is because it's the Foundation report - it is not intended to be the authoritative version. Okeyes (WMF) (talk) 11:23, 28 February 2012 (UTC)
Thank you, Oliver. SJ talk   04:13, 4 March 2012 (UTC)
Kudpung, both Maggie and I have ALSO offered to get you data, if you would just spell out what you need. You have so far resisted doing that, for reasons that I don't fully understand. We will get you data, if you tell us what you think you need that you don't have. Philippe (WMF) (talk) 15:58, 28 February 2012 (UTC)
Firstly whilst I agree with much of what Kudpung has said, the "we" who proposed ACTRIAL didn't include me, in fact I was deeply opposed to it. Secondly, re the argument as to who got what data, up to now I've avoided requesting any, for various reasons I don't want a copy of the raw data on my PC. But there were several anomalies which I'd like to probe and I would like to take advantage of the offers above to request some cross tabs. WereSpielChequers (talk) 23:38, 9 March 2012 (UTC)

Data requests

I'm intrigued by the idea that only 45% of patrollers are working at the front of the queue, as my experience is that a lot more than 45% of the unpatrolled new articles are deleted or patrolled there. I would like to see some cross tabulations to test the ideas that front-of-queue patrollers spend less time per article, and are generally less experienced but generally more active than other patrollers.

I expect that cross-tabbing "When patrolling new pages, where do you usually do this from?" with "On average, how long does it take you to patrol one new page?" will show that front-of-queue patrollers generally spend less time per article.

I expect that cross-tabbing "Why do you patrol new pages? Tick all that apply" with "On average, how long does it take you to patrol one new page?" will show that deletionists generally spend less time per article.
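A cross tabulation of this kind can be sketched with the Python standard library. This is only an illustration: the answer labels ("front of queue", "< 1 min", and so on) are hypothetical, not the survey's actual options, and the responses are invented. Row percentages are shown alongside raw counts because the groups being compared can differ a lot in size.

```python
from collections import Counter, defaultdict


def crosstab(responses, row_key, col_key):
    """Count respondents for each (row answer, column answer) pair."""
    table = defaultdict(Counter)
    for r in responses:
        table[r[row_key]][r[col_key]] += 1
    return table


def row_percentages(table):
    """Express each row's counts as percentages of that row's total."""
    return {
        row: {col: 100 * n / sum(cols.values()) for col, n in cols.items()}
        for row, cols in table.items()
    }


# Invented responses; "where" and "time" stand in for the two survey questions.
responses = [
    {"where": "front of queue", "time": "< 1 min"},
    {"where": "front of queue", "time": "< 1 min"},
    {"where": "front of queue", "time": "1-5 min"},
    {"where": "back of queue", "time": "1-5 min"},
    {"where": "back of queue", "time": "> 5 min"},
]

table = crosstab(responses, "where", "time")
pct = row_percentages(table)
for row, cols in pct.items():
    print(row, {col: round(p, 1) for col, p in cols.items()})
```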

Weighting "Why do you patrol new pages? Tick all that apply" by number of articles patrolled, as opposed to counting each editor as one, would be a useful reality check as to how many of the articles are tagged by patrollers with those motivations. However, I'm assuming this uses lifetime patrols, which will be heavily weighted towards long-term editors who may no longer be so active. So I'd like to have a contrasting report: "Why do you patrol new pages? Tick all that apply" weighted by "On average, how many hours a week do you spend patrolling new pages?" My expectation is that this will tell us more about the motivations of editors currently active in NPP.
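The three ways of tallying a "tick all that apply" question (one vote per respondent, weighted by lifetime patrols, weighted by weekly hours) can be sketched as follows. The field names (reasons, patrols, hours) and the reason labels are hypothetical, and the numbers are invented to show how a single very prolific long-term patroller can dominate the patrol-weighted view but not the hours-weighted one.

```python
from collections import Counter


def tally_reasons(responses, weight_key=None):
    """Tally ticked reasons across respondents.

    With weight_key=None each respondent counts once per ticked reason;
    otherwise each ticked reason is weighted by that respondent's value
    for weight_key (e.g. lifetime patrols or weekly hours).
    """
    totals = Counter()
    for r in responses:
        weight = 1 if weight_key is None else r[weight_key]
        for reason in r["reasons"]:
            totals[reason] += weight
    return totals


# Invented responses; "patrols" = lifetime patrols, "hours" = hours per week.
responses = [
    {"reasons": {"remove junk"}, "patrols": 5000, "hours": 1},
    {"reasons": {"remove junk", "help newcomers"}, "patrols": 200, "hours": 10},
    {"reasons": {"help newcomers"}, "patrols": 50, "hours": 8},
]

print(tally_reasons(responses))             # one vote per respondent
print(tally_reasons(responses, "patrols"))  # weighted by lifetime patrols
print(tally_reasons(responses, "hours"))    # weighted by current weekly hours
```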

I'm a little sceptical re the statement "The high percentage of tenured editors shows that new page patrollers are not inexperienced.", Apparently this holds up fairly well when just the highly actives are measured. But was highly active defined by total patrols or current activity? If it was defined by total patrols I would like to see an alternative perspective: Multiplying "When did you first start editing Wikipedia, either anonymously or with an account?" by "On average, how many hours a week do you spend patrolling new pages?" would be a good test of the theory that a substantial proportion of the patrolling is done by the inexperienced. My own experience is that a substantial proportion of tagging errors are by newbie taggers, but I'm not sure what proportion of the total tagging is done incorrectly or by newbies. WereSpielChequers (talk) 23:38, 9 March 2012 (UTC)

Happy to do some multi-variable analysis on this stuff. However, note that I'm in SF until the end of next week (and don't have access to the raw data) and that the priority at the moment is new page triage comms and the article feedback tool. I'm not making any promises that I can get to it soon. Okeyes (WMF) (talk) 23:41, 9 March 2012 (UTC)
Late March would be fine, thanks. WereSpielChequers (talk) 00:16, 10 March 2012 (UTC)
Sorry, I'm not being clear; the priority, even when I get back, is going to be NPT and AFT5. I can't promise "I will totally get to it by the end of March", although it would be ideal, and I would ask you not to make the assumption that I can rearrange my timetable around data requests when there are two mandated software projects to work through. Okeyes (WMF) (talk) 00:22, 10 March 2012 (UTC)
Appreciate March was unrealistic. But if you are taking data requests it would be nice to get them fulfilled within say 5 months. WereSpielChequers (talk) 14:17, 23 July 2012 (UTC)
Hey; terribly sorry for this. Everything sort of kicked off at once and I lost track of this project :(. I'm doing it as we speak. Okeyes (WMF) (talk) 09:37, 26 July 2012 (UTC)
So, I've now done the analysis; I can output it in graphs by both percentage and raw numbers if you want, but as a broad perspective:
  • If we take the midpoint of each banded answer (so 16-20 hours becomes 18, less than 1 hour becomes 30 minutes, and so on):
  • 761 hours of patrolling are done each week;
  • Dividing the group into three tranches by join date (2001-4, 2005-8, 2009-11), the first tranche does 84 hours of work, the second 490.5, and the third, which consists of the "newest" editors (although some of them are 3, now maybe 4 years old) 186.5. The third tranche does fewer than 10 hours a week more than editors from 2006 on their own.
  • Admittedly you're going to get some ambiguities and inaccuracies, insofar as the averaging for non-whole outputs isn't going to be precisely right, but it is consistently incorrect within the dataset. It's worth noting that the hours worked by the 2011 group (74) were substantially boosted by two outlier users - without them, it drops to 26. Indeed, there were only 12 users from 2011 in total. Okeyes (WMF) (talk) 11:44, 26 July 2012 (UTC)
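The midpoint averaging and tranche totals described above can be sketched like this. Only two midpoints come from the discussion (16-20 hours becomes 18, less than 1 hour becomes 0.5); the other bands are hypothetical placeholders, and the responses are invented, so the totals bear no relation to the 761 hours reported.

```python
# Banded answers mapped to midpoints. "16-20" -> 18 and "less than 1" -> 0.5
# follow the discussion; the remaining bands are hypothetical.
BAND_MIDPOINTS = {
    "less than 1": 0.5,
    "1-5": 3,
    "6-10": 8,
    "11-15": 13,
    "16-20": 18,
}

# Join-date tranches used in the analysis above (end years are inclusive).
TRANCHES = {
    "2001-4": range(2001, 2005),
    "2005-8": range(2005, 2009),
    "2009-11": range(2009, 2012),
}


def tranche_for(year):
    """Return the name of the tranche containing a join year."""
    for name, years in TRANCHES.items():
        if year in years:
            return name
    raise ValueError(f"join year {year} outside the surveyed range")


def weekly_hours_by_tranche(responses):
    """Sum the midpoint hours per week of each (join_year, band) response by tranche."""
    totals = {name: 0.0 for name in TRANCHES}
    for join_year, band in responses:
        totals[tranche_for(join_year)] += BAND_MIDPOINTS[band]
    return totals


# Invented responses: (year the respondent started editing, banded hours per week).
responses = [(2003, "1-5"), (2006, "16-20"), (2007, "less than 1"), (2010, "6-10"), (2011, "1-5")]
totals = weekly_hours_by_tranche(responses)
print(totals)
print(sum(totals.values()))
```

Any systematic error from using midpoints applies to every respondent in the same way, which is why comparisons between tranches are more robust than the absolute totals.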
Thanks. My interpretation of that is that the survey was only really covering experienced patrollers, and as my experience is that the most egregious mistakes are made by newbie patrollers I'd conclude that we simply miss out the most problematic patrollers. Looking forward to getting the other cross tabs. WereSpielChequers (talk) 17:16, 29 July 2012 (UTC)
Well, can you think of a way of testing this? :). Ironholds (talk) 20:31, 29 July 2012 (UTC)
If you analyse all patrollers by number of patrols and compare that to the number of patrols per respondent, then you will easily see whether the survey is skewed against people who did a few patrols and gave up. Once you've confirmed that the survey was skewed towards veteran patrollers, there remains my hypothesis that newbie mistakes are made by newbies. If you feel the need to test that, I'd suggest looking for a bunch of CSD declines and seeing how experienced the taggers were at the time of the decline. Of course it's possible that some veteran patrollers started off a tad heavy-handed and subsequently improved their patrolling, but it has been my experience that the most egregious errors are from new patrollers. By contrast, the largest number of errors almost certainly come from veterans, but in my experience they are more likely to commit less egregious errors, such as A7 on an article that asserts significance or importance but would be deleted at AFD for lack of notability. WereSpielChequers (talk) 10:50, 31 July 2012 (UTC)
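The skew check suggested above amounts to comparing two distributions of lifetime patrol counts: all patrollers versus survey respondents. A minimal sketch, assuming hypothetical bucket edges (1-9, 10-99, 100-999, 1000+) and invented counts; a formal test such as chi-squared could follow, but the side-by-side shares already show whether low-volume patrollers are under-represented.

```python
from bisect import bisect_right

# Hypothetical bucket edges for lifetime patrol counts.
EDGES = [10, 100, 1000]
LABELS = ["1-9", "10-99", "100-999", "1000+"]


def bucket_shares(patrol_counts):
    """Return the percentage of users falling into each patrol-count bucket."""
    counts = [0] * len(LABELS)
    for n in patrol_counts:
        counts[bisect_right(EDGES, n)] += 1
    total = len(patrol_counts)
    return {label: 100 * c / total for label, c in zip(LABELS, counts)}


# Invented data, not survey results.
all_patrollers = [1, 2, 3, 5, 8, 9, 12, 40, 150, 2000]
respondents = [12, 40, 150, 2000]

everyone = bucket_shares(all_patrollers)
surveyed = bucket_shares(respondents)
for label in LABELS:
    print(f"{label:>8}: all {everyone[label]:5.1f}%  respondents {surveyed[label]:5.1f}%")
```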