Research talk:Wikipedia Education Program evaluation
random sample
Please explain how you created the random sample of users that was used for comparison. John Vandenberg (talk) 21:28, 29 July 2012 (UTC)
- Hi John, unfortunately neither Mani nor Ayush work for WMF anymore, so I'm afraid I'll have to take an educated guess at this rather than having a definitive answer for you. With that caveat, though, they told me they took a random sample of users who also created accounts during September 2011, which is when the students in the student sample also created accounts. -- LiAnna Davis (WMF) (talk) 15:27, 30 July 2012 (UTC)
- Did they include accounts with zero edits? Did they include automatically created SUL accounts? How did they 'random'ly select the accounts? Did they extract a few random samples in order to determine how much variance there is in the results? Does the WMF have a copy of the research data so researchers can try to reproduce the results? John Vandenberg (talk) 22:37, 30 July 2012 (UTC)
I have similar concerns about the comparison with random new editors. Did this include anonymous edits? If so, a certain proportion of these would be vandalism, self-promotion, or just not-terribly-good first edits, many of which would not be expected to survive. Comparing that random group with a group being watched over by a professor and ambassadors (and possibly receiving class credit for the activity) does not seem like a fair comparison. At a minimum, I think the random sample would need to exclude anyone not appearing to act in "good faith". Similarly, was every student in the education program a new editor to Wikipedia? I think the data can support the conclusion that given a compulsory task (possibly with credit) and enough hand-holding, people with the educational abilities required to enter college can produce more acceptable Wikipedia content than other new editors without those advantages. However, the apparent lack of retention of these new editors relative to the "random sample" is discouraging. It might be interesting to calculate the "bang for buck" (or "bang for retained bytes") of the two samples. How much dollar value do we put on the time of the professors and ambassadors, and the facilities used in any class sessions, etc.? Kerry Raymond (talk) 00:00, 31 July 2012 (UTC)
- Still trying to find answers to most of these (apologies again for the unavoidable delay due to staff changes), but here's what I can tell you: Both the random sample of new editors and the sample of student editors contained accounts with zero edits (some students never do the assignment, others lose their password and create another account instead but don't update the course page, etc.). It did not include anonymous edits; both were definitely samples of user account names created during September 2011. The point of this research is not to say that the students are better editors; the point is to illustrate that providing assistance and structure for the edits makes new editors more effective in producing content. In terms of what they're actually producing, I encourage you to check out the research on quality we're doing for the Spring 2012 students at w:WP:Ambassadors/Research. I will continue to try to track down the files that Ayush and Mani used, per John's request, but so far those are MIA. :( -- LiAnna Davis (WMF) (talk) 17:46, 2 August 2012 (UTC)
- Another issue is the traffic the edited pages attract. Random users would include contributors to very popular pages, while the students' edits are more likely to be to specialised pages on niche topics that get very few views. Content on little-watched pages always stays longer than content on highly watched pages with a high edit turnover, quite irrespective of edit quality. Just look at some Wikipedia pages on Indian villages ... their content is crap, with outstanding long-term stability.
- So until you also factor page-view statistics and average page edits per month into your analysis, your variables are hopelessly confounded, and your conclusions are nothing but wishful thinking (not to say lying with statistics). Regards. --JN466 15:21, 4 August 2012 (UTC)
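One way to make the confound described above concrete (this is not part of the original analysis) would be to bucket pages by traffic and compare content survival within each bucket rather than across the whole sample. A minimal sketch in Python, assuming per-edit records with hypothetical fields page_views, bytes_added, and bytes_surviving that do not come from the actual WEP data sets:

```python
# Sketch only: compare content survival within page-traffic buckets, so that
# low-traffic niche pages are not mixed with high-traffic, high-turnover ones.
# Field names here are assumptions, not columns from the real data sets.
from collections import defaultdict

def survival_by_traffic_bucket(edits):
    """Group edits by order of magnitude of page views and report survival.

    edits : iterable of dicts with keys 'page_views', 'bytes_added',
            'bytes_surviving'
    """
    buckets = defaultdict(lambda: [0, 0])  # bucket -> [bytes added, bytes surviving]
    for e in edits:
        views = max(e['page_views'], 1)
        bucket = len(str(views)) - 1       # 0 = under 10 views, 1 = 10-99, ...
        buckets[bucket][0] += e['bytes_added']
        buckets[bucket][1] += e['bytes_surviving']
    return {b: surviving / added if added else 0.0
            for b, (added, surviving) in sorted(buckets.items())}
```

Survival rates for the student sample and the random sample could then be compared bucket by bucket, which is one way of factoring page traffic into the analysis.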
- I will be sure to pass along these comments to our research team who will be taking this on in the future. As I said above, I can't speak to the methodology chosen here as I am not a researcher and neither of the two people who designed the work are on staff anymore. But I'm happy to point future researchers to this discussion for thoughts on how to design a better methodology. -- LiAnna Davis (WMF) (talk) 16:17, 8 August 2012 (UTC)
some answers
Again, I apologize for the delay -- it's been a challenge to find and interpret files from staff no longer here! Here's what I now have:
- creating random sample -- it looks like they assigned each new user a random number between 0 and 1 and then chose the users with the lowest or highest random values (see the sketch after this list).
- zero edits -- we don't have any documentation, but I believe it did include zero edit accounts.
- SUL accounts -- we don't think so, but it's hard to tell without documentation, of which we have none.
- determining variance -- we have no documentation of this being done, so it likely was not done.
- data sets -- we're trying to get our new analytics team member access to dumps.wikimedia.org to put the data sets there. I'll let you know when I hear more.
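For readers trying to picture the sampling step described in the first bullet, here is a minimal sketch of that approach: assign each account registered in the target month a uniform random key in [0, 1] and keep the accounts with the smallest keys. This is a reconstruction from the description above, not the original script; the names sample_accounts, new_accounts, and sample_size are illustrative.

```python
# Sketch of the described sampling approach: every account gets a uniform
# random key in [0, 1], and the sample is the N accounts with the smallest keys.
import random

def sample_accounts(new_accounts, sample_size, seed=None):
    """Return a simple random sample of account names.

    new_accounts : list of user names registered in the target month
    sample_size  : number of accounts to draw
    """
    rng = random.Random(seed)
    # Pair each account with a random key, then keep the lowest keys.
    keyed = [(rng.random(), name) for name in new_accounts]
    keyed.sort()
    return [name for _, name in keyed[:sample_size]]

# Example: draw 500 accounts from a list of September 2011 registrations.
# september_2011_accounts = [...]  # e.g. loaded from the user table
# comparison_sample = sample_accounts(september_2011_accounts, 500, seed=42)
```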
I will say, though, that I don't think this approach to the data analysis will be used in the future. Our new team is working on a different approach to the problem, and for Spring 2012, we asked experienced Wikipedians to evaluate the article quality of 124 student articles by hand. The results of that study are here: http://en.wikipedia.org/wiki/Wikipedia:Ambassadors/Research/Article_quality/Results
Hope this helps, and again, apologies for the delay. -- LiAnna Davis (WMF) (talk) 21:23, 2 October 2012 (UTC)
- Data sets are up! http://dumps.wikimedia.org/other/wep/ -- LiAnna Davis (WMF) (talk) 21:14, 4 October 2012 (UTC)
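For anyone who wants to see which files are available there, here is a minimal sketch that simply fetches the directory index at the URL above and prints the links it finds, using only the Python standard library; the individual file names are not listed in this thread, so none are assumed.

```python
# Fetch the directory index for the WEP data sets and list the files it links to.
import re
from urllib.request import urlopen

INDEX_URL = "http://dumps.wikimedia.org/other/wep/"

with urlopen(INDEX_URL) as response:
    html = response.read().decode("utf-8", errors="replace")

# Directory listings expose each entry as an <a href="...">.
for href in re.findall(r'href="([^"]+)"', html):
    if href.startswith(("?", "/", "../")):
        continue  # skip sort links and parent-directory links
    print(INDEX_URL + href)
```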