Research talk:Quality of PPI editor work

From Meta, a Wikimedia project coordination wiki

D, I just thought of something: Make sure to show graphs that show how you're matching the distributions of both sets of data. Zackexley 19:55, 17 May 2011 (UTC)

I'd like to see this as well to confirm that the match is as good as we hope. Just a couple histograms over matching variables for both samples would do. --EpochFail 19:09, 26 May 2011 (UTC)
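The histogram check suggested above can be sketched numerically. The data and variable names here are hypothetical placeholders (Poisson edit counts standing in for a real matching variable), and total variation distance between the binned distributions stands in for an eyeball comparison of the two histograms: near 0 means the match on that variable is good.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical matched samples of one matching variable, e.g. edit count
ppi_edits = rng.poisson(lam=20, size=200)
control_edits = rng.poisson(lam=20, size=200)

# Shared bin edges so the two histograms are directly comparable
bins = np.histogram_bin_edges(np.concatenate([ppi_edits, control_edits]), bins=10)
ppi_hist, _ = np.histogram(ppi_edits, bins=bins, density=True)
ctrl_hist, _ = np.histogram(control_edits, bins=bins, density=True)

# Total variation distance between the binned densities, in [0, 1]
tvd = 0.5 * np.sum(np.abs(ppi_hist - ctrl_hist) * np.diff(bins))
print(round(tvd, 3))
```

The same shared-bins trick is what makes the side-by-side histograms requested above honest: differently binned histograms can hide a mismatch.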

Significantly better?

Hi Diederik, I was a little surprised at the result, and I was wondering whether there was a difference between the PPI editors and the control sample. My understanding of the PPI process was that it was focussed on a relatively small number of articles. So if the reverts came with edit summaries like "revert to previous version, can we discuss this on talk?", they are likely to be more frequent on the PPI articles, if only because other editors are currently active on those articles, whereas there are a humongous number of articles which are either not watched by anyone or only by people who just revert obvious vandalism. Apologies if you calculated your control by activity level of articles edited as well as by edit count and tenure.

One possible crosscheck on this would be to check whether the reverters used Rollback or an edit summary such as RVV. Another would be whether the reverters were previous editors of the article; I'm pretty sure that a revert by a previous editor is much more likely to be an editorial dispute, whilst a revert by a patroller is likely to be because they perceive the edit as vandalism.

Also, it might be worth checking what sort of bytes were being added. I wasn't involved in the PPI, but I got the impression that it was about recruiting article writers; you may find that your control sample contains a higher proportion of editors adding templates than your PPI editors. WereSpielChequers 14:23, 25 May 2011 (UTC)
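The Rollback/RVV crosscheck proposed above could be approximated by classifying revert edit summaries. The patterns below are illustrative guesses at common conventions (a rollback-style summary prefix and anti-vandalism shorthand like "rvv"), not a complete or verified list of enwiki practice.

```python
import re

# Hypothetical patterns: "rvv"/"vandal" shorthand, and a rollback-style
# summary prefix. Real summaries vary, so treat these as a rough filter.
VANDAL_PAT = re.compile(r"\b(rvv|rv/v|vandal)", re.IGNORECASE)
ROLLBACK_PAT = re.compile(r"^Reverted edits by \[\[Special:Contributions/", re.IGNORECASE)

def classify_revert(summary: str) -> str:
    """Split reverts into patrol-style vs. (presumed) editorial reverts."""
    if ROLLBACK_PAT.search(summary) or VANDAL_PAT.search(summary):
        return "vandalism-patrol"
    return "editorial"

print(classify_revert("rvv"))
print(classify_revert("revert to previous version, can we discuss this on talk?"))
```

Running the classifier over the revert summaries in both samples would show whether PPI editors are disproportionately hit by editorial reverts rather than vandalism patrol.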

I agree that it could be interesting to look at the type of reverts that are taking place. This could be a good week (maybe sub-week) task for this summer. I also have a way of tracking content in articles (PWR, see "A Jury of Your Peers", WikiSym'09) that might tell us a little bit more about their quality and productivity. Generating the metric is expensive so I intend to work with Diederik to see if we can make it feasible for this dataset. --EpochFail 19:08, 26 May 2011 (UTC)
Great, thanks EpochFail. Do you have any opinion about the idea that the PPI editors might have been more focussed on adding text, or perhaps even a tool that could differentiate the addition of templates from the addition of content? WereSpielChequers 13:52, 27 May 2011 (UTC)
To be honest, the answer is pretty clear: the PPI students were only focused on adding text (which was their assignment) – they simply didn't know anything about templates on Wikipedia. --Frank Schulenburg 00:34, 1 June 2011 (UTC)
I'd rather assumed that was the case, but if so a more valid control group would be a group of Wikipedians with comparable numbers of edits but who don't template articles. WereSpielChequers 14:05, 1 June 2011 (UTC)

Future research

I just wanted to bring to your attention that the Wikimedia University Program is focused on institutionalizing teaching Wikipedia in the classroom rather than turning the students into permanent community members (as we consider this the more sustainable approach). Asking "Are they becoming permanent community members or are they dropping out?" might give readers of your research the impression that this was what the University Program was aimed at. I guess that only a very small fraction of the PPI students will continue editing – and that's neither surprising nor something that we would be concerned about. If you are planning to answer the above question, I would highly recommend mentioning this point. Please let me know if you have questions about this specific aspect of our work. --Frank Schulenburg 00:46, 1 June 2011 (UTC)

You are going to have much lower community interest if there is not a payoff in either hooked editors or in content. Frank, what makes you think the Wiki should take on an ADDITIONAL burden of supporting university teaching, with no return? It needs to be tit for tat. 71.246.144.154 00:44, 16 November 2011 (UTC)

overcontrolled?

I worry about the 4 or 5 controls that you make (same number of edits, yada yada). You're moving towards a truism by doing this. For instance, if you wanted to know whether alligators or bobcats were better at fighting each other, you would not control for weight (as part of what makes alligators better is adult size). Or, if you wanted to know whether men were better at the shot put than women, you would not control for bench press. Capisce? 71.246.144.154 00:42, 16 November 2011 (UTC)

The idea of adding control variables is to create a level playing field and rule out alternative explanations. About your bobcat vs alligator battle: actually, you would control for weight, because not all mature alligators have the same size/weight, nor are all bobcats identical. So if you ran 50 fights, then you would definitely want to control for weight. If you don't, then people will always be able to say the alligators won because they were heavier, and there is no way for you to counter that argument because you did *not control* for it.
Well yes, definitely. If you want to separate alligator-bobcat "ness" out from weight, then doing a control for weight is valuable. You can even come up with some equation that gives you percent win likelihood based on species as well as weight. And then, if you had BOTH pieces of info, it would help you wager. However, if you only knew the species...then betting based on the controlled variable would not be helpful, since weight itself varies amongst the populations. (In that case, you would only want the uncontrolled results to guide your wager.) Similarly, when comparing groups of editors, some of these metrics may vary amongst the groups and be rather important. So you can't say "PPI is just as good as regular" (which, by the way, is how people are taking this on other talk pages). 71.246.144.154 03:29, 16 November 2011 (UTC)

Think about the male/female shotputter example, then. Pec strength is fundamentally different between the groups and is the main proximate cause of both bench press and shot put performance. Controlling for bench and then evaluating shot put...and saying women are as good at putting the shot would be a false insight. Yes...maybe a man and a woman who bench 200# have the same shot throw...but there are many more men who bench 200 than women who do.

Possibly the number of edits is itself highly correlated with the strength of the editor. If you control for that, then you might be overcontrolling. Note, I'm not even saying you ARE. Just that there is a good chance you might be. And with four parameters, there is the whole fitting-the-elephant problem and all that.  ;)

Or let's say you wanted to say students at Harvard and at Boise State are equally capable...and you controlled for SAT score. Yeah...maybe the kids with 1500 at Boise do the same either way...but a kid at Harvard is more likely to have that score.

This is a basic concept. Let me go look for some good pages discussing that.

71.246.144.154 03:10, 16 November 2011 (UTC)
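The overcontrol worry in the comments above can be made concrete with a small simulation (all numbers hypothetical): if species causes weight and weight causes the outcome, then weight is a mediator, and controlling for it erases the real species difference. A minimal sketch with NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical data generating process for the alligator/bobcat example:
# species determines weight, and weight (not species directly) determines
# the fight outcome score.
alligator = rng.integers(0, 2, size=n).astype(float)   # 1 = alligator, 0 = bobcat
weight = 50 + 150 * alligator + rng.normal(0, 10, size=n)
outcome = 0.01 * weight + rng.normal(0, 0.5, size=n)

# Regression WITHOUT controlling for weight: species picks up the full effect.
X_raw = np.column_stack([np.ones(n), alligator])
b_raw, *_ = np.linalg.lstsq(X_raw, outcome, rcond=None)

# Regression WITH weight as a control: the species coefficient collapses
# toward zero, because weight is a mediator here, not a confounder.
X_ctl = np.column_stack([np.ones(n), alligator, weight])
b_ctl, *_ = np.linalg.lstsq(X_ctl, outcome, rcond=None)

print(f"species effect, uncontrolled: {b_raw[1]:.2f}")
print(f"species effect, weight-controlled: {b_ctl[1]:.2f}")
```

The analogous question for this study is whether edit count and tenure behave like the confounder case (control is correct) or like the mediator case (controlling hides a real group difference).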

Some methodology/understanding questions

(These are not criticisms, just trying to understand.)

1. How did you select the "regular editors"?

2. The PPI editors? (is it all of them?)

3. What is the horizontal axis on the 4 charts?

4. (a criticism, maybe) It might be better to display these as some sort of scatter plots instead of these bar charts. We could then see better how the populations compare; it is hard to view the many lines and get any info out of those 4 plots. (What story do they tell? Maybe not a criticism and I just don't understand. But what is the "so what" message of those 4 complicated charts?)

71.246.144.154 03:59, 16 November 2011 (UTC)

Please don't take these as attacks

I know that it is always possible to criticize a methodology. And I think your approach was neat and innovative. Srsly. Just trying to noodle over a few concerns. Just like if we were chatting in a bar or something about the study and what it means. 71.246.144.154 04:14, 16 November 2011 (UTC)