Research talk:Reader crowdsourced quality evaluations

From Meta, a Wikimedia project coordination wiki

Following on from the initial idea:

  • We need to define the type of traffic that would be amenable to this sort of implementation.
    • Slow edits, frequent reads, for example
  • Which edits to present for review:
    • Edits on which the bot score is indeterminate.
    • Edits by new and low-karma contributors.
  • How to review:
    • More readership requires more review.
    • Minimum two reviewers must agree that a change is beneficial, more for high-readership articles.
    • In all cases, more than half the reviewers must agree it's good.
    • Reviewers with good karma count extra.
  • Keep a karma score for editors.
    • If their edit is accepted, the score improves.
    • If it is reverted, it goes down.
    • Karma also accrues from agreeing with other reviewers.
    • karma ≈ (accepts - rejects)/edits + (agrees - disagrees)/(reviews × 25), probably weighted toward more recent contributions
    • Low karma (especially in proportion to contributions) could slow down the contributions process. "Sorry, you've already edited your fill in the article namespace for today. Come back tomorrow or get some people to review your contributions."
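The karma formula above can be sketched in Python. This is a hypothetical illustration only: the recency weighting is omitted, and the guards against division by zero (for contributors with no edits or reviews yet) are my own assumption.

```python
def karma(accepts, rejects, edits, agrees, disagrees, reviews):
    """Sketch of the proposed karma score:

        karma ≈ (accepts - rejects)/edits + (agrees - disagrees)/(reviews * 25)

    Recency weighting is omitted. Contributors with no edits or
    reviews yet contribute 0 to the corresponding term.
    """
    edit_term = (accepts - rejects) / edits if edits else 0.0
    review_term = (agrees - disagrees) / (reviews * 25) if reviews else 0.0
    return edit_term + review_term

# e.g. 40 edits accepted and 5 reverted out of 50,
# plus 90 agreements in 100 reviews
print(karma(accepts=40, rejects=5, edits=50, agrees=90, disagrees=10, reviews=100))
```

Note the review term is divided by 25× the review count, so reviewing agreement contributes far less per action than editing does.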

I don't believe karma is strictly necessary for the implementation of the crowdsourced review, but it would improve the data at our disposal and give more power to the masses to do some simple work.

This tool is not meant to be effective at, ahem, policing community standards, but rather at limiting the amount of cruft that stays in the encyclopedia. We have so many rules now that it's hard to keep up, but if those who know the rules can focus their attention on improving articles and working with constructive editors instead of combating nincompoops, perhaps they'll have a bit more patience and run fewer off. -- Ke4roh (talk) 14:46, 1 May 2015 (UTC)

How is this different from the Wikidata Game, WikiGrok or Article feedback and what do you plan to learn from those experiences? --Nemo 14:50, 1 May 2015 (UTC) P.s. I question this usage of "crowdsourcing": open wikis invented crowdsourcing. Also, there is no such thing as a reader crowd.
Nemo This project would be different from the Article Feedback Tool in that we wouldn't be soliciting feedback on whole articles but rather on individual or small sets of changes. It would be different from the Wikidata Game in that the focus would be on quality evaluations of other contributions -- not direct contribution. It seems that the Wikidata Game is aimed at adding information, not evaluating contributions. Also, FWIW, wikis did not invent crowdsourcing (read you some history :P), but that's not really relevant here. Also, I don't see how that essay refutes the presence of a very large group of people who read but do not yet contribute to Wikimedia projects, so I'm not sure why you would like us to read it. --EpochFail (talk) 15:11, 1 May 2015 (UTC)
Right. The idea is to present very small units of work to people who might be willing to do small units of work. Asking for article feedback helps us know which articles might need attention, but this looks at individual changes - a unit of work small enough that someone can evaluate it quickly and without much trouble. The fundamental problem we need to solve is the attrition of the editor pool. This would provide a gateway into the editor pool that gives prospective editors:
  • an opportunity to engage with a minimum of commitment
  • experience reviewing small changes
  • an idea of what the community likes
  • a more understandable hurdle for an accepted first change
By doing these things, there is a more gradual shift from "reader" to "editor" than just the reader who dares to click the "edit" button and (accidentally) break all the rules on day one. It attempts to automate a part of the editor recruiting process: make sure people understand who writes for Wikipedia and how they can be among the writers, and filter out the cruft. The reasons for collecting data (and especially karma) are to raise the bar both for the edits that established editors have to take into consideration and for the editors coming into the pool. Karma could map onto other things editors do, too. It could be integrated with our page protection system so that one needs a certain karma to edit pages with certain protection. It could also be integrated with traffic stats so that you need enough karma to edit a page with that much traffic. By offering a formal process to graduate to editor, we make it a little less likely that the newcomer will draw the ire of a cranky editor, and therefore more likely that we'll retain the new editor. -- Ke4roh (talk) 17:05, 1 May 2015 (UTC)
The PDF suggested on enwiki looks interesting; sadly, no CC BY or similar license in sight for a copy on Commons. I'm quite sure that Wikimedia won't see 2020 exactly for the reasons discussed in the enwiki VP thread and the paper, therefore any attempt to prove that I'm wrong is by definition a good idea. :tongue: Be..anyone (talk) 10:11, 3 May 2015 (UTC)
Regretfully, we (ping User:Staeiou and User:Jtmorgan) had to follow the journal's licensing to get it published. I wrote a CC-BY-SA summary from scratch though. See Research:The Rise and Decline. Re. not seeing 2020, I disagree. I think that the English Wikipedia and other mature communities are feeling the pain of maturity and social/ideological stagnation, but that other, younger projects will have a while yet. E.g., Wikidata. In the meantime, I'd like to try to figure out how mature open communities can recover the adaptive capacity they had earlier in their lifecycles and at smaller scales. Adaptive capacity is one of the things that Research:Revision scoring as a service is intended to affect. I think that improvements in infrastructure tend to have this effect generally. Even if we're too late to "save" the large, mature Wikipedias, it would be good to have a better understanding of these issues when Wikidata or the next big project matures. --EpochFail (talk) 15:37, 3 May 2015 (UTC)
Another excellent article, EpochFail. Revision scoring as a service will doubtless provide good input to this reader-crowdsourced quality evaluation process - to help determine which edits should be passed to crowdsourcing. Now how can we turn this idea into real research and then into an actionable change, ultimately on en Wikipedia? I suggest that we might be able to compare the history of contributor pools of other communities that use similar systems. Compare Reddit and Stack Overflow, for example. Others? Those two should have enough history. Is there enough data to be had? Here's a first draft hypothesis: Crowdsourced reviewing and gradual permissions based on the quality of past contributions fosters retention of new contributors. Contrast with Wikipedia's model. -- Ke4roh (talk) 13:43, 5 May 2015 (UTC)


Thinking some more on this, it makes good sense that editors would be gated based on karma. Editors with the lowest karma can only edit the lowest-traffic pages, and the higher their karma, the higher-traffic pages they get to edit. And as traffic increases, so does the (cumulative) karma of the reviewers required to accept a change.

accept = (Σ approvers' karma - Σ rejecters' karma) > fKarmaThreshold(page views)
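A hypothetical Python sketch of this acceptance rule. The linear fKarmaThreshold and its karma_factor constant are made-up illustrations of the "simplest" option, not a settled design:

```python
def f_karma_threshold(page_views, karma_factor=0.001):
    # Simplest option: required karma margin grows linearly with
    # page views. karma_factor is a made-up tuning constant.
    return page_views * karma_factor

def accept(approver_karmas, rejecter_karmas, page_views):
    """An edit is accepted when the summed karma of approvers exceeds
    the summed karma of rejecters by more than a traffic-dependent
    threshold."""
    margin = sum(approver_karmas) - sum(rejecter_karmas)
    return margin > f_karma_threshold(page_views)

# Two modest approvers on a low-traffic page: margin 1.2 vs. threshold 0.5
print(accept([0.7, 0.5], [], page_views=500))
```

An exponential variant (e.g. raising page views to a power greater than one inside fKarmaThreshold) would demand disproportionately more reviewer karma on the most popular pages.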

The simplest fKarmaThreshold might be to multiply page views by some karma factor. It might also make sense to have an exponential scale to require increasing karma to edit the most popular pages. This is quite similar to the system of protection we have now, but by making it more of a community effort than what, to the outsider, might appear to be a cabal of editors, it becomes more approachable and removes the burden of policing petty changes from the editors. A phased approach for rolling this out:

  1. Develop some reviewing code and test it on many low traffic pages
  2. Test the reviewing code on a few higher traffic pages
  3. Roll out the reviewing code for all articles
  4. Track contributor karma and use it to influence which changes are flagged for review
  5. Use karma to gate which articles users may edit

At each step, we measure the impact of our changes and develop metrics to inform implementation of the next phase (or even decide it sucks and throw it all out). Ke4roh (talk) 16:44, 5 May 2015 (UTC)