Grants talk:IEG/Learning from article revision histories


Comments on the proposal

Feedback

Nettrom

I added this discussion section because I'm not really confused, nor do I necessarily disagree with your proposal; instead, I have some feedback that I believe can improve it.

When it comes to your visualization of article edit histories, I recommend reading the Ekstrand & Riedl paper listed below. It appears to be very closely related.

1. Ekstrand, Michael D., and John T. Riedl. "rv you're dumb: Identifying Discarded Work in Wiki Article History." Proceedings of the 5th International Symposium on Wikis and Open Collaboration. ACM, 2009. pdf

This is a great paper to know about. It supports my decision to simply compare hashes of article revisions when building the revision trees, and it's great to see that some of the optimization problems, like coloring nodes by editor, have been solved. My data visualization differs from theirs in two ways: first, by breaking the connection between the tree and the table of revisions, reverts and edit wars are presented more concisely as reciprocal edges (though this technique has its own problems); second, the intent is to allow users to view revision trees for individual sections of articles, which may make some of the implementation details of the visualization a little more tractable. -- Evoapps (talk) 08:25, 12 April 2016 (UTC)
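To make the hash-comparison approach concrete, here is a minimal sketch of building such a tree from the MediaWiki API, which exposes each revision's SHA-1 directly. The API endpoint and rvprop parameters are the standard ones; the tree-building logic and helper names are only illustrative, not the proposal's actual implementation:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def fetch_revisions(title):
    """Yield (revid, parentid, sha1) for every revision of a page, oldest first."""
    params = {
        "action": "query", "prop": "revisions", "titles": title,
        "rvprop": "ids|sha1", "rvlimit": "max", "rvdir": "newer",
        "format": "json", "formatversion": "2",
    }
    while True:
        data = requests.get(API, params=params).json()
        for rev in data["query"]["pages"][0]["revisions"]:
            yield rev["revid"], rev.get("parentid", 0), rev.get("sha1")
        if "continue" not in data:
            break
        params.update(data["continue"])  # follow API pagination

def build_revision_tree(title):
    """Map each revision to its parent node, collapsing identical content.

    A revision whose SHA-1 matches an earlier revision (e.g. a revert)
    is attached to that earlier node instead of its chronological parent,
    so reverted work hangs off the trunk as a discarded branch.
    """
    first_seen = {}  # sha1 -> revid that first produced this content
    edges = {}       # revid -> parent node in the tree
    for revid, parentid, sha1 in fetch_revisions(title):
        if sha1 in first_seen:
            edges[revid] = first_seen[sha1]
        else:
            first_seen[sha1] = revid
            edges[revid] = parentid
    return edges
```

Under this collapsing, an edit war between two versions folds onto the two content nodes involved, which is what the reciprocal-edge presentation summarizes.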

I like your thoughts on how articles develop quality throughout their history, and I find the idea that there is no limit on quality interesting. The longitudinal perspective on quality is important; it's something I've been wanting to look into. Others have done some work on it, though; I suggest reading the Wöhner et al. paper listed below.

2. Wöhner, Thomas, and Ralf Peters. "Assessing the quality of Wikipedia articles with lifecycle based metrics." Proceedings of the 5th International Symposium on Wikis and Open Collaboration. ACM, 2009. pdf

I liked this paper's approach to understanding the life cycle of improving articles, but I wonder how these findings can be used to improve the efficiency of editing going forward. Perhaps articles whose edit frequencies deviate too far from the typical life cycle of high-quality articles could be put on a list somewhere to attract more edits? More generally, it would be interesting to see whether the conclusions in this paper hold when article quality is measured in relative rather than absolute terms. -- Evoapps (talk) 08:25, 12 April 2016 (UTC)

You propose putting two versions of the same article in front of users. Isn't there an implicit evaluation of the same sort in whether content was reverted or revised? Work by Adler & De Alfaro, for instance, or by Aaron Halfaker at the WMF (a couple of their papers are listed below) has looked at reverts as a quality indicator. I see explicit data on the evaluation of individual revisions as a useful extension of that, and one that should allow for some interesting analysis comparing the two.

When I originally drafted the proposal, I was unaware of how much work on these questions had already been done with Wiki Labels and the ORES service. I'm currently revising the proposal to use these existing methods rather than reinvent the wheel. Essentially, the result is an interactive ORES job builder that treats article histories as revision trees. -- Evoapps (talk) 08:25, 12 April 2016 (UTC)
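For illustration, a minimal sketch of querying ORES over HTTP for per-revision scores; the endpoint shown is the current v3 API (the API version has changed over time) and the helper name is hypothetical:

```python
import requests

ORES = "https://ores.wikimedia.org/v3/scores/enwiki/"

def score_revisions(revids, models=("damaging", "goodfaith")):
    """Fetch ORES probability scores for a batch of revision IDs."""
    params = {
        "models": "|".join(models),
        "revids": "|".join(str(r) for r in revids),
    }
    data = requests.get(ORES, params=params).json()
    out = {}
    for revid, results in data["enwiki"]["scores"].items():
        out[int(revid)] = {
            model: result["score"]["probability"]["true"]
            for model, result in results.items()
            if "score" in result  # skip revisions the model errored on
        }
    return out

# score_revisions([714600000]) might return
# {714600000: {"damaging": 0.03, "goodfaith": 0.97}}
```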

The way I interpret the proposal, one of the goals is to optimize the process of improving article quality, given the mention of "best practices of Wikipedia editors". Some articles fail to develop quality as expected; last year we published a paper at ICWSM that looked at how some Wikipedia articles are hugely popular but not of top quality. If your goal is to optimize the process, should that process not also consider article viewership in order to optimize impact? Otherwise, you might direct effort towards articles that are easy to improve but don't have much of an audience.

3. Adler, B. Thomas, and Luca De Alfaro. "A content-driven reputation system for the Wikipedia." Proceedings of the 16th international conference on World Wide Web. ACM, 2007. pdf

I can see how my goal of identifying best practices for Wikipedia editors might be construed as pertaining to editor trust, but I think they are different goals. Editor trust is partially calculated from the total number of edits, which is hard to turn into a practical recommendation for new editors. Instead, my goal is to determine which types of edits are more reliably accepted than others. For instance, are edits made to a single section more or less reliably accepted than edits made across multiple sections? Answering these questions will likely require accounting for possible differences in editor trust, so knowing where the trust metrics come from is helpful. -- Evoapps (talk) 08:25, 12 April 2016 (UTC)

4. Halfaker, Aaron, Aniket Kittur, and John Riedl. "Don't Bite the Newbies: How Reverts Affect the Quantity and Quality of Wikipedia Work." Proceedings of the 7th International Symposium on Wikis and Open Collaboration. ACM, 2011. pdf

I see this important research as the primary motivation for interpreting the responses of multiple ORES edit quality models, such as the goodfaith model and the revert model. -- Evoapps (talk) 08:25, 12 April 2016 (UTC)
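As a sketch of what combining the models might look like in practice, the two probabilities can be crossed into the rough four-way taxonomy this line of research suggests; the threshold below is an arbitrary placeholder, not a tuned value:

```python
def classify_edit(damaging_prob, goodfaith_prob, cutoff=0.5):
    """Cross two ORES model outputs into a rough four-way taxonomy.

    The cutoff is illustrative; in practice it would be tuned against
    labeled data (e.g. from Wiki Labels).
    """
    damaging = damaging_prob >= cutoff
    goodfaith = goodfaith_prob >= cutoff
    if damaging and goodfaith:
        return "good-faith mistake"  # a coaching opportunity, not vandalism
    if damaging:
        return "likely vandalism"
    if goodfaith:
        return "good edit"
    return "bad faith but not damaging"
```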

5. Warncke-Wang, Morten, Vivek Ranjan, Loren Terveen, and Brent Hecht. "Misalignment Between Supply and Demand of Quality Content in Peer Production Communities." Proceedings of the Ninth International AAAI Conference on Web and Social Media. AAAI, 2015. pdf

This is something I hadn't considered before, but after looking at your paper I feel I should try to understand it more deeply. What are the implications of viewership versus editorial involvement for understanding how articles evolve? I'm wondering if there is a way to incorporate viewership data into the revision tree visualization. For example, it looks like the revision tree for the Splendid Fairywren article blows up when it gets Featured Article status, although that popularity is likely short-lived. If the visualization could incorporate pageviews, it might be easier to test which edits are likely to survive when the most eyes are on an article. -- Evoapps (talk) 08:25, 12 April 2016 (UTC)
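One way to attach viewership to the tree: the Wikimedia Pageviews REST API serves daily per-article counts (data exists from mid-2015 onward), which could be matched against the period each revision was the live version. A minimal sketch; the join to revisions is left out and the function name is illustrative:

```python
import requests

PV_URL = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
          "en.wikipedia/all-access/all-agents/{title}/daily/{start}/{end}")

def daily_pageviews(title, start="20160101", end="20160401"):
    """Return {YYYYMMDD: views} for one article over a date range."""
    url = PV_URL.format(title=title.replace(" ", "_"), start=start, end=end)
    resp = requests.get(url, headers={"User-Agent": "revision-tree-sketch"})
    return {item["timestamp"][:8]: item["views"]
            for item in resp.json()["items"]}

# Nodes in the revision tree could then be sized by the views accumulated
# while their revision was the article's live version, e.g.:
# views = daily_pageviews("Splendid fairywren")
```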

Those are the thoughts I've had so far. I probably read this proposal more as a computer science paper reviewer than as a Wikipedian, since I'm more of the former than the latter, which probably makes me come across as less positive than I am. So just to put it in writing: this is an interesting proposal, and I'll stop by again to re-read it!

Regards, Nettrom (talk) 22:52, 6 April 2016 (UTC)

Disagreement

If you read the proposal and disagree with my assessment of the problems facing Wikipedia, or are unsure whether my solution will help address real problems, please let me know your point of view below. I am new to Wikipedia, and it would be very helpful to know where I'm assuming too much. -- Evoapps (talk) 20:30, 6 April 2016 (UTC)

Be..anyone

If there was ever a more detrimental essay on enwiki than BRD, I missed it (including an account destroyed in 2006, but that was no BRD issue). –Be..anyone 💩 19:18, 16 April 2016 (UTC)

Confusion

If you read the proposal and were confused by some part, please let me know and I'll work on improving it. -- Evoapps (talk) 20:30, 6 April 2016 (UTC)

Daniel Mietchen

I find the sentence "Does the actual history of the article (in green) reflect the highest quality version of the article?" from the figure caption confusing: in part because the methodology behind the graph is not explained; in part because there are several ways to look for mappings from the edit history to article quality (or vice versa); and in part because the word "reflect" is used to map from (a tree representation of) the entire version history to a single version, which is not indicated in the figure. -- Daniel Mietchen (talk) 22:31, 10 April 2016 (UTC)

Good point; that was unclear. Fully interpreting the figure requires some context that is hard to capture in the caption. I've made an initial attempt; let me know if it's any better. I'll keep working at it. -- Evoapps (talk) 18:51, 11 April 2016 (UTC)

After reading through the page, the confusion introduced by the figure caption above was partly remedied by the text and partly increased further. You provide a rationale for using a tree view rather than a linear view of page history, but do not explain how the trees would be constructed (I can imagine that this would not fit into the page, but it would be fine to have the details on the file description page on Commons).

I hope I've improved the documentation for creating the figures a bit, both on the file description page and in the Create a data visualization app section. The data visualization web app itself should allow for multiple ways of computing the trees; the images in the proposal are mostly intuition pumps/proof of concept. -- Evoapps (talk) 18:51, 11 April 2016 (UTC)
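For the figure-generation step specifically, a {revision: parent} mapping like the one a hash-based tree builder produces can be rendered straight to SVG with the graphviz package; this sketch omits the node coloring and layout tweaks the real figures would need:

```python
from graphviz import Digraph

def render_revision_tree(edges, outfile="revision_tree"):
    """Render a {revid: parent} mapping as an SVG tree."""
    dot = Digraph(format="svg")
    for revid, parent in edges.items():
        dot.node(str(revid))
        if parent:  # the page's first revision has no parent
            dot.edge(str(parent), str(revid))
    dot.render(outfile, cleanup=True)  # writes revision_tree.svg
```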

Also, SVG would be much preferable to PDF for browsing such files on MediaWiki. -- Daniel Mietchen (talk) 22:45, 10 April 2016 (UTC)

Thanks for pointing this out. I've replaced the PDF images with SVG files. -- Evoapps (talk) 17:06, 11 April 2016 (UTC)

It does not help to have the same figure used twice (perhaps show a tree for another article in the second spot, or even your proposal page?), and the text about wikivision is almost identical in the What is your solution? and Create a data visualization app sections. -- Daniel Mietchen (talk) 22:45, 10 April 2016 (UTC)

I agree, showing the same figure twice isn't very helpful. The point I'm trying to make is that different articles can have very different revision trees, and that visualizing them is the first step in determining whether the differences matter for how likely an article is to improve. Initially I chose the Preadolescence page because it had a lot of reversions. Based on your comment, I added a Featured Article without nearly as much variance: the article on a California State highway. These are hand-picked examples, though, and one of the goals of the project is to be more exhaustive in comparing different revision histories. -- Evoapps (talk) 18:51, 11 April 2016 (UTC)

Questions from a novice Wikipedian

Do you think that disagreement among editors is a source of inefficiency in Wikipedia?

The sort of research and analysis that I'm proposing is only useful to those who think the efficiency of the Wikipedia editing process is worth addressing. One source of inefficiency is disagreement among editors. Edit wars are an extreme example of this, but more generally I believe that reducing disagreement would improve the efficiency of Wikipedia. What are other people's thoughts on the issue? Is disagreement among editors a problem worth addressing? -- Evoapps (talk) 20:30, 6 April 2016 (UTC)

How should I be calculating my budget?

If you have any suggestions for calculating a reasonable budget for this project or projects like this in general, please let me know. -- Evoapps (talk) 05:14, 13 April 2016 (UTC)

April 12 Proposal Deadline: Reminder to change status to 'proposed'

The deadline for Individual Engagement Grant (IEG) submissions this round is April 12th, 2016. To submit your proposal, you must (1) complete the proposal entirely, filling in all empty fields, and (2) change the status from "draft" to "proposed." As soon as you're ready, you should begin to invite any communities affected by your project to provide feedback on your proposal talk page. If you have any questions about finishing up or would like to brainstorm with us about your proposal, we're hosting a few IEG proposal help sessions this month in Google Hangouts.

I'm also happy to set up an individual session.

Warm regards,
--Marti (WMF) (talk) 06:03, 2 April 2016 (UTC)

Eligibility confirmed

This Individual Engagement Grant proposal is under review!

We've confirmed your proposal is eligible for review and scoring. Please feel free to ask questions and make changes to this proposal as discussions continue during this community comments period (through 2 May 2016).

The committee's formal review begins on 3 May 2016, and grants will be announced 17 June 2016. See the round 1 2016 schedule for more details.

Questions? Contact us at iegrants(_AT_)wikimedia · org .

--Marti (WMF) (talk) 05:17, 28 April 2016 (UTC)

Aggregated feedback from the committee for Learning from article revision histories

Scoring rubric and scores:

(A) Impact potential: 5.9
  • Does it have the potential to increase gender diversity in Wikimedia projects, either in terms of content, contributors, or both?
  • Does it have the potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?

(B) Community engagement: 6.9
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?

(C) Ability to execute: 6.6
  • Can the scope be accomplished in the proposed timeframe?
  • Is the budget realistic/efficient?
  • Do the participants have the necessary skills/experience?

(D) Measures of success: 5.7
  • Are there both quantitative and qualitative measures of success?
  • Are they realistic?
  • Can they be measured?
Additional comments from the Committee:
  • I am unclear about this proposal's impact on the movement. It assumes that editors are all trying to improve articles or improve Wikipedia, but in a volunteer-run project that is not something you can assume of all participants. Many edits to articles are made in order to improve coverage of certain topics, but not to improve the specific article being edited. Also, I don't think editors would be particularly interested in the outcome of this project.
  • The study seems to be a study of the evolution of articles. Is the cost acceptable to learn what we already know?
  • Yes. Analysis of version histories, though not new, has never been done in a way that produces a model per article.
  • Innovative approach, though its resulting utility is unclear.
  • It seems likely it could be achieved.
  • The budget is quite high with one proposed contractor.
  • There seems to be interest in the project, but I am unsure there are people ready to apply the model once it is designed and built. I think I am the best judge of whether my own edits will survive, and this knowledge is based on personal experience and expertise in the subject at hand. I would not be interested in looking at a model for that information, and I would not be inclined to believe it if it differed from my own opinion.
  • I do not see community engagement in this proposal.
  • I see some benefits to understanding how to grow Wikipedia, but I don't see a clear alignment with Wikipedia's strategies or how the community could be improved by the final products. I am neutral: the applicants have some experience, but I couldn't see a long-term future after the grant ends.
  • Perhaps I missed the main goal, but the results seem to me to add nothing new to what is already known.
  • I would strongly recommend using Wikimedia Labs servers instead. It would save time and resources.

-- MJue (WMF) (talk) 17:17, 3 June 2016 (UTC) on behalf of the IEG Committee

Round 1 2016 decision

This project has not been selected for an Individual Engagement Grant at this time.

We love that you took the chance to creatively improve the Wikimedia movement. The committee has reviewed this proposal and not recommended it for funding, but we hope you'll continue to engage in the program. Please drop by the IdeaLab to share and refine future ideas!


Next steps:

  1. Review the feedback provided on your proposal and ask for any clarifications you need using this talk page.
  2. Visit the IdeaLab to continue developing this idea and share any new ideas you may have.
  3. To reapply with this project in the future, please make updates based on the feedback provided in this round before resubmitting it for review in a new round.
  4. Check the schedule for the next open call to submit proposals - we look forward to helping you apply for a grant in a future round.
Questions? Contact us.