Grants talk:IEG/Editor Behaviour Analysis


More specifics[edit]

What questions will be answered? What are the methods that you will use to answer them? How do you differentiate your work from stats.wikimedia.org?

I suggest that you pick a few analyses from past research that you thought were very interesting and could be important if updated, then propose to update and extend those studies with automated reports. You've already done some work to update and extend R:The Rise and Decline. I'd suggest looking at the following next:

I see a clear way that I can help you with those two should you choose to pursue them. There are likely many other bits of past work that incorporate a visualization component and could be very interesting, but my pre-coffee brain can't recall at the moment. --Halfak (WMF) (talk) 13:10, 29 September 2015 (UTC)[reply]

Some of the questions I'm currently focusing on[edit]

  • How are the active edit sessions in a month split across the different editor cohorts? How do the newcomers compare with the rest of the editor cohorts (in both percentage and absolute terms)?
  • How are the bytes added in a month split across the same cohorts?
  • How have the longevity/retention rates changed for editor cohorts over time?
  • How do the above change across the different languages?
  • How do the above change as we go from active to very active editors?
  • And what happens when we start looking at these from the articles' perspective? Article longevity, article edit activity, etc.
  • What are the articles the newcomers work on? Are they editing older, established articles or newer ones?
  • What are the articles being vandalised, and who are the vandals?
  • What does the edit activity across a category look like? Are the articles edited mostly by experienced editors?
  • What are the articles being edited on mobile & VE, and who are the editors using them?

Some other ideas I'm thinking of are at Research:Editor Behaviour Analysis & Graphs/Ideas. I'll collate all of them soon.

Methods[edit]

I'll build the visualizations using techniques like the ones I've already used for the initial set of graphs. They will all be interactive and will let the user filter the data by a metric relevant to that particular graph. Together they should help us find answers, or at least directions we can investigate further.
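As a rough illustration of the kind of breakdown behind these graphs, here is a minimal sketch in pandas. It is not the project's actual pipeline; the column names and the toy data are assumptions for illustration. It assigns each editor to a cohort by the month of their first edit and counts active editors per month per cohort:

  import pandas as pd

  # Assumed input: one row per revision, with an editor id and a timestamp.
  # In practice this would be derived from a wiki's revision data.
  revisions = pd.DataFrame({
      "user": ["A", "A", "B", "B", "C"],
      "timestamp": pd.to_datetime(
          ["2014-01-05", "2015-03-02", "2015-02-11", "2015-03-20", "2015-03-25"]
      ),
  })
  revisions["month"] = revisions["timestamp"].dt.to_period("M")

  # Cohort = month of each editor's first edit.
  first_edit = revisions.groupby("user")["month"].min().rename("cohort")
  revisions = revisions.join(first_edit, on="user")

  # Active editors per month, split by cohort; this is the sort of table an
  # interactive chart would let the user filter and explore.
  active_by_cohort = (
      revisions.groupby(["month", "cohort"])["user"].nunique().unstack(fill_value=0)
  )
  print(active_by_cohort)

The same aggregation extends naturally to bytes added per cohort, or to per-language comparisons, by swapping the measure being counted.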

These Visualizations Vs stats.wikimedia.org[edit]

  • Many of these visualizations show the split in activity by editor cohort; Stats shows only the gross numbers for a month.
  • These visualizations are interactive and allow the user to filter the data; most of the charts on Stats are static.
  • The charts on Stats and the visualizations proposed here (some have already been built) are very different and complement each other.

Hi Halfak (WMF), 'Creating, destroying and restoring value on Wikipedia' looks interesting, especially the PWV. Let me do some more reading before I get back to you on this. Are there other metrics/papers you want me to look at, such as reverts, flagged revisions, etc.?--Jeph paul (talk) 13:40, 4 October 2015 (UTC)[reply]

Usability and usefulness as measures of success[edit]

Note: I've spoken with the proposal author about this off-wiki already; I'm just posting my comments here to make it all official. These graphs are intended to let all sorts of researchers explore editing-trend data over time. As such, it is important that these tools be easy to use for people with many different backgrounds, levels of expertise, and levels of experience with wiki-research. The proposal author currently lists "50 active users" as a measure of success. The proposer's previous IEG project, ReplayEdits, also had an adoption-related success measure. To help achieve adoption, the proposer should plan to conduct some sort of evaluation of the tool with a set of users similar to the intended audience. I suggest that the proposer build user studies into their project plan, so that he can evaluate his designs and improve upon them based on evidence and feedback from real users. I also suggest that he include a measure of success related to demonstrating that the tool can be used for its intended purpose by the people it's designed for: that users can explore data successfully and interpret it correctly. I've volunteered to advise the proposer on how to design appropriate evaluation measures to demonstrate this kind of success. And I believe that he can demonstrate it: his current prototypes are promising, and he has a good track record of building useful and usable software. Jtmorgan (talk) 18:42, 1 October 2015 (UTC)[reply]

Hi Jtmorgan, I have added gathering user feedback throughout the length of the project as one of the activities, though nothing detailed yet. How do we quantify the 'usefulness of the tool' as a measure of success for its intended users? Do we do a survey after the tools have been built? Check if anyone has cited the tools in their research? Your help is most welcome, thanks. --Jeph paul (talk) 13:53, 4 October 2015 (UTC)[reply]

A survey would be appropriate. You could also run some user studies with researchers and include a questionnaire at the end of the study, in which you could ask questions like "How does this tool compare to other tools you have used to visualize editor behavior trends?" and "Would you recommend this tool to other researchers? Why or why not?". By the way, regarding your "50+ active users" metric: how do you plan to track the usage of the tool? Cheers, Jtmorgan (talk) 17:41, 6 October 2015 (UTC)[reply]
For the ReplayEdits project I use GA (Google Analytics) to track usage, since it is deployed on GitHub. I'll have to figure out something else for Tool Labs.--Jeph paul (talk) 17:15, 7 October 2015 (UTC)[reply]
Sounds sensible. Make sure to check out the Labs terms of use, especially as they relate to tracking the usage of Labs-hosted tools. Yuvipanda or DAndreescu can probably answer any questions you might have. Cheers, Jtmorgan (talk) 19:24, 7 October 2015 (UTC)[reply]
I really like this idea of better understanding editors and their motivations. The more the veil can be pulled back the better. Geraldshields11 (talk) 21:07, 16 October 2015 (UTC)[reply]

Eligibility confirmed, round 2 2015[edit]

This Individual Engagement Grant proposal is under review!

We've confirmed your proposal is eligible for round 2 2015 review. Please feel free to ask questions and make changes to this proposal as discussions continue during this community comments period.

The committee's formal review for round 2 2015 begins on 20 October 2015, and grants will be announced in December. See the schedule for more details.

Questions? Contact us.

Marti (WMF) (talk) 02:18, 4 October 2015 (UTC)[reply]

Open source?[edit]

I suspect the answer to this question is "of course", but just for due diligence: do you intend to release the source code for this tool under an open license and post it (with documentation) in an open online repository? If so, this info should be added to the proposal, especially since one of your sustainability claims is "The visualization techniques being explored and developed in these graphs can be reused in other projects by researchers & individual editors." Cheers, Jtmorgan (talk) 17:43, 6 October 2015 (UTC)[reply]

Yes, the Python scripts to generate the data for the visualizations, and the JS, HTML & CSS needed to render them, will all be available under an open license on GitHub: https://github.com/cosmiclattes/wikigraphs.
There are some auxiliary datasets I'll be generating (e.g. the month of first edit of every editor). I use them in intermediate steps in the process of creating the visualizations. These datasets are big (200+ MB, especially for 'en'). They will reside on the Tool Labs server, but I haven't yet thought about exposing them for general use.--Jeph paul (talk) 17:03, 7 October 2015 (UTC)[reply]
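For illustration, a rough sketch of how a "month of first edit" dataset could be generated from a Labs database replica follows. This is not the project's actual script: the connection details are placeholders, pymysql is just one possible MySQL client, and the query assumes the 2015-era MediaWiki revision table columns (rev_user_text, rev_timestamp):

  import os
  import pymysql  # any MySQL client library would do

  # Placeholder connection details for a Tool Labs replica of enwiki.
  conn = pymysql.connect(
      host="enwiki.labsdb",
      db="enwiki_p",
      read_default_file=os.path.expanduser("~/replica.my.cnf"),
      charset="utf8",
  )

  # Month of first edit for every editor, derived from the revision table.
  # rev_timestamp is stored as YYYYMMDDHHMMSS, so LEFT(..., 6) gives YYYYMM.
  query = """
      SELECT rev_user_text AS editor,
             MIN(LEFT(rev_timestamp, 6)) AS first_edit_month
      FROM revision
      GROUP BY rev_user_text
  """

  with conn.cursor() as cursor, open("first_edit_month.tsv", "w") as out:
      cursor.execute(query)
      for editor, first_month in cursor:
          out.write("%s\t%s\n" % (editor, first_month))
  conn.close()

Precomputing a file like this once per refresh keeps the heavy aggregation out of the interactive front end.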

Aggregated feedback from the committee for Editor Behaviour Analysis[edit]

Scoring criteria (see the rubric for background). Scores range from 1 (weak alignment) to 10 (strong alignment).

(A) Impact potential: 7.2
  • Does it fit with Wikimedia's strategic priorities?
  • Does it have potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?
(B) Innovation and learning: 7.2
  • Does it take an innovative approach to solving a key problem?
  • Is the potential impact greater than the risks?
  • Can we measure success?
(C) Ability to execute: 7.4
  • Can the scope be accomplished in 6 months?
  • How realistic/efficient is the budget?
  • Do the participants have the necessary skills/experience?
(D) Community engagement: 7.0
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?
  • Does it support diversity?
Comments from the committee:
  • This could become a key tool for editor retention, a critical issue for the movement. Tools that help us better understand the underlying community dynamics have the potential to drive future impact in this area.
  • Would like to better understand how the community works and see how others will use this in their research. Since the product will update itself I regard it as highly sustainable. I like that it works on basically all projects and languages.
  • We have had VE and mobile for years now, and one of the things this proposal will do is help us understand their impact on editor behavior. This is especially needed as we move to a more mobile-only readership and editorship: according to the graphs, in some languages mobile overtakes desktop page views during the weekend.
  • The proposed work will address areas in which information is currently lacking, and is likely to produce significant new learning as a result.
  • Great impact, no risk
  • To measure success I would like to see a survey done or a test audience employed, as was suggested on the talk page. This will also help design the tool in a way that the target audience would indeed best benefit from it.
  • Has a proven track record, and the potential to deliver.
  • The budget is reasonable for the work to be performed. The participants previously carried out a successful project with similar scope.
  • Seems to have support from the research committee, who are the experts in this area.
  • There is a significant degree of community engagement and support.
  • It is targeted more at people outside the movement, and may indirectly help diversity once editor retention is better understood, so that projects can be designed on the basis of its findings.
  • I would like to see better documentation of these charts. For example, there is a lot of data on one screen; some slides on Commons would help to explain them. I have followed the discussions on the research mailing list and understand what the idea is, but would like more background to support the proposal. That said, the idea is definitely sound and worth funding.
  • Would like to see additional criteria to measure success and ensure it meets the needs of the target audience.

Round 2 2015 decision[edit]

Congratulations! Your proposal has been selected for an Individual Engagement Grant.

The committee has recommended this proposal and WMF has approved funding for the full amount of your request, $1,000.

Comments regarding this decision:
The Committee supports your work to deepen understanding of how editors contribute to Wikimedia projects. We appreciate your efforts to create automatically updating visualizations that will have ongoing value beyond the life of your grant. We look forward to discussing possibilities for concrete applications of your research going forward.

Next steps:

  1. You will be contacted to sign a grant agreement and set up a monthly check-in schedule.
  2. Review the information for grantees.
  3. Use the new buttons on your original proposal to create your project pages.
  4. Start work on your project!
Questions? Contact us.