Wikimedia Clinics/008

From Meta, a Wikimedia project coordination wiki

This is a digest (a processed, edited summary) of the online conference call Wikimedia Clinic #008, held on August 10, 2020. It sacrifices fidelity to people's exact words in favor of clarity, brevity, and digestibility.

Except for the introduction and the first topic, which is pre-scheduled, the topics are brought up by volunteers participating in the calls.

The call was attended by 6 members of Wikimedia Foundation staff and 8 volunteers.

Topic 1: Introduction

Quick principles

  • listen with patience and respect
  • share your experience, but remember others' contexts are very diverse, and may not match yours.
  • be of service to other people on the call

These calls are a Friendly Space.

Purpose of Wikimedia Clinics

  • provide a channel to ask questions and collect feedback on one's own work and context
  • help direct people to appropriate resources across the Foundation and broader Wikimedia movement

If we can't answer your questions during the call, we (WMF) are committed to finding who can, and connecting you with them (this may happen after the call).

Examples of things the Clinics are not the place for:

  • complaints about interpersonal behavior: there are appropriate channels for this on-wiki, as well as the Trust and Safety team.
  • content or policy disputes on specific wikis. But it is okay to seek advice on how to better present one's positions.

Topic 2: WMF Research

Presentation by Leila Zia (User:LZia (WMF))

Slide 1

  • There was some discussion in clinic #007 about what we mean by research. We start from the definition on English Wikipedia.
  • Our type of research is creative, and we follow methodologies that are either already defined or are being defined by us.
  • We seek validation for our work through peer review or via academic conferences.
  • We require the work to be "applied": it needs to be actually useful and applicable, primarily to the Wikimedia movement.
  • While a lot of activities can be called "research", our approach typically requires highly skilled work.
  • The problems we look at are long-term ones, e.g. editor retention and article quality improvement.

Slides 2 and 3

Slide 4

  • Work is divided into three areas. Broadly, our current priorities are shaped by the research in Research:2030. Accordingly, we:
    • try to identify knowledge gaps within the Wikimedia movement and then encourage efforts to address them.
    • try to identify how to measure the quality of the body of work that is the Wikimedia projects.
    • try to build the research community.
    • try to build the research community.

Slide 5

  • Addressing Knowledge Gaps: we aim to understand gaps in content, readership, and contributorship, and we develop systems to identify and address them. For example, GapFinder, an early research prototype.

Slide 6

  • Another example: the slide shows a map of geotagged articles across all Wikipedia languages. Every point on the map represents a geocoordinate from a Wikipedia article in some language. The warmer the color, the more articles there are about that place. Dark areas mean that no article explicitly covers that point (i.e. with coordinates specified).
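At its core, a map like this is a count of geocoordinates per map cell. A minimal Python sketch of that binning step, assuming the coordinates have already been extracted as (latitude, longitude) pairs and using a hypothetical 1-degree grid (the resolution of the actual map is not stated); the sample points are made up:

```python
import math
from collections import Counter


def grid_counts(coords, cell_deg=1.0):
    """Count geotagged articles per latitude/longitude grid cell.

    coords: iterable of (lat, lon) pairs in degrees (hypothetical input format).
    cell_deg: cell size in degrees (an assumption, not the real map's resolution).
    """
    counts = Counter()
    for lat, lon in coords:
        # floor() maps every point to the lower-left corner of its cell.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


sample = [(48.85, 2.35), (48.86, 2.29), (-33.87, 151.21)]
counts = grid_counts(sample)
print(dict(counts))  # {(48, 2): 2, (-34, 151): 1}
```

Higher counts would be drawn in warmer colors; cells that never appear in the result correspond to the dark areas, where no article carries coordinates.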

Slide 7

  • We designed a system to identify which articles are missing from a given language.
    • We find the articles by comparing one Wikipedia against another: what is in X but missing in Y.
    • We then rank the articles via a prediction model based on page views, to identify potentially high-demand articles.
    • We then build a topical model for a given editor. Each article is then matched to a given editor based on their topical model.
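The three steps above can be sketched as a small pipeline. This is an illustrative sketch only, not the Research team's code: it assumes articles are aligned across languages by Wikidata item ID, substitutes a plain dictionary of predicted page views for the prediction model, and replaces the editor's topical model with a set of topic labels compared by Jaccard similarity. All function names, parameters, and sample data are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Share of topic labels two sets have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(source_items, target_items, predicted_views, article_topics,
              editor_topics, k=2):
    """Hypothetical sketch of the three steps on the slide.

    source_items / target_items: Wikidata item IDs with an article in wiki X / wiki Y
    predicted_views: item ID -> predicted page views in Y (stand-in for the real model)
    article_topics: item ID -> set of topic labels
    editor_topics: set of topic labels (stand-in for the editor's topical model)
    """
    # Step 1: items with an article in X but not in Y.
    missing = set(source_items) - set(target_items)
    # Step 2: rank by predicted demand, highest first (ID breaks ties deterministically).
    ranked = sorted(missing, key=lambda q: (-predicted_views.get(q, 0), q))
    # Step 3: score each article against the editor's topics. sorted() is stable,
    # so articles with equal similarity keep their demand order from step 2.
    scored = [(jaccard(article_topics.get(q, set()), editor_topics), q) for q in ranked]
    best_first = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop articles with no topical overlap and keep the top k.
    return [q for score, q in best_first if score > 0][:k]


recs = recommend(
    source_items=["Q1", "Q2", "Q3", "Q4"],
    target_items=["Q1"],
    predicted_views={"Q2": 500, "Q3": 1200, "Q4": 90},
    article_topics={"Q2": {"biology", "africa"}, "Q3": {"music"}, "Q4": {"biology"}},
    editor_topics={"biology"},
)
print(recs)  # ['Q4', 'Q2']
```

In this sketch topical match is the primary sort key and predicted demand only breaks ties; weighting the two signals differently would be an equally plausible design choice.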

Slide 8

  • We found that personalizing article recommendations increases the rate of article creation by a factor of 3.2.
  • The tool is no longer active in production.

Slide 9

  • GapFinder: you select the Wikipedia language you want to compare with, select your topic areas, and then identify articles associated with those topic areas that are missing from a particular language.

Slide 10

  • Question in chat: "Regarding not pushing the *proven successful* experimental tool into something used in production -- where was the decision (outside the Research team) made *not* to do so? Or was there no such decision, and it's just not being done because no one decided at all?"
    • Leila (WMF): The organization wasn't ready to embrace technology not developed at Product and to mature it into meeting all the requirements for production use (e.g. security auditing, performance at high scale). I think the organization has improved on this front, and is now better ready to do so.

Slide 16

  • Link recommendation: given that most wikilinks are rarely used on the one hand, and concerns about over-linking on the other, can we identify the "most useful" wikilinks (internal links) to suggest between articles?
  • Based on research done in 2016, we are now in a position to build and implement recommendation models as part of the Growth team's work, recommending to newcomers that they add a link from one article to another. Currently being tested in: Arabic, Czech, English, Korean, and Vietnamese.
  • Not in production yet, but we are exploring how that can happen.

Slides 19–22

Improving Knowledge Integrity

  • We aim to ensure the integrity of knowledge on Wikimedia projects, e.g. by detecting policy violations and gaps. One example is our research on "Citation needed".
  • The "citation needed" template (and its equivalents in other languages) is added to an article to flag a statement that is not backed up by citations. It tells readers that the statement needs attention, and signals to editors how the article can be improved. Tagging is a manual, resource-intensive process.
  • Can we use machines to identify which statements need citations, and even identify *why* they may need citations?
  • This is an assistive technology: it does not add citations automatically, but helps editors identify and triage citation work.
  • [Prototype here]
  • Models are currently available for English, French, and Italian Wikipedia. We worked with the developer of CitationHunt.
  • It can help guide newcomers toward contributing in a constructive manner.

Slides 23–26

  • We run research showcases, as well as office hours to help people who have questions about our work.
  • Annual event: Wiki Workshop (usually co-located with a major computer science conference).
  • We have research internships in our team.
  • Formal collaborations with researchers in academia or industry, primarily pursued in a volunteer capacity.
  • We serve the research community in roles such as reviewers or program committee members, helping to improve the quality of research.

Getting involved

We are interested in supporting you, and in prioritizing topics that matter to you. You are welcome to show up at our office hours and talk to us. You don't need to be an academic or practicing researcher to bring up questions or requests!


  • Q: The office hours page doesn't list any future sessions?
    • Leila: Yes, we are going through an evaluation, but this will be fixed shortly. We're changing how we structure them and will try a video setup (partly inspired by the clinics, partly to increase interaction).
  • Q: Does the Research team review all "research" that happens at WMF? E.g. in a team like Community Development?
    • Leila: No, we do not. But if the research is mentioned in the annual plan, we would notice it and make sure we are able to support it, capacity-wise. We are also available to other teams to consult with. But we do not proactively oversee all work called "research" in other teams.
  • Q: How is "article quality" measured, for the recommender system?
  • Q: How can non-academics best contribute to WMF Research?
    • Leila: We need people with wiki knowledge at every step of the way, from the very beginning, in determining which problems and research questions are useful to pursue. We get a lot of that at in-person events. These days, I encourage everyone to reach out to us through any channel.
  • Q: Can interested communities re-use Research team code even if WMF isn't (yet) "productizing" it? Is the code shared? Does it rely on internal data? etc.
    • Leila: The source code is always available (with very few exceptions that carry privacy risks, e.g. sockpuppet detection). Sometimes the code is readily re-usable with the public data, but often the computational resources it requires are still beyond what is typically available to community volunteers. We are happy to assist interested communities in making use of our research code.
  • Q: In the link recommendation tool, will you investigate whether the "detour" through the suggested links is needed for understanding, or even for wanting to read the target page? For example, you are reading about long-lived mammals, go to Elephant, and from there to the curious article Execution by elephant.
    • Leila: at the moment we are not able to understand what is required to know in order to get to the next "hop" in a link-chain. But given an observed trajectory of links, we can check if the origin article is mentioned in destination article, as an indicator that a direct link may be useful.
  • Q: I'm coordinating a special issue (call for papers) about Wikipedia in the field of communication. Articles in Spanish, English, and Portuguese are accepted. Where can I post the call for papers so that researchers in our community can see it?
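Leila's answer on link trajectories describes a simple signal: when readers take a multi-hop path from one article to another, and the destination article already mentions the origin, a direct link between the two may be useful. A minimal Python sketch of that check, assuming a hypothetical input format (a list of visited titles and a dict of plain article text); the function names are made up, and the sample sentence is a toy string, not real article content:

```python
import re


def mentions(text: str, title: str) -> bool:
    """Case-insensitive check that `title` appears in `text` as a whole phrase."""
    # Lookarounds instead of \b, so titles starting or ending in punctuation still work.
    pattern = r"(?<!\w)" + re.escape(title) + r"(?!\w)"
    return re.search(pattern, text, re.IGNORECASE) is not None


def direct_link_signal(trajectory, texts):
    """Return (origin, destination) if a multi-hop path ends at an article that
    already mentions the origin; otherwise None.

    trajectory: article titles in the order a reader visited them (hypothetical format)
    texts: title -> plain article text (hypothetical format)
    """
    if len(trajectory) < 3:
        return None  # a direct hop: there is no detour to shortcut
    origin, destination = trajectory[0], trajectory[-1]
    if mentions(texts.get(destination, ""), origin):
        return (origin, destination)
    return None


texts = {"Execution by elephant": "A toy sentence that mentions the word mammal."}
path = ["Mammal", "Elephant", "Execution by elephant"]
print(direct_link_signal(path, texts))  # ('Mammal', 'Execution by elephant')
```

A plain string match is only a rough indicator, in line with Leila's caveat that the tool cannot tell what a reader needs to know before the next hop.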

Topic 3: Wiki Studies

volunteer: some research has wikis at its center, to pursue a particular research interest, but there are no (AFAIK) "wiki studies" professors in the world. Perhaps it could be valuable to develop a "wiki studies" research field, with some shared definitions ("what is a wiki?"), practices, channels. I am writing a book about wikis, and am using a lot of surveys and reviews from many disciplines, because there's no easy central tracking of "wiki studies".


  • Seddon (WMF): This seems close.
  • Asaf (WMF): OpenSym too.
  • Leila: I need to think about whether investing in this would be a good use of our time. Peer production network research is close to "wiki studies".
  • volunteer: networks are indeed starting to establish themselves. It's good, but I'm interested in a humanities angle.

Topic 4: Pitching ideas for development to WMF

Volunteer: How would one theoretically propose to the WMF that they build a particular tool/software project?


  • Leila: It is indeed difficult, even for us within WMF. Adopting new technologies or dependencies is rare, so if your suggestion requires that, it is very unlikely to be seriously considered. The Community Tech Wishlist is a good vehicle.
  • Seddon: Though the scope of the wishlist is sometimes too narrow for some of the ideas proposed.
  • Asaf: Internally advocating for a certain type of work is not easy. Here is what does tend to work, in my observation:
    • One flow is work invented at the WMF. WMF tends to resource this kind of work. An example is the current Growth team, whose precedents go back to 2012 (the former Editor Engagement team, etc.). This happens during the annual planning phase. Once such work is budgeted, it becomes fairly inflexible; teams are generally not open to "an idea came up! Let's tackle it!".
    • The one mechanism designed to do that is the community wishlist, mentioned above. It's not spontaneous (it takes place once a year), but it is an effective way for communities to introduce ideas and get WMF to allocate resources to them.
    • The third thing that works is tool building: prototyping on Toolforge. If you have a very useful tool, you can get dedicated resources there as well (like PetScan, which gets its own virtual machine because it's so useful and widely used). Tools there cannot change the core workings of the wikis. Gadgets and user scripts are a way to create (mostly user-interface) change on-wiki, and some gadgets and scripts are quite powerful, e.g. VisualFileChange on Commons, or the Wikidata-Editing Framework.
    • Finally, if you really want to influence what WMF allocates tech resources to, you need to influence the WMF Product managers. They are generally easy to reach via or Phabricator, and open to hear thoughts; if you can convince them that your idea serves their overall mandate (e.g. "Growth"), they may include resources for that idea in the next annual plan they contribute to.
  • Seddon: good ideas are plentiful. Not all of them are actionable at this point in time, some of them depend on great community buy-in, etc.

Specific example

  • This is the idea the volunteer had in mind.
  • Leila: the working group on diversity and inclusion may be very interested in that.
  • Asaf: And be clear about what you propose. For example, we would obviously not suggest running this on all Wikipedias all the time (i.e. anonymizing all users). It can provide an opportunity for times when we need to do some sort of controlled study of past decisions (for example, of deletion discussions, to assess bias). You can prepare a write-up making the case for the tool's usefulness.
  • volunteer: on my [mid-sized --AB] Wikipedia, it wouldn't work well, because there are many users whose style is distinctive enough that I can identify them even without the signature.
    • Asaf (WMF): indeed, below a certain community size, it will have diminishing returns, because of intuitive stylometry. :)

Topic 5: Harassment

  • volunteer: someone proposed deletion of an article about a woman. Then the proposer was harassed for proposing deletion. Does that count as harassment?
    • Asaf: Absolutely, it can count as harassment. No one is exempt from expected behavior standards, whatever group they belong to or whatever their editing goals. But let us also acknowledge that since harassment is unacceptable (and punishable) on our projects, and will become even less acceptable with the upcoming Universal Code of Conduct, there is always the danger of people "weaponizing" the term "harassment" and claiming to be victims of harassment when they are not, or even when they are in fact the aggressors. This is why we cannot always look at just a single on-wiki action: as you say, merely proposing an article for deletion is an everyday, normal part of wiki life, and generally legitimate. If someone proposed for deletion an article written by a user they had never interacted with, for instance, that's probably fine; but if someone is systematically proposing all articles by a certain user for deletion (even if most of those end up not being deleted), the proposing itself can be a form of harassment! Context does matter.

Feedback on the call

  • I'm grateful for the opportunity. I feel isolated during the pandemic. It's great to see faces.
  • thanks, I enjoyed it.
  • works pretty well, both technically and otherwise