
Research:Understanding newcomer mentorship on Wikipedia

From Meta, a Wikimedia project coordination wiki
Tracked in Phabricator:
Task T397550
Created
20:51, 9 December 2025 (UTC)
Duration:  2025-09 – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


New editors on Wikipedia have a lot of questions. Despite steady improvements in tooling and support for new editors, editing can still require navigating unfamiliar policies, unfamiliar syntax, specialized workflows (such as uploading new images or creating new articles), and a platform that is often quite different from other ways in which folks contribute content online. There are also a lot of Wikipedians who are highly motivated to help these newcomers find their place on the wikis, but they face real challenges in keeping up with the needs of so many newcomers. This project involves three stages:

  • Understanding the current state of mentorship on Wikipedia (starting with English) -- where does it happen, what sorts of questions are being asked, what issues/opportunities arise with the current systems?
  • Measuring the extent of mentorship -- mentorship is often a very diffuse and hard to quantify process but we can start to bring some data to understanding various aspects of the process that can help with evaluating any future interventions.
  • Addressing the issues that prevent effective mentorship -- developing prototypes that can perhaps help make mentorship easier and more rewarding for both parties.

Core mentorship spaces

There are a few structured ways in which mentorship/support happens on English Wikipedia. Anecdotally, a lot also happens through talk page discussions or in off-wiki (and sometimes offline) spaces. These spaces range from very mentorship-oriented (learning etc.) to more support-oriented (getting an edit done), but all have some component of (newer) editors seeking help from more experienced editors:

  • Newcomer Homepage: new editors are now automatically assigned a mentor (more experienced editors can opt-in to being a mentor) with various entrypoints for posting a question on their mentor's talk page and tracking answers etc. It's 1:1 mentorship (though on rare occasion other editors will jump in to answer questions as they are publicly posted to talk pages).
  • Wikipedia Teahouse: the Teahouse is a space where newcomers can ask questions, with a number of experienced editors watching the page and answering questions pretty quickly as they come in. New editors might get a nudge to ask questions at the Teahouse via various talk page templates (examples).
  • Adopt-a-user: program for somewhat experienced editors to get 1:1 mentorship from more experienced editors.
  • Help me templates: templates that editors can put on their user talk page (or occasionally article talk page) to request assistance. Various editors watch the corresponding category and step in to assist when new questions pop up.
  • Help Desks: generally more technical or domain-specific spaces like Help Desk, AfC Help Desk, Reference desk.
  • Portals: places with lots of pre-existing help/guidance documentation and links like WP:Questions, WP:FAQ, Help:Menu, WP:Request Directory, Help:Getting Started.


Methods

Given the variety of goals for this work, it also employs a wide variety of methods:

  • Understanding: qualitative coding of newcomer trajectories before and after they ask questions. My initial focus is mainly on Newcomer Homepage questions, given that the program is available to all new editors and its structure was built by the WMF, so there are more opportunities for intervention. I'm also exploring community-built spaces like Help me templates and Wikipedia Teahouse questions.
  • Measuring: gathering counts of questions asked via Newcomer Homepage, Help me templates, and Wikipedia Teahouse. Based on the results of the qualitative coding, I may extend the measurement to include other facets such as response time etc.
  • Addressing: prototyping improved natural-language search of the various spaces in which newcomers might find answers to their questions -- e.g., Policy documentation, Help pages, and past question-and-answers.

Results

Understanding

Some observations so far from the qualitative coding and informal discussions with experienced editors:

  • Mentorship can serve two distinct goals: the technical aspect of how to complete an action and the more norms-related component of learning how contribution and collaboration works on Wikipedia.
  • Many editors are not "surviving" after they ask their question -- i.e., they do not make any further edits. A number do continue to edit, but they rarely follow up on their initial question unless the initial response arrived within several minutes (though they sometimes take actions based upon the answer).
  • Newcomers very rarely thank their mentors (either via a reply or the actual thanks mechanism). The mentors definitely deserve thanks though -- they display incredible patience/kindness.
  • The questions span everything from very specific help with a source, to basic "how do I get started?", to help navigating the various on-wiki processes, to really anything (not always even wiki-related). Despite that diversity, there are a lot of duplicate questions. Many questions also require clarification, which can be a large barrier to the newcomer eventually getting their answer (and eats up mentor time).
  • In 1:1 mentorship spaces, responses often take 12 hours or more, which is frankly quite quick given that editors are volunteers and may be in completely different timezones. Responses are much faster in 1:many spaces like the Teahouse (often just a few minutes), but this is still far slower than chat, and newcomers have no indication that someone is typing out a response.
  • The Mentor Dashboard makes it easier for mentors to follow what their mentees are working on, but it's still a lot of work to keep on top of everything and mentorship is not necessarily recognized well on the wikis.

Measuring

Some miscellaneous data so far:

  • Newcomer Mentorship Module questions can be identified via change tags (mentorship module question and mentorship panel question in Special:Tags) or by matching against the corresponding section header style.
  • Of the ~16,000 questions asked via the mentorship module (data), ~1,700 were from editors whose accounts were <10 days old (a later adjustment expanded this dataset a little, to the ~19,000 mentioned in the table below).
  • Looking at common questions via k-means clustering of the embeddings for each question-and-answer section (1000 clusters on 254,137 sections), I would describe the top clusters as follows (based on random sampling of 10 sections from each cluster):
    • mostly off-topic questions that are redirected to the Reference desk (n=722)
    • people saying hi! (n=697)
    • mostly just "what is your question"? type of questions or things that get redirected to Reference Desk etc. (n=677)
    • the people want User Boxes! (n=621)
    • people reporting vandalism (n=603)
    • editors leaving Wikipedia and asking how to delete accounts (n=593)
    • largely empty usages of Help me template (n=588)
    • largely nonsense questions/statements (n=539)
    • asking for help to resolve reference errors they caused, often surfaced by Qwerfjkl (bot), which includes an easy way to ask for help (n=533)
    • how to add an infobox (but usually they don't know what it's called) (n=521)
    • how do I handle this reverting of mine or other's content? (n=500)
    • challenges with uploading images (n=499)
    • issues with adding (external) links (n=489)
    • a bit more of a hodgepodge but mostly please review, or fix, or help me find an edit I lost (n=475)
  • A follow-up clustering of just the questions from 2024 and 2025 suggested a much higher rate of questions about article creation. A follow-up coding of 100 random questions from 2024 and 2025 revealed the following:
    • 35 of the 100 questions were about article creation. Of these 35, 17 were about "Getting Started" (e.g., "how do I write my first article?"), 14 were about "Approval/Review" (e.g., "I created a draft. How do I get it reviewed?"), 2 were about "Notability" (e.g., "I have X sources. Is that okay?"), 1 was a user account naming issue preventing them from creating an article, and 1 was a request for editing help (e.g., "How do I fix this reference in my draft?").
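The tag-based identification above can be done through the MediaWiki Action API, whose recentchanges list supports filtering by change tag via the rctag parameter. A minimal sketch of building such a query (the endpoint and tag name come from this page; the helper function itself is hypothetical):

```python
from urllib.parse import urlencode

# English Wikipedia Action API endpoint; other wikis follow the same pattern.
API = "https://en.wikipedia.org/w/api.php"

def tagged_recentchanges_url(tag, limit=50):
    """Build a query URL for recent changes carrying a given change tag.

    `rctag` is the recentchanges filter for Special:Tags entries such as
    "mentorship module question" mentioned above.
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctag": tag,
        "rclimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)

url = tagged_recentchanges_url("mentorship module question")
```

Fetching the URL (and paginating via the API's continuation parameters) would then yield the tagged revisions to count or sample.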
Count of questions asked per source and year. See task T397550#11469797 for a nice chart
year   help-desk  teahouse  mentor  help-me  total-questions
2004          57         0       0        0               57
2005        2806         0       0        0             2806
2006        7286         0       0        3             7289
2007       10885         1       0        3            10889
2008        7217         0       0        4             7221
2009        6873         0       0       18             6891
2010        6338         0       0      504             6842
2011        6506         1       0      965             7472
2012        6433      2063       0      902             9398
2013        5857      3514       0     1036            10407
2014        4716      3870       0     1259             9845
2015        4104      4356       1     1772            10233
2016        3931      3888       0     1903             9722
2017        3678      4713       0     1790            10181
2018        3633      5484       0     1491            10608
2019        3836      6073       1     1366            11276
2020        4642      8391       0     1598            14631
2021        4241      9231     880     1177            15529
2022        3351      6734    2742      748            13575
2023        3788      6274    4849      574            15485
2024        3560      6316    9670      531            20077
2025        2436      4581   14267     1108            22392
TOTAL     106174     75490   32410    18752           232826
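The clustering described above is standard k-means over per-section embeddings (1000 clusters on 254,137 sections in the real run). As a minimal pure-Python sketch of the same idea, with toy 2-D vectors standing in for real embedding vectors:

```python
import random

def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's-algorithm k-means; returns a cluster label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # assignment step: nearest centroid per vector
        labels = [min(range(k), key=lambda c: squared_dist(v, centroids[c]))
                  for v in vectors]
        # update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# two obvious "topics" in a toy embedding space
questions = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),   # group A
             (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]   # group B
labels = kmeans(questions, k=2)
```

The real pipeline would use a library implementation over high-dimensional embeddings, then sample sections from each cluster for manual labeling, as described above.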

Addressing

I built a prototype for natural-language search that can be tested out on Toolforge. No evaluation has been done yet, though the UI shows how it compares with existing on-wiki search options. The prototype is based on embeddings from Qwen3-Embedding-0.6B (code). I also compiled a dataset of common questions based on the top clusters from a k-means clustering of all of the questions from 2024 and 2025. The intent was to use this for direct evaluation of the search prototype, but the potentially-relevant answer space is too broad -- i.e., there are likely many appropriate responses to a given question -- so I think manual annotation of results for relevance would still be required to get a good evaluation.
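The retrieval core of such a prototype is simple: embed the help pages and past Q&As once, embed each incoming query, and rank by cosine similarity. A minimal sketch with hand-made toy vectors standing in for real Qwen3-Embedding outputs (the model call itself is omitted; all names here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec, doc_vecs, top_k=3):
    """Return indices of the top_k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:top_k]

# toy "embeddings" for three help pages
docs = [(1.0, 0.0, 0.0),   # 0: uploading images
        (0.0, 1.0, 0.0),   # 1: citing sources
        (0.9, 0.1, 0.0)]   # 2: image copyright
query = (1.0, 0.05, 0.0)   # e.g., "how do I add a picture?"
ranking = search(query, docs, top_k=2)
```

In practice the document vectors would be precomputed and stored, and a vector index would replace the linear scan once the corpus grows.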
