Wiki Education Foundation/Wikipedia Fellows pilot evaluation update

From Meta, a Wikimedia project coordination wiki

In early 2018, the Wiki Education Foundation piloted a new program to train subject-matter experts to edit Wikipedia in a 12-week class. In our evaluation of the pilot, we determined the program had merit and we would pursue additional courses to test different variables. This document serves as an update to the initial evaluation, documenting our additional work and learnings. It was written by Wiki Education staff who worked on the program, primarily Will Kent and Ryan McGrady.

Since the pilot evaluation we have run eleven additional courses: six in the summer and five during the fall of 2018. We identified several variables to experiment with, testing recruitment methods, number of participants, length of the course, topic themes, scheduling, and curriculum. One of our top concerns was understanding how a program like this would scale. We wanted to take steps to grow this program in a way that would ensure the production of high-quality content, efficient use of staff time, and a positive experience for course participants.

Theory of change

Our Theory of Change is similar to that in our pilot evaluation. Our pilot confirmed that the training infrastructure, knowledge, and institutional connections we have built up through our Student Program could be adapted to support academics as they learn to contribute their expertise to Wikipedia. Subject-matter experts have the education, experience, and understanding to contribute high-quality information about specialized, complex, and important topics. We continue to develop and leverage relationships with academic associations and institutions to attract qualified, interested participants. In expanding the program and working with these partners, we saw an opportunity to try to target particular topic areas that need attention on Wikipedia or have a particularly high impact. Additionally, we view these relationships and this program in general as an opportunity for inclusivity, convening cohorts diverse not just in the content they produce, but also in terms of the identities of the participants. If successful, this round of the Wikipedia Fellows program would result in a diverse range of subject-matter experts contributing important information to public knowledge through Wikipedia. It would also indicate that the program is a repeatable model that can be used to target specific kinds of topics.

We sought to add data points to several key questions from our pilot:

  • Can we leverage relationships with associations?
  • How do subject-matter experts specifically impact article quality?
  • Can we retain subject-matter experts; will they remain active?

Other key questions we sought to answer in this phase of the pilot:

  • Can we target particular topics for improvement?
  • Is the model better suited to some topics than others?
  • What effect does group size have on content or overall experience?

Preparation phase

Midwest Political Science Association's Melissa Heeke and Wiki Education's Jami Mathewson discuss the program.

We again invited the three partners that joined us for the pilot program: the National Women's Studies Association (NWSA), the American Sociological Association (ASA), and the Midwest Political Science Association (MPSA). We also included six new associations: the Association for Women in Mathematics (AWM), the Association for Psychological Science (APS), the American Anthropological Association (AAA), the American Chemical Society (ACS), the Linguistic Society of America (LSA), and the Deep Carbon Observatory (DCO). As with our pilot, these associations vary in membership size and discipline, giving us the opportunity to test whether membership size affects the level of interest in, and the number of applicants for, such a program. We believe that the mission of this program aligns with the missions of these academic associations and that their members would find merit in and benefit from contributing to Wikipedia.

Similar to our pilot program, we collaborated with these organizations to develop communication strategies and establish requirements to participate in these Wikipedia Fellows courses. The participating partners saw value in training their members to make Wikipedia more accurate, as they understand that Wikipedia is where the public learns about topics related to their discipline. They believed members would communicate valuable knowledge about the discipline to the public, advancing the organizations' missions to educate the world and advance understanding of their respective disciplines.

Following the success of the pilot, we planned to experiment with several more instances of the program throughout 2018, testing multiple variables: cohort size, topic focus, course length, and scheduling. These variables were in part based on our existing relationships and in part affected our coordination with the participating associations.

We learned from our experience with the pilot that there are challenges and significant variations in the ability of associations to offer honoraria or other funding to Wikipedia Fellows, and as such we did not suggest they do so.

Key learnings

  • Academic associations see value in training their members to improve Wikipedia articles related to the discipline.
  • There is a demand for a structured program that teaches academic scholars how to contribute to Wikipedia.
  • Partners appreciated that Wiki Education already had ample experience working with university faculty going into this pilot.
  • Partners continue to agree that 3 hours per week is a reasonable amount of time to expect Fellows to contribute.

Selection phase

For some courses, we specifically recruited people who wanted to improve a particular topic area. These participants were part of one of our Women in Science courses.

We used an application similar to the one we used for the pilot, although the selection based on those applications differed in several ways:

  • In the pilot, we received nearly 90 applications for a single 9-person cohort. For the subsequent cohorts, we wanted to experiment with a number of variables which involved increasing the number of cohorts, varying the size of each cohort, and targeting particular topics or themes. This meant looking at different aspects of applications as well as varying the screening process to allow for more or fewer participants. In the pilot, for example, we were able to look for applicants who most clearly indicated an interest in making contributions to Wikipedia rather than just learning about Wikipedia. For these cohorts, elements like course size, topic, or scheduling sometimes necessitated prioritizing those elements.
  • We placed greater emphasis in our selection process on research background. This was not just to ensure compatibility with themes, but also to cultivate, to the extent possible, a diverse set of academic backgrounds and thus more diverse contributions.
  • Since we included several new partners in these new courses, we wanted to balance for background and interest. We deliberately mixed professions in some courses, while limiting others to a smaller range of fields.

One change we made to the application was to build scheduling into the application form itself. For the pilot, we accepted people and then tried to find a meeting time. As explained in the pilot evaluation, this proved a cumbersome process. For subsequent groups, we provided some available time/day combinations in the application, and could consider availability when forming cohorts.

Key learnings

  • Changing the focus of a cohort means changing what we look for in an applicant. Adding additional variables may mean sacrificing some desired traits of applicants, like the amount of time available.
  • Building scheduling into the application was significantly more efficient than scheduling after the selection process.
  • For the purpose of improving Wikipedia in a limited amount of time, disciplinary cohesion may better facilitate collaboration and relevance of discussions than interdisciplinarity.


The curriculum remained largely the same. There were many slight tweaks made based on notes from the pilot and earlier cohorts, but the milestones and tasks on the timeline were generally consistent. The additional resources provided and other details varied somewhat based on the themes of the cohorts (e.g. Women in Science cohorts contained more resources about writing biographies).

We tried different cohort lengths (8 weeks, 12 weeks, and 16 weeks). We expanded and condensed our curriculum to fit the time constraints, but while this changed the amount we covered each week, the actual content remained the same.

One new element was having a group of alumni to contact with specific questions. In one course, we invited three past Wikipedia Fellows to discuss the pressures around representing your area of expertise on Wikipedia. Participants were enthusiastic about this conversation and found it easier to contribute after hearing alumni stories. We may invite past participants back when there is interest, but we are not at this stage planning to make such engagement a formal part of the program.

Key learnings

  • The milestones and weekly tasks on the timeline continue to be useful as a structure to keep participants on track.
  • We again took a great deal of notes during and between meetings to inform future curriculum development.
  • For several reasons, the peer review task we carried over from our Student Program assignments proved too cumbersome for the value it provided in several of the cohorts. Peer review is still a valuable tool that, among other things, acts as a touchstone to the kind of academic writing Fellows are more accustomed to, so future cohorts should explore other models of peer review.
  • We found that the cohort with the condensed curriculum produced high-quality content, but with little room for error regarding scheduling (if there were absences or if participants had to delay turning in an assignment, it put a lot of strain on their ability to complete the course). For these reasons, we believe the three-month model works best for pacing and flexibility, while being short enough to keep participants engaged.
  • Past Wikipedia Fellows can be a valuable resource, but requesting future participation is not something we want to build into our program structure, and thus past participants would only be invited on an informal, case-by-case basis.


We used the Zoom software as our technical solution for meetings.

Meeting length and format remained the same. We placed an emphasis on trying and testing variables, and some elements of the meetings and their structure varied according to changes in theme, length of cohort, and new association partners. We did alter the technology we used in the program. Whereas we used Slack in the pilot, for six cohorts we used an open-source chat client called Riot. After using Riot we switched back to Slack due to usability and familiarity with the platform.

Rate of attrition did not vary significantly with cohort size. The number of participants who dropped out roughly corresponded to the size of the cohort (our largest cohort, with 50 people, ultimately had about 30 people edit Wikipedia). We chose a general definition for participation, counting participants who did a combination of the following: edited Wikipedia, took trainings, and attended most of our meetings. Our rationale here was to have the largest pool of participants work through our courses in order to test other variables. Without participants there would be no way to test these variables.

Due to the difficulty in scheduling our early cohorts, we built scheduling into the application and considered it among other variables in the selection process.

Key learnings

  • Scheduling works best when sorted out in advance, starting with the application. As we continue to grow the program, we may need to schedule more precisely in advance, rather than providing several possibilities.
  • The quality of discussions in the meetings decreases when the number of attendees drops below 5 people.
  • Meetings were much more productive when Fellows had been actively contributing in the preceding week.
  • Several Fellows falling behind in on-wiki contributions may affect the productivity of other participants.
  • In interdisciplinary cohorts, there is often discussion along disciplinary lines, with, for example, psychologists speaking to other psychologists or sociologists speaking to other sociologists. This is highly productive in building camaraderie and enthusiasm for those involved, but makes it difficult for the group as a whole to "feel like a cohort," as one participant said.
  • Discussions of Fellows' article evaluations and topic exploration were often lively, with anecdotes springboarding us into key policy-related or community-related topics.
  • The curriculum had a logical flow to it which helped conversation grow as the program continued. Participants got to know each other better and became more comfortable sharing their experiences.
  • Screen sharing was effective and helpful, according to many participants, although it doesn't include people who connect to the meeting by phone.
  • Several participants expressed that the meeting recordings were useful and asked if they would continue to have access to them after the end of the course. It is unclear how many people viewed them.
  • In hindsight, having a staff note-taker during the meeting was helpful. It is not realistic to have the meeting facilitator take meticulous notes, as opposed to select bullet points and reflections, but it is useful enough that it's worth determining who will take notes up front.
  • Having two staff members in the meetings, rather than one, is helpful in general. In addition to having additional expertise on-hand, having two people engaged and ready with prompts is useful for stirring conversation when things slow down.
  • It's important to emphasize "be bold" early, and to reassure participants regularly that they cannot "break" Wikipedia.
  • We should have potential articles to work on at the outset, based on participants' interests and/or the theme. These articles are potential projects for participants, but more importantly can serve as a basis for early edits and evaluations (they will be topics that clearly need work). This eliminates the time needed to settle on a single article while still learning about Wikipedia and exploring the topic area.
  • Meetings are best scheduled such that the staff facilitator has no more than two each day.

Wiki Education staff roles

Roles were consistent with the pilot. In each course, a Program Manager facilitated the meetings while a Wikipedia Expert supported the sessions and provided help to participants in Slack and on Wikipedia. We alternated between two Program Managers and three Wikipedia Experts filling those roles. In one cohort, the Program Manager also played the role of Wikipedia Expert for experimentation purposes.


In 2018, we facilitated twelve courses, including the pilot, with 163 individuals participating. These participants added 265,000 words and edited 572 articles. They created 65 articles and uploaded 48 images to Wikimedia Commons.
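For a rough sense of scale, the totals above can be converted into per-participant averages. The following is a minimal sketch that uses only the figures reported in this section; the dictionary keys and the function name are illustrative and not part of any Wiki Education tooling. These are simple means, and the report does not describe how contributions were distributed among participants.

```python
# Totals reported for the twelve 2018 courses (including the pilot).
TOTALS = {
    "participants": 163,
    "words_added": 265_000,
    "articles_edited": 572,
    "articles_created": 65,
    "images_uploaded": 48,
}


def per_participant(totals):
    """Return each reported total divided by the number of participants,
    rounded to two decimal places. Hypothetical helper for illustration."""
    n = totals["participants"]
    return {
        key: round(value / n, 2)
        for key, value in totals.items()
        if key != "participants"
    }


averages = per_participant(TOTALS)
for metric, value in averages.items():
    print(f"{metric}: {value} per participant")
```

Under these figures the mean works out to roughly 1,600 words added and between three and four articles edited per participant.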

Sample work:

  • Feminist poetry is a new article created by a program participant. Creating a new article on a broad topic like this requires a broad understanding of the subject-matter, which is the kind of thing an expert can provide.
  • Jennifer Doudna's article was simply a chronological account of what she had done, and it was told in relation to the men in her life. A participant was able to rework the article so that it showed the importance of her work — including her role in the discovery of CRISPR — and wrote about her as the main figure in her life's story. (See the blog post.)
  • Bette Korber's biography, which was created by a program participant, successfully captures her achievements and puts them in the proper context. Again, it's easy to write a biography as a series of events that give little sense of the importance of their work. It's harder to put that in context, and show the most important aspects of their professional achievements. It's the kind of thing an expert, who understands the importance and can contextualize it, is better-equipped to do than someone with less breadth and depth of understanding.
  • The membrane curvature article had not been edited since January 2014, and it was written in a dense style that was difficult to understand. While the program participant did not re-write the entire article, their additions were well-written and more accessible to readers.
  • The hometown association article was heavily tagged and poorly organized. It was the kind of article that accretes content over time, but lacks coherence. A program participant was able to put the pieces together and give it the coherence it was missing.
  • The NARA-sponsored courses were able to improve a wide range of content related to women's suffrage. Participants created new articles for people like Mary McHenry Keith, Etta Haynie Maddox, and Caroline Katzenstein. They expanded existing biographies of people like Ida B. Wells and Sara Yorke Stevenson. And they created articles on related topics like the Prison Special.

In our intake and exit surveys, responses demonstrated increased confidence around editing Wikipedia, understanding Wikipedia policies, engaging with the community, and, to a lesser extent, teaching with Wikipedia. Of 93 respondents, only one said that they were not interested in editing Wikipedia after taking this course; 73 responded with "definitely" or "likely". However, we have seen a steep drop-off in participants actually editing. While many participants made edits in the month immediately after the course ended, no participants appear to have been retained as active editors.

Participants also engaged in other ways:

  • We have had eight volunteer blog posts submitted for our blog, in addition to the nine blog posts written by Fellows in the pilot program, for which a blog post was a requirement for participation.
  • At least three participants have facilitated edit-a-thons at their respective institutions.
  • Twelve participants in the summer and fall courses have gone on to teach with Wikipedia, either concurrently with their course (two) or after the course was done. Seven of these were new instructors, while the other five had taught with Wikipedia before taking part in the professional development program, and did so again after they completed it. One participant who withdrew from a course in the summer went on to teach with Wikipedia in the fall for the first time. In addition to these, four participants in the Pilot have taught with Wikipedia.

Key learnings

Our tests of these variables resulted in some key learnings:

  • Size: Our largest course invited 50 people to participate and our smallest had 6. Although our largest course ultimately worked, several participants dropped out and we also received feedback that the group was too diverse to have expert-level conversations and to have everyone contribute during our meetings in meaningful ways. There was also a technical limitation with our video software, Zoom. Any group over 25 split the classroom, making it impossible to see everyone at once. This was not ideal for teaching or management. With our smallest course, which had 6 participants, we received feedback that there wasn't a critical mass to build community (if one or two people missed a meeting, it was practically a 1:1 hour-long session). Based on this information we believe that courses with 10-15 participants work best. There are enough people to ensure participation in online meetings and the course is large enough for absences to not affect meetings or group assignments.
  • Course Length: We tested three different lengths of courses (8-week, 12-week, and 16-week). All of the variants yielded a similar number of contributions. Feedback we received from participants pointed to a 12-week timeline working the best. The condensed timeline offered very little room for absences or otherwise falling behind, while the longer timeline offered too much space to forget about the course and let the group stratify in terms of progress. It is important to remember that most of the participants adhere to an academic calendar and schedule, meaning there was a constant demand on their time and weeks were not always consistent. Twelve weeks was optimal: it allowed enough time for new concepts to sink in, left space for absences or scheduling conflicts, and gave enough time to research and write, all without feeling too long or open. Overall, the different lengths illustrated that this course is adaptable. Different groups will have different needs and it was useful to learn that the course can be expanded and condensed.
  • Scheduling: We started courses in January, June, July, and October. There was no start time that was better than any other. There were, however, times during the program that are important to consider. The most notable was the start of the fall semester (both the June-start and July-start courses straddled the fall quarter/semester start time in August and September). There was a significant drop in participation once the semester started, which makes sense. Beyond that, planning around holidays and other academic milestones (exams, breaks, etc.) is helpful. Regarding scheduling fine-tuning (selecting times and days of the week), offering participants a set of times to choose from worked well. Applicants were able to express which courses they wanted to join and we could balance accordingly, trying to honor their preferences.


Zoe Brigley Thompson and Amy Dye-Reeves, two Wikipedia Fellow participants, joined Program Managers Will Kent and Ryan McGrady for a presentation at Wiki Conference North America 2018.

Facilitating these courses has revealed a series of significant findings for us:

  • Recruiting, teaching, and working with new editors in a course like this is a successful model. It is also scalable, with limits, and customizable for a variety of themes, sizes, and other needs of our partners.
  • If participants are in academia, scheduling with consideration of the academic calendar is important. If participants study the same discipline, it's important to consider any major conferences scheduled during the course, as any attendees will likely be unavailable during the event and busy preparing in the weeks prior.
  • Themes are effective both for providing structure for the course and for targeted improvement of particular topics on Wikipedia.
  • Editing Wikipedia is not for everyone, for better and for worse. Editing requires a specific set of skills, combining technical expertise, access to resources, and ability to adapt to an "encyclopedic style" of writing as well as an unusual, sometimes counter-intuitive collaborative model of writing. These elements are not for all researchers and it's worth remembering that there are additional ways to contribute to Wikipedia including smaller edits, uploading images, and organizing the community through WikiProjects, in-person activities, and conducting reviews of content or new edits.
  • We want to make the experience in our program as valuable as possible to participants. We know that many Fellows have included this program on their CV, in their tenure portfolio, or have otherwise made the experience relevant to their academic careers. Many Fellows have gone on to teach with Wikipedia. Many have used what they learned with us to help or teach their peers. A few have also used the experience to develop their own scholarship relating to Wikipedia or otherwise introduced Wikipedia as a potential object of study. Future instances may emphasize one or many of these, and may explore additional measures we can take to facilitate participants' use of the course, such as issuing formal certificates.
  • While we had hoped this program would encourage editor retention, our initial efforts have not panned out. In February 2019, we had the following numbers:
    • 1/9 of the original pilot members made an edit in the article namespace, 10 months after completion of the course. The highest number of edits in the month was 1.
    • 4/99 of the summer cohort members made an edit in the article namespace, 5 months after completion of the course. The highest number of edits in the month was 9.
    • 10/75 of the fall cohort members made an edit in the article namespace, 2 months after completion of the course. The highest number of edits in the month was 4.
These numbers suggest that the course trains people how to edit, and that when they're inspired, they log on and make an edit. But they do not become highly active editors of Wikipedia.
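The fractions above are easier to compare as percentages. The following is a minimal sketch using only the February 2019 counts listed in this section; the names `COHORTS`, `retention_rate`, and `summarize` are illustrative and not part of any existing tool. Because each cohort was measured a different number of months after its course ended, the rates are not directly comparable with one another.

```python
# (cohort, members with an article-namespace edit in the month,
#  cohort members, months since the course ended)
COHORTS = [
    ("pilot", 1, 9, 10),
    ("summer", 4, 99, 5),
    ("fall", 10, 75, 2),
]


def retention_rate(active, total):
    """Percentage of cohort members who made an edit, to one decimal place."""
    return round(100 * active / total, 1)


def summarize(cohorts):
    """Map each cohort label to its retention percentage."""
    return {
        label: retention_rate(active, total)
        for label, active, total, _months in cohorts
    }


rates = summarize(COHORTS)
for label, active, total, months in COHORTS:
    print(f"{label}: {active}/{total} = {rates[label]}% "
          f"({months} months after completion)")
```

Even the most recent cohort, measured only two months after completion, shows fewer than one in seven members editing in the month.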

Wiki Scholars & Scientists

We recruit for the individual payer model by exhibiting at science conferences.

In Fall 2018, we launched our Scholars & Scientists program, a professional development course model built on what we've learned running Wikipedia Fellows. It follows a similar structure and curriculum, but with a new orientation, and a built-in flexibility to operate on a variety of themes and with a variety of partners and outreach strategies. As a professional development course, participants receive a formally issued certificate, and we are exploring additional ways to better integrate the experience into the professional lives of the scholars.

We are exploring two models: one with a set topic or set of association partners, and one with a single dedicated partner. The former is open for anyone to apply, or to anyone within particular fields. For the latter, we will work closely with a particular partner, tailoring aspects of our outreach or curriculum to their needs, as well as drawing on their resources in a more nuanced way. We are excited to explore this more. One reason it is appealing is that it resolves some challenges we've faced in recruitment and scheduling.

Our collaborator for the first Scholars & Scientists course is the National Archives and Records Administration (NARA). In May 2019, NARA is launching an exhibit, Rightfully Hers, commemorating the centennial of the Nineteenth Amendment to the United States Constitution. We will run four courses to train academics and professionals how to improve Wikipedia articles related to women's suffrage before and in tandem with the exhibition. We led the first two courses in Fall 2018, the third in early Spring 2019, and the fourth in late Spring 2019.

The Wikipedia Fellows program was offered at no cost to participants, subsidized by generous funders while we proved that we could immerse experts in Wikipedia and train them how to make substantial contributions. Moving forward, we will seek funding for the courses so we can sustain this valuable program. We will cover the cost of the courses via the following models:

  • Individual payer: Individual participants pay a tuition fee for the 3-month professional development course. Most participants do not pay out of pocket, but draw on professional development funds provided by their employer. (This was the model we followed with the NARA courses, which worked well.)
  • Institutional payer: Institutions will fund a full course, creating the opportunity for their faculty or members to participate in the course without worrying about seeking individual funding or reimbursement.

We are looking forward to experimenting in a number of ways, equipped with the knowledge that we can do so while still scaling and improving our program.

  • Themes: We facilitated courses with the following themes: US Midterm elections, Communicating Science, Women in Science, and General Topics (an interdisciplinary course). Then, with the Wiki Scholars model, we added women's suffrage. Most participants appreciated the structure that a theme brought to the course. Given this feedback, building future courses with specific themes is worth pursuing.
  • Advanced courses: At the end of our courses, we asked participants what else they would have liked to cover in a course or future course. Several desired additional conversations about conceptual aspects of Wikipedia: policy, crowdsourcing, notability, veracity, administration, cultural impact, bias, etc. Several expressed interest in the academic research conducted with Wikipedia, and the possibility of developing original scholarship during such a course. Others requested more conversation around improving articles to meet a higher level of quality, such as applying Good Article criteria. Lastly, there were requests for community-oriented courses (e.g., having a course affiliated with WikiProject Women in Red, that more closely connects participants to active Wikipedians looking to engage a specific issue within the community).
  • New course models: We believe we can adapt our curriculum and have the training competencies to experiment with alternative styles of courses. Whereas our courses so far all involve individuals learning to edit Wikipedia and then making substantial improvements to one or two articles, we would also like to run courses that are more collaborative, that focus on images, that work with Wikidata, or that emphasize scholarship regarding Wikipedia. Though we have limited experience as an organization in running these sorts of courses, our staff has extensive experience with all of them.
  • New partners: Having worked with seven academic associations, branching out to additional associations or institutions would make for an even more diverse set of course offerings with a larger pool of potential participants. We are currently exploring more options and are confident that we will be able to adapt this course model to meet the needs of a diverse set of partners.
  • Embedding courses into a college/university: Several barriers we encountered during this last round of courses revolved around finding participants, scheduling, and timing. We suspect embedding a course more seamlessly into an institution's schedule would mitigate or fully address all of these issues.

As we engage in additional Scholars & Scientists courses in the coming year, we are eager to test our assumptions, create a new revenue stream for our organization, and continue to improve Wikipedia content by empowering subject-matter experts to contribute.