Research:Voice and exit in a voluntary work environment/Elicit new editor interests

From Meta, a Wikimedia project coordination wiki
Bob West
Ramtin Yazdanian
Duration:  2017- — ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.

Upon signing up, new Wikipedia editors are confronted with a vast number of articles, without knowing which articles and topics exist, let alone which articles need contributions. In the terminology of recommender systems, this is the user cold-start problem. One of the primary motivations for this project is that the gender and ethnicity distribution of active Wikipedia editors is currently heavily skewed, and a system that guides newcomers through their first steps could help alleviate this problem by encouraging more of them to stay and contribute. Such guidance can take the form of recommending articles and, more importantly, recommending users: pairing newcomers with veterans.

Currently, no such system exists on Wikipedia. Users who want to get past the initial barrier have to work through the cold-start phase on their own, getting to know other users, sub-communities and topics without any systematic guidance. This requires a level of determination and dedication that not every user has. We believe that a bit of systematic hand-holding can go a long way in encouraging users to stay.

Our work aims to create a questionnaire that captures topical dichotomies based on both article content and user preferences. The questionnaire will be presented to users upon signing up, and their answers will act as an initial profile that allows us to recommend articles, and the prominent editors of those articles, to the newcomer. Newcomers can then pair up with these editors and get into editing quickly.

Adapting recommender systems to Wikipedia

Creating a recommendation system for Wikipedia poses two challenges in particular:

  1. Wikipedia editors do not "rate" articles; the only information available about their preferences is their editing history.
  2. Unlike most recommender systems, which recommend "consumable" items such as movies, a system recommending Wikipedia articles to editors would recommend them not for consumption but for contribution; and while every movie can be watched, not every Wikipedia article needs contributions (nor is every user qualified to contribute to a given article).

The former point prevents us from using most existing questionnaire methods, which are built for explicit-feedback systems such as Netflix. The only data we have on users is their editing history, taken from the revision history data. This data tells us who has edited which article and how often (along with additional information such as the size of the article after each edit and whether the edit was flagged as minor), so if a user has repeatedly edited an article, we can infer that they are interested in it.

The problem of implicit feedback arises when there are few or no interactions between a user and an article. If interactions exist but are few in number, we cannot reliably say that the user dislikes the article, but we cannot be sure that they like it either. Missing interactions, in turn, are treated as unknown in explicit-feedback systems; if they were also treated as unknown in an implicit-feedback system, the system would only have the categories "like" and "unknown" instead of "like", "dislike" and "unknown". For this reason, existing literature on implicit-feedback systems [1] treats unknown data as dislikes, while attaching to each entry a confidence value that increases with the number of interactions, thereby placing more emphasis on known and frequent interactions than on infrequent ones.
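The confidence weighting from [1] can be sketched as follows. This is a minimal illustration, assuming the linear confidence function c = 1 + alpha * r described in that paper and a hypothetical default for the scaling parameter `alpha`; it is not necessarily the weighting used in this project.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def preference_and_confidence(
    edit_counts: List[List[int]], alpha: float = 40.0
) -> Tuple[List[List[int]], Matrix]:
    """Split raw edit counts r into binary preferences p and confidences c.

    p = 1 if the user edited the article at least once, else 0 (an implicit "dislike");
    c = 1 + alpha * r, so unobserved pairs keep a low baseline confidence of 1
    while frequently edited articles get a much higher one.
    alpha is a hypothetical scaling parameter chosen for illustration.
    """
    prefs = [[1 if r > 0 else 0 for r in row] for row in edit_counts]
    confs = [[1.0 + alpha * r for r in row] for row in edit_counts]
    return prefs, confs


# Hypothetical editing matrix: 2 users x 3 articles.
edits = [[3, 0, 1],
         [0, 0, 7]]
p, c = preference_and_confidence(edits, alpha=40.0)
print(p)  # [[1, 0, 1], [0, 0, 1]]
print(c)  # [[121.0, 1.0, 41.0], [1.0, 1.0, 281.0]]
```

In a factorisation, such confidences would weight the squared error on each entry, so a never-edited article counts as a weak "dislike" rather than a strong one.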

The latter problem can be addressed by pairing newcomers with veterans who have significant experience on Wikipedia and expertise in the topic in question. Experienced users who know which articles require edits can then point newcomers towards them, and can also judge whether the newcomers are qualified to make those edits.

Method summary

In this section we briefly describe our questionnaire generation method and elaborate on our recommendation and evaluation methods. A flowchart of the entire pipeline is displayed on the right.

A complete flowchart of the questionnaire generation method, including evaluation.


The data we use consists of two parts:

  1. The content of Wikipedia articles.
  2. The revision history of Wikipedia articles.

The former is used to build a bag-of-words representation of each article: each article becomes a vector whose dimensions correspond to terms in the vocabulary, and the value of each vector element is the count of the corresponding term in the article. This is the content-based element of our recommendation system.
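A minimal bag-of-words sketch in plain Python. The tokenisation (lower-cased alphabetic runs) is an assumption made for illustration; the project's actual preprocessing (stop words, stemming, vocabulary pruning, TF-IDF weighting) is not described on this page.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(articles: Dict[str, str]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return the shared vocabulary and a term-count vector for every article."""
    # Naive tokenisation (assumption): lower-case alphabetic runs only.
    counts = {title: Counter(re.findall(r"[a-z]+", text.lower()))
              for title, text in articles.items()}
    # The vocabulary is the sorted union of all terms seen in any article.
    vocab = sorted(set().union(*counts.values()))
    # Vector element i is the count of vocab[i] in the article (0 if absent).
    vectors = {title: [c[term] for term in vocab] for title, c in counts.items()}
    return vocab, vectors


vocab, vecs = bag_of_words({
    "Cat": "The cat is a small cat.",
    "Dog": "The dog barks.",
})
print(vocab)        # ['a', 'barks', 'cat', 'dog', 'is', 'small', 'the']
print(vecs["Cat"])  # [1, 0, 2, 0, 1, 1, 1]
```

At Wikipedia scale these vectors would be stored as a sparse matrix rather than dense lists, since almost every article uses only a tiny fraction of the vocabulary.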

The revision history data [2] tells us the editing history of all users. From the raw revision data, we extract a user-by-article 'editing matrix', in which each element is the number of times the user has edited the article. This is the collaborative element of our recommendation system, and it allows us to capture user preferences and detect groups of users with similar interests.
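Building the editing matrix from revision records can be sketched as below, assuming the dump has already been parsed into (user, article) pairs, one per revision. The extra fields mentioned earlier (article size, minor-edit flag) are ignored here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def editing_matrix(
    revisions: Iterable[Tuple[str, str]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a user-by-article matrix whose (u, a) entry counts u's edits to a."""
    counts = Counter(revisions)  # (user, article) -> number of revisions
    users = sorted({u for u, _ in counts})
    articles = sorted({a for _, a in counts})
    # Counter returns 0 for pairs that never occurred.
    matrix = [[counts[(u, a)] for a in articles] for u in users]
    return users, articles, matrix


revisions = [("Alice", "Cat"), ("Alice", "Cat"), ("Bob", "Dog"), ("Alice", "Dog")]
users, articles, matrix = editing_matrix(revisions)
print(users, articles, matrix)  # ['Alice', 'Bob'] ['Cat', 'Dog'] [[2, 1], [0, 1]]
```

For the full revision history, the same counting would feed a sparse matrix format, since the dense version would be far too large.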

Throughout this document, the words 'article' and 'document' both refer to Wikipedia articles and are used interchangeably.

Extracting the topics

Our basic idea is founded on Latent Semantic Analysis (LSA), a method that extracts topics from a set of documents by applying Singular Value Decomposition (SVD) to their bag-of-words (or TF-IDF) representation. Since we want our system to account for both the content of the articles and user preferences, we use a joint topic extraction method that draws on both data sources. It is a matrix factorisation method that factorises the editing matrix as the product of a user latent matrix and a document latent matrix, while keeping the document latent matrix close to the result of LSA on the content TF-IDF matrix. The output we want from this algorithm is the document latent matrix, in which each row is the representation of the corresponding article in the latent space. We consider each dimension of the latent space to be a 'topic', as determined by the distribution of words in documents and by the editing behaviour of users across documents.
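As an illustration only, one plausible way to write such a joint objective is shown below. It assumes a squared (Frobenius) reconstruction error and a single weight λ balancing the two terms; the exact loss, weighting and regularisation used in the project are not specified on this page. Here E is the m × n editing matrix (m users, n articles), U (m × k) and V (n × k) are the user and document latent matrices, X is the n × |vocabulary| TF-IDF matrix, and V_LSA holds the k-dimensional LSA document representations from its truncated SVD.

```latex
\begin{aligned}
X &\approx P_k \Sigma_k Q_k^{\top}, \qquad V_{\mathrm{LSA}} = P_k \Sigma_k, \\
(U^{\ast}, V^{\ast}) &= \operatorname*{arg\,min}_{U,\,V} \;
  \lVert E - U V^{\top} \rVert_F^2 + \lambda \, \lVert V - V_{\mathrm{LSA}} \rVert_F^2 .
\end{aligned}
```

With λ = 0 the document factors are driven purely by editing behaviour; as λ grows, V is pulled towards the plain LSA representation. Each row of V* is then an article's position in the latent space.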

In our tests, we will primarily consider two different systems: one using the original LSA on the document content matrix, and another using the joint method. In our offline tests we have also used a random recommender baseline.

Generating questions

To generate questions, we take each dimension of the latent space (i.e. each topic) and look at its top 20 and bottom 20 articles. The top 20 are the articles with the largest positive weights in that dimension, and the bottom 20 are the articles with the most negative weights (largest in absolute value). Since we cannot show users an entire topic (which consists of hundreds of thousands of articles and their weights) and ask them which direction they lean towards, we need to capture the topic's main dichotomy, and these two sets are our solution to this problem. The question generated from a topic takes the form "Which of these two sets of documents would you be more interested in editing?", with 'neither' also being a possible answer. Based on the answer, we can say that, for this topic, the user leans towards the positive direction, the negative direction, or neither. The user's answers to these questions (which are presented to the user as a list) function as an initial profile, on which we base our recommendations as described in the next section.
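The question-generation step, and one way of turning answers into a profile, can be sketched as follows. Encoding "A" as +1, "B" as -1 and "Neither" as 0 is an assumption that mirrors the positive/negative/neither reading described above; the names `make_question`, `question_profile` and `ANSWER_VALUE` are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical encoding of the three possible answers as a direction on the topic axis.
ANSWER_VALUE = {"A": 1, "B": -1, "Neither": 0}


def make_question(doc_latent: Sequence[Sequence[float]], titles: Sequence[str],
                  dim: int, k: int = 20) -> Dict[str, List[str]]:
    """Return the k most positive ("top", set A) and k most negative
    ("bottom", set B) articles along latent dimension `dim`."""
    order = sorted(range(len(titles)), key=lambda i: doc_latent[i][dim])
    return {
        "top": [titles[i] for i in order[::-1][:k]],  # largest weights first
        "bottom": [titles[i] for i in order[:k]],     # most negative weights first
    }


def question_profile(answers: Sequence[str]) -> List[int]:
    """Map a list of answers ("A", "B" or "Neither") to a question-space profile."""
    return [ANSWER_VALUE[a] for a in answers]


# Toy document latent matrix: 5 articles x 2 topics, with k = 2 instead of 20.
latent = [[0.9, 0.1], [-0.5, 0.3], [0.2, -0.8], [-0.7, 0.0], [0.4, 0.6]]
names = ["A1", "A2", "A3", "A4", "A5"]
print(make_question(latent, names, dim=0, k=2))
# {'top': ['A1', 'A5'], 'bottom': ['A4', 'A2']}
print(question_profile(["A", "Neither", "B"]))  # [1, 0, -1]
```

With this encoding, the dot product of two profiles rewards agreement on a topic, penalises opposite answers, and ignores topics where either user answered "Neither".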


The profiles we have created for the users give us multiple options:

  • Recommending articles: Based on their question-space profile, we can calculate their latent profile as the average of the latent representations of articles that they have chosen in their answers to the questions. Afterwards, for this latent space profile, we can perform a k-nearest neighbours search among articles in order to find the k closest articles. However, there are two practical considerations that make this a less-than-optimal choice:
  1. The user in question does not necessarily have the qualifications to edit the articles we recommend.
  2. Most of the closest articles will be highly popular ones, which are most probably quite complete and in no need of contributions.

These two reasons mean that recommending articles would detract from our goal of increasing editor diversity and empowering newcomers.

  • Recommending existing users: Since we also have latent representations for users, we could match the newcomer with experienced users whose latent space profiles are close to the newcomer's. However, the main issue here is that latent profiles are time-agnostic: a user who has edited some type of article in the past is not necessarily still interested in it. In addition, matching one newcomer with one veteran might be quite intimidating for the newcomer, and could reinforce an existing hierarchical editor structure, which goes against our goal of increasing diversity. Finally, the newcomer's latent representation is derived from a different matrix than the veteran's, which could degrade the quality of the recommendations.
  • Matching newcomers together: Since we have question-space profiles for each newcomer, we can match them based on their answers. Therefore, in the deployment version of this system, several newcomers with similar interests would be matched together, along with one more experienced user to show them the ropes (who could, in order to volunteer for the task, also take the questionnaire). We believe that this would go much further towards the goal of newcomer empowerment than the other two options, and therefore this is the option we have ultimately chosen. However, for the sake of simplicity of evaluation, for our online experiments we have chosen to only match pairs of users together. We will describe our matching scheme in the next section.
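The first option above, recommending articles, could be sketched as a brute-force nearest-neighbour search over a newcomer's averaged latent profile. Cosine similarity as the closeness measure and the exclusion of the articles the newcomer already chose are assumptions made for this illustration; the page specifies neither, and at Wikipedia scale an approximate nearest-neighbour index would likely be needed instead of a linear scan.

```python
import heapq
import math
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise average of equally long vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_articles(chosen: Sequence[str], latent: Dict[str, Sequence[float]],
                       k: int) -> List[str]:
    """Average the latent vectors of the chosen articles, then return the k
    closest other articles by cosine similarity (brute-force k-NN)."""
    profile = mean_vector([latent[t] for t in chosen])
    seen = set(chosen)
    candidates = [t for t in latent if t not in seen]
    return heapq.nlargest(k, candidates, key=lambda t: cosine(profile, latent[t]))


latent = {"Cat": [1.0, 0.0], "Dog": [0.9, 0.1],
          "Opera": [0.0, 1.0], "Tiger": [0.95, 0.05]}
print(recommend_articles(["Cat"], latent, k=2))  # ['Tiger', 'Dog']
```

The sketch also makes the second practical concern concrete: nothing in the ranking distinguishes a complete, popular article from one that actually needs work.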


We use two types of evaluation: offline and online. The online evaluation will be discussed in a separate section, since it constitutes a considerable portion of our work, while the offline evaluation is part of the question-generation approach.

Offline evaluation

Averaged similarity scores for random matching (yellow) and question-based matching (blue) for the offline simulated experiment. Averages over all pairs are the dashed vertical lines (red for random and black for question-based).

Our offline evaluation consists of two parts:

  1. Assessing the intrinsic quality of the questions.
  2. Simulating the planned online experiment on the profiles of a set of held-out users, and comparing profile-based matching with random matching to verify that our questions capture useful information about the users and that this information improves the matches.

To assess the intrinsic quality of each question, we calculate a cohesion metric that measures the similarity among the top 20 documents and the similarity among the bottom 20 documents, and we sum these two values (each between 0 and 1) to get a cohesion score between 0 and 2.
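The page does not say how the similarity within each set is computed. As an assumption for illustration, the sketch below takes the mean pairwise cosine similarity of the articles' bag-of-words (or TF-IDF) vectors, which falls between 0 and 1 because those vectors have no negative entries; summing the values for the two sets then gives a score between 0 and 2.

```python
import math
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def set_similarity(vectors: Sequence[Vector]) -> float:
    """Mean pairwise cosine similarity; lies in [0, 1] for non-negative vectors.
    Requires at least two vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def cohesion(top: Sequence[Vector], bottom: Sequence[Vector]) -> float:
    """Cohesion score in [0, 2]: similarity within set A plus similarity within set B."""
    return set_similarity(top) + set_similarity(bottom)


# Toy term-count vectors: a perfectly coherent set A and an unrelated set B.
print(cohesion([[1, 0], [2, 0], [3, 0]], [[1, 0], [0, 1]]))  # 1.0
```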

To simulate the online experiment, we take a held-out set of 8500 users, hide 20 of their edits, and simulate their questionnaire answers using their remaining editing profile. We then split them into batches of 500 users, and within each batch we construct a graph in which the nodes are users and the weight of the edge between two users is the dot product of their question-space profiles. We perform matching on this graph using the Hungarian algorithm. For each matched pair, we then measure the similarity of each user's 20 held-out documents to all the documents edited by the other user; averaging these similarity values yields a score between -1 and 1 that indicates how similar the two users are. Finally, we compare these results to those of a random matching among the same set of users.
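The simulation can be sketched on toy data as follows. The page names the Hungarian algorithm for the matching step; this sketch instead uses an exhaustive search that finds a maximum-weight perfect matching on tiny inputs only, and batches of 500 users would need a polynomial-time method (for example NetworkX's `max_weight_matching` with `maxcardinality=True`). Measuring similarity as cosine similarity between latent article vectors, and averaging over both directions of a pair, are assumptions consistent with the stated [-1, 1] range.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[int, int]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def best_matching(profiles: Sequence[Vector]) -> List[Pair]:
    """Exhaustive maximum-weight perfect matching; edge weight = profile dot product.

    Takes (n-1)!! steps, so it only illustrates the objective on tiny inputs."""
    if len(profiles) % 2:
        raise ValueError("need an even number of users")

    def solve(remaining: List[int]) -> Tuple[float, List[Pair]]:
        if not remaining:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        best: Tuple[float, List[Pair]] = (-math.inf, [])
        for j, partner in enumerate(rest):
            weight, pairs = solve(rest[:j] + rest[j + 1:])
            weight += dot(profiles[first], profiles[partner])
            if weight > best[0]:
                best = (weight, [(first, partner)] + pairs)
        return best

    return solve(list(range(len(profiles))))[1]


def random_matching(n: int, rng: random.Random) -> List[Pair]:
    """Pair up users 0..n-1 uniformly at random (n even)."""
    order = list(range(n))
    rng.shuffle(order)
    return list(zip(order[0::2], order[1::2]))


def pair_score(held_u: Sequence[Vector], edits_u: Sequence[Vector],
               held_v: Sequence[Vector], edits_v: Sequence[Vector]) -> float:
    """Average cosine similarity (in [-1, 1]) of each user's held-out articles
    to all articles edited by the other user, in latent space."""
    sims = [cosine(h, e) for h in held_u for e in edits_v]
    sims += [cosine(h, e) for h in held_v for e in edits_u]
    return sum(sims) / len(sims)


# Four toy users: question-space profiles built from +1/-1/0 answers.
profiles = [[1, 1, 0], [1, 1, 0], [-1, 0, 1], [-1, 0, 1]]
print(best_matching(profiles))  # [(0, 1), (2, 3)]
print(random_matching(4, random.Random(0)))
```

Averaging `pair_score` over all pairs of each matching gives one number per scheme, and the question-based and random averages can then be compared.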

Online experiment

The online experiment is designed as follows:

  1. The experiment begins.
  2. For a week (or, if fewer than 50 responses are gathered in the first week, for two weeks), every new user who joins Wikipedia receives, within 24 hours, a message informing them that they have been chosen for an experiment. They are given a link to the questionnaire, which contains the questions, the word clouds of every article set, and clickable links.
  3. If the user chooses to answer the questionnaire, they will have to answer every single question in it. At the end of the questionnaire, they provide us with contact information in the form of an email address. Their answers and email address are saved.
  4. Once a sufficient number of answers have been collected or time is up, the matching process begins as follows[3]:
    1. A user graph is formed, in which each user is a node, and the weight of the edge between two users is the dot product of their question-space profiles.
    2. Two matching algorithms are run over this graph: a maximum weight matching that covers all nodes, and a random matching. Care is taken to ensure that no edge appears in both matchings; that is, each user is matched to two different users by the two schemes.
    3. Each participating user is informed about their two matches by an email.
  5. The users are then expected to learn about their two matches by conversing with them. The email informing them of their matches will also contain a link to a survey asking for their opinion on each match: how much their interests seem to overlap, and what they think of collaborating with the matched person. To make sure that matched users actually talk to each other, they will be required to agree on a passphrase, which each of them will have to provide in the survey. The survey responses of a pair of matched users will only be kept if their passphrases match. Users will be given one week to do this and answer the survey; if time permits, we may send a reminder to those who have not responded after a week, extending their time to two weeks.
  6. The answers to this survey (both the quantitative answers and the free text answers) will be used to compare the random matching scheme to the question-based matching scheme. We will also investigate the textual answers given by users to gain more insights, especially since the number of participants is expected to be low.
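Two bookkeeping steps from the protocol above can be sketched as follows: drawing a random matching that shares no edge with the question-based one (step 4.2), and keeping only the survey responses whose passphrases agree (step 5). Rejection sampling, an even number of participants, the field names `email`, `match_email` and `passphrase`, and exact string comparison of passphrases are all assumptions for this illustration.

```python
import random
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = Tuple[str, str]


def disjoint_random_matching(users: List[str], avoid: List[Pair],
                             rng: random.Random, max_tries: int = 10_000) -> List[Pair]:
    """Random perfect matching that shares no edge with `avoid` (rejection sampling).

    Assumes an even number of users; raises if no valid matching is drawn in time."""
    forbidden: Set[FrozenSet[str]] = {frozenset(p) for p in avoid}
    for _ in range(max_tries):
        order = users[:]
        rng.shuffle(order)
        pairs = list(zip(order[0::2], order[1::2]))
        if all(frozenset(p) not in forbidden for p in pairs):
            return pairs
    raise RuntimeError("no disjoint random matching found")


def verified_responses(responses: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep a survey response only if the matched partner also answered the
    survey about this user with the same passphrase."""
    by_pair = {(r["email"], r["match_email"]): r for r in responses}
    kept = []
    for r in responses:
        partner = by_pair.get((r["match_email"], r["email"]))
        if partner is not None and partner["passphrase"] == r["passphrase"]:
            kept.append(r)
    return kept


users = ["ana@example.org", "ben@example.org", "cy@example.org", "di@example.org"]
question_based = [("ana@example.org", "ben@example.org"), ("cy@example.org", "di@example.org")]
print(disjoint_random_matching(users, question_based, random.Random(1)))

responses = [
    {"email": "ana@example.org", "match_email": "ben@example.org", "passphrase": "tulip"},
    {"email": "ben@example.org", "match_email": "ana@example.org", "passphrase": "tulip"},
    {"email": "ana@example.org", "match_email": "cy@example.org", "passphrase": "rose"},
    {"email": "cy@example.org", "match_email": "ana@example.org", "passphrase": "violet"},
]
print(len(verified_responses(responses)))  # 2
```

Keying responses on both the participant's own address and their match's address is what allows one person to use different passphrases with their two matches.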

Several aspects of this scheme have been discussed, and the results are as follows:

  • The random baseline and rock-bottom: We are concerned that a random baseline might be quite easy to beat, and thus not very informative. An alternative would be to provide each user with three matches: one random, one from the aforementioned graph built on all 20 questions, and one from a similar graph whose edge weights are calculated using only the first 10 questions (i.e. the latter half is discarded). This, however, would require more participants, so in the interest of keeping things simple, we will not perform this experiment unless significant participation is achieved.
  • What about questionnaire length?: An important aspect of the system is the length of the questionnaire. More questions mean more (and more fine-grained) information, but also more user boredom and frustration. However, this is a separate question that does not fit with the rest of our online experiment, so we have decided not to include it.
  • It may take more than two to tango: With newcomers, diversity of ideas and teamwork are key, not to mention the fact that there will inevitably be some attrition. Therefore, it might be a better idea to match them in groups. However, optimal team-building is a topic that would significantly complicate our method, and would be outside the scope of this specific research project. Therefore, the current decision is to stick to pairs for matching.

The following sections describe the messages that will be sent to users, the contents of the questionnaire, and the survey.

The starting message

This is the first message that newcomers will receive within 24 hours of signing up for Wikipedia.

Greetings, fellow Wikipedian!

We are glad that you have decided to join this great community, and first and foremost we would like to welcome you!
There are many communities of editors on Wikipedia, who collaborate to make Wikipedia more comprehensive and more reliable in terms of content. However, we have noticed that the act of creating or entering communities tends to be challenging for newcomers. Whether or not you know what topics you'd like to contribute to, it may be difficult to find like-minded people with whom you could collaborate. This is why we, in order to facilitate this process, have created a questionnaire, consisting of 20 questions about Wikipedia articles on different topics. The questionnaire can be found at INSERT LINK HERE, and if you choose to answer it, we may be able to find people with similar interests to match you with, in order to get you started. Our questionnaire is currently at an experimental stage, so after matching you with one or more potential partners, we will ask you to converse with each of said partners, and then answer a survey about them and thus give us feedback on the match that we have provided for you.

There are several important points regarding the questionnaire:

#It will involve 20 questions, all of which we kindly ask you to answer.
#In order for you to converse with your potential match(es), we need to ask you to enter your email address at the end of the questionnaire. By providing your email address, you agree to have it shared with the person(s) you get matched to.

We would like to thank you for your attention, and we hope that you will participate in our experiment, so that we may make Wikipedia a more welcoming place for any and all who wish to contribute to this worldwide vessel of knowledge and learning. Please accept our best regards.


This message will be sent to users within 24 hours of their signing up. It does not matter exactly who receives it, as long as many people do, since it is reasonable to assume that many will ignore it.

The questionnaire

An example of one question in the prototype questionnaire method.

The questionnaire will be structured as 20 questions in a sequence. Each question is laid out as in the example picture to the right: the top 20 documents of the topic on the left as set A, the bottom 20 documents of that topic on the right as set B, and three choices: set A, set B, or neither. Both sets of articles are accompanied by their word clouds.

A JSON file containing the 20 questions may be found here. You can copy and paste its contents into an online tool such as this to view them in a structured manner. The questions are indexed by their numbers (the strings "0" up to and including "19"), and each question has two fields, "top" and "bottom", each of which contains 20 article names. The questions are not displayed on this page because the JSON format is much more practical.

The word clouds can be found here in zipped format. The word cloud file for the top 20 documents in question indexed 5 (i.e. the 6th question since the list is 0-based) is called 5_top.png, and the word cloud file for the bottom 20 of the same question is called 5_bottom.png.
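A small loader for the question file and the word cloud naming scheme described above. Parsing from a string rather than a particular path, the `set_size` parameter and the validation checks are illustrative choices; the function names are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def parse_questions(text: str, set_size: int = 20) -> List[Dict[str, List[str]]]:
    """Parse the questions JSON ({"0": {"top": [...], "bottom": [...]}, ...})
    into a list ordered by question index, checking the size of each set."""
    raw = json.loads(text)
    questions = [raw[str(i)] for i in range(len(raw))]  # KeyError if an index is missing
    for i, q in enumerate(questions):
        for side in ("top", "bottom"):
            if len(q[side]) != set_size:
                raise ValueError(f"question {i}: expected {set_size} '{side}' articles")
    return questions


def word_cloud_files(index: int) -> Tuple[str, str]:
    """File names of the word clouds for question `index` (0-based)."""
    return f"{index}_top.png", f"{index}_bottom.png"


print(word_cloud_files(5))  # ('5_top.png', '5_bottom.png')
```

To read the downloaded file, its contents can be passed in directly, e.g. `parse_questions(pathlib.Path(path).read_text(encoding="utf-8"))`.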

At the beginning of the questionnaire, the following message is displayed:

This questionnaire contains 20 questions, which attempt to capture your topics of interest, based on which we will match you with similar newcomers. A few important points on the questionnaire:

* Please have cookies enabled. If you do not have them enabled, please enable them and refresh this page.
* Each question is of the form "Set A or Set B?", the available answers being A, B, or Neither. Only a single answer is possible for each question.
* Please answer all 20 questions. Because we're evaluating our system based on this, we need full information on how users interact with each question. The form cannot be submitted without having chosen an answer for each question.
* Once you have answered all the questions, we kindly ask you to also enter your email address so that we can get back to you with your matched person.

Once you have submitted the questionnaire, your answers are recorded, and we will get back to you soon to inform you about the users we have matched you to. We will send you further information through the email address you provide to us at the end of the questionnaire.

Thank you for your participation!

Of course, the message assumes that recording the results will require cookies, because it does in the code I've written. That part of the message would be subject to change depending on how the webpage operates and whether or not it requires cookies.

At the end of the questionnaire, the following message is displayed:

Thank you for taking our questionnaire! We would have loved to offer you another cookie, but it's one per person.
We will get back to you with your matched person in the near future!

The matching email

Once the period of time for filling the questionnaire is over and question-based and random matching are performed, each person will receive the following email:

Greetings, dear participant!

You are receiving this message because you opted to participate in our questionnaire experiment on Wikipedia. We are glad to inform you that we have found you not one, but two matches! These two potential partners we have found for you are the following: USER 1'S EMAIL, USER 2'S EMAIL.

Now, we would like to ask you to provide feedback on each of them separately. Imagine that you will form a separate two-person team with each one, not a joint three-person team. Therefore, we kindly ask you to converse with each of them separately, and to find out how much your interests overlap and whether or not you would be interested in collaborating with them. In addition, we would like to ask you to agree on a passphrase with each of them. Afterwards, we ask you to fill out this survey (SURVEY LINK HERE) once for each of them to give us your feedback about the match, and to provide the passphrase agreed upon with the person in question.

We would like to thank you in advance for getting in touch with your matched partners and filling out the survey within one week. Your assistance in our efforts to make Wikipedia a more welcoming place is greatly appreciated, and we wish you a great day (or night, depending on where you are on Earth)!


In this email, the email addresses of user 1 and user 2 should be presented in random order, so that their order reveals nothing about which one was the question-based match and which one the random match.
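A minimal sketch of filling the partner sentence with the two addresses in random order. The function name and example addresses are hypothetical; which address came from which matching scheme stays in the experimenters' own matching records, not in the email.

```python
import random
from typing import List


def partner_line(question_match: str, random_match: str, rng: random.Random) -> str:
    """Fill the partner sentence of the matching email with both addresses in
    random order, so an address's position reveals nothing about its scheme."""
    emails: List[str] = [question_match, random_match]
    rng.shuffle(emails)
    return ("These two potential partners we have found for you are the following: "
            + ", ".join(emails) + ".")


print(partner_line("ben@example.org", "cy@example.org", random.Random()))
```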

The survey

The survey, whose link will be provided to the users in the matching email, is as follows:

Greetings, dear participant!

Thank you for participating in our experiment! We are looking forward to hearing your feedback on each of the people you were matched with, and have prepared a set of questions for you to answer about them. We kindly ask you to fill this survey once for each person you were matched with. At the end of the survey, please enter the passphrase you have agreed on with said person. Please bear in mind that for verification reasons, we need the passphrase provided by you and said person to be the same.

* Please enter your own email address here.

* Who was the person you were matched with? Please enter their email address.

* Based on your interactions with this person, how much do you agree or disagree with the following statements? Please also provide a one-line description for your answer.
# This person is interested in editing similar topics to me.
# This person is enthusiastic about editing Wikipedia.
# (If your answer to the previous question is affirmative) This person would be willing to edit Wikipedia articles with me.
# I would enjoy learning from and collaborating with this person.
# Based on your interactions with this person, what additional observations do you have about this person as an editor and collaborator?

* Please enter the passphrase you have agreed on.

We thank you again for your participation! With your help, we can make Wikipedia a more welcoming environment for newcomers. Thus, your help will contribute to the core strength of Wikipedia - its volunteering participants.


The first four questions ask users to give their opinion as a rating from 1 (strongly disagree) to 5 (strongly agree), and optionally to provide a one-line explanation of what influenced their opinion. The final question is an attempt to elicit any additional feedback that the first four questions may have missed, and is a free text field. The participant's own email address is necessary for identification, and we also require the email address of their match, in case they use the same passphrase for both of their matches.
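A minimal sketch of the first, quantitative step of comparing the two schemes: averaging each of the four ratings per matching scheme. The response layout and scheme labels are hypothetical (the scheme behind each response would be looked up from the matching records via the two email addresses), and with few participants such averages are descriptive only.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional


def mean_ratings(responses: List[dict]) -> Dict[str, List[float]]:
    """Average each of the four 1-5 ratings separately for each matching scheme.

    Each response is assumed to look like
    {"scheme": "question" or "random", "ratings": [r1, r2, r3, r4]},
    where r3 may be None because statement 3 is conditional."""
    grouped: Dict[str, List[List[Optional[int]]]] = defaultdict(list)
    for r in responses:
        grouped[r["scheme"]].append(r["ratings"])
    return {
        scheme: [float(mean(v for v in column if v is not None)) for column in zip(*rows)]
        for scheme, rows in grouped.items()
    }


example = [
    {"scheme": "question", "ratings": [5, 4, 4, 5]},
    {"scheme": "question", "ratings": [3, 4, None, 5]},
    {"scheme": "random", "ratings": [1, 2, 3, 2]},
]
print(mean_ratings(example))
# {'question': [4.0, 4.0, 4.0, 5.0], 'random': [1.0, 2.0, 3.0, 2.0]}
```

The free-text explanations would still need to be read by hand, which matters all the more given the expected small number of participants.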

External links


  1. Hu, Y.; Koren, Y.; Volinsky, C. (2008). Collaborative Filtering for Implicit Feedback Datasets. Eighth IEEE International Conference on Data Mining (ICDM). doi:10.1109/ICDM.2008.22. 
  2. The latest dumps can be found at:
  3. If there are enough responses, we could exclude some users from the matching scheme, and then track the two groups (those who were matched to someone and those who weren't) as part of a longer experiment to measure the effect of community formation. However, this is a separate experiment and for this reason, we have decided against doing so.