Research talk:Voice and exit in a voluntary work environment/Elicit new editor interests

From Meta, a Wikimedia project coordination wiki

Tool and Survey[edit]

@RYazdanian:

  • I checked on our end, and it's best if we bring up the tool on the EPFL end. Please let me know what user information is collected once the user interacts with the tool on your end, and how long you intend to keep that information. Just list out the items that will be registered. :)
  • Please bring up the tool on your end, and let's make sure it's up in the coming few weeks.
  • Can you create the survey forms corresponding to the survey(s) we want to put in front of the users? We're close to communicating this with enwiki, and it would be good to have everything ready.

Thanks! --LZia (WMF) (talk) 21:29, 5 June 2018 (UTC)

@LZia (WMF):

  • The information collected is the answers (1, 0, or -1 for each of the 20 questions) and the email address. The email addresses will be kept until the end of our study because, when verifying the survey responses at the end, we need to know who was matched to whom (in order to check their magic phrases). They will of course be deleted afterwards.
  • It'll hopefully be up within a couple of days, I'll ping you once it's done.
  • Will do!

--RYazdanian (talk) 07:37, 6 June 2018 (UTC)

@RYazdanian: Thanks for the responses. One question about information collected. I would assume that the server would collect ip_address and user_agent, plus some other items. Isn't that the case? (If it doesn't, of course, it's much easier to communicate. If it does, we may need to explain how long that data will be kept, etc.) --LZia (WMF) (talk) 18:27, 6 June 2018 (UTC)
@LZia (WMF): There is a cookie involved (which essentially does nothing; it used to do something when we had two sets of questions, but now there's just one), and nothing else. We don't keep any data other than the email they provide to us and their answers. --RYazdanian (talk) 20:51, 6 June 2018 (UTC)

@LZia (WMF):
All right. The website is up, here: http://34.245.220.212:5000/
Also, here is the survey: https://docs.google.com/forms/d/e/1FAIpQLSdXfeIikHrf3h0qD-KP8z7PUU576hJgKb0mDZmlkQrR16ewLQ/viewform?usp=sf_link
Let me know if there is any other information you need. The information collected in the questionnaire is stored on the Amazon server for the website, and the survey responses are stored in Google Drive; both sets will be destroyed upon completion of the research project.

Feedback on Online evaluation[edit]

@RYazdanian: Thanks for expanding the online evaluation section. Some thoughts:

  • Can you walk me through why you chose word clouds instead of just the list of articles in each set? The word-cloud representation depends on the specific algorithm used to generate it, and I'm a bit concerned that changes in word-cloud generation could elicit different responses from users, especially since these word clouds are very eye-catching compared to the plain list of articles. I wonder if you gain more by "forcing" the user to go through the list of articles instead of seeing a very high-level summary via word cloud.
  • "The random baseline and rock-bottom": I'm not sure if you will get enough responses to be able to do the 20-to-random versus 10-to-20 comparison. Let's start with 20-to-random and let's get to 10-to-20 if we see we're getting a lot of responses. What do you think?
  • "Who are the participants?" Given what we discussed and you've listed in this item, I would highly recommend you focus on newcomers. The experienced editor case is much harder to design for to address all issues raised.
  • "What do we want to ask?": Please lead us to finalize this item. There are a few candidates as we discussed. You should just push us to decide. ;)
  • "What about questionnaire length?": I would drop asking the participants what to ask for this first round. We should keep it simple.
  • "It may take more than two to tango": I agree with what you say, including the fact that we should not get into the details of it now and as part of this first study.

--LZia (WMF) (talk) 22:28, 14 May 2018 (UTC)

@LZia (WMF): Re: word clouds: The word clouds are generated from the TF-IDF vectors of the 20 articles: we sum those vectors and take the 20 highest-weighted words. We came up with these because some article sets (in different questions) seemed very similar, but their word clouds clarified the differences. For example, in two questions that each involved a set of historical articles, we realised that one was more about colonialism and the other more about war.
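For context, here is a minimal toy sketch of that computation; the actual pipeline surely differs in tokenisation and TF-IDF weighting, and this version just serves to make the "sum the vectors, take the top words" step concrete:

```python
import math
from collections import Counter

def top_cloud_words(articles, k=20):
    """Toy TF-IDF: sum each word's TF-IDF weight over a question's
    articles and return the k highest-weighted words for the cloud."""
    docs = [Counter(text.lower().split()) for text in articles]
    n = len(docs)
    df = Counter(word for doc in docs for word in doc)  # document frequency
    totals = Counter()
    for doc in docs:
        length = sum(doc.values())
        for word, tf in doc.items():
            # term frequency (normalised) times inverse document frequency
            totals[word] += (tf / length) * math.log(n / df[word])
    return [word for word, _ in totals.most_common(k)]
```

Words shared by every article in the set get an IDF of zero, so the cloud naturally emphasises what distinguishes the set, which is why the two historical sets separated into "colonialism" versus "war" terms.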

Re: random baseline: Yes I think that's a good idea. So you don't think matching each person with three others is a good idea, right?

Re: participants: Yeah I guess we'd better just go ahead with newcomers or at least inexperienced users, and we will put them in touch with each other using their email addresses.

Re: questions in the survey: Let's discuss this over the email thread then. :)

Re: questionnaire length: Very well.

Re: tango: All right, so we will just stick to pairwise matching.

--RYazdanian 08:04, 15 May 2018 (UTC)

@RYazdanian: I see your point re word clouds. Given that your set of questions is fixed, you can actually check the word clouds to make sure each set is sufficiently different from the others, which is good. You may want to consider keeping them hidden, with the user able to expand and see them if needed. I personally cannot look at the list of articles when I see a big, tempting cloud like that; I'm not sure whether other users behave the same way. Re random baseline: yup. I think matching 1 to 3 will make it too complex. Let's run a first experiment and see what kind of response we get. We can always run a second one if needed. --LZia (WMF) (talk) 12:51, 15 May 2018 (UTC)

How to increase the number of email recipients?[edit]

@Bmansurov (WMF): @RYazdanian: Looking at the stats for the experiment so far (we have received only 21 responses), as well as the way the "Email this user" feature gets activated for newcomers, here is what I recommend we change in how we select the people we send emails to:

Currently, the condition for a user to receive an email is:

  • The user has registered in the 24 hours prior to the point where we check the system. (Checking happens once every 24 hours.)
  • The user has made at least one edit in those 24 hours.
  • The user has activated "Email this user" feature, which essentially for newcomers means they have confirmed their email address via a link that has been emailed to them.

Using the above criteria, we are reaching only ~20% of the newcomers. This is too low (especially in light of the low response rate we're receiving), and we can improve it by relaxing two of the criteria above:

  • For every registered account, let's count the 1 edit over the full 24 hours following account creation. This ensures that we don't lose users who registered too close to the time we check the system.
  • For every username, allow a longer time to activate the account: consider all users who have registered since 2018-08-05, and keep considering them throughout the experiment period no matter when they confirm their email address, as long as they have made 1 edit in their first 24 hours.
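The relaxed rule above could be expressed roughly as follows. This is only a sketch: the real job reads from the MediaWiki database, and the field names here are made up for illustration:

```python
from datetime import datetime, timedelta

# Assumed experiment start date, per the criteria above.
EXPERIMENT_START = datetime(2018, 8, 5)

def should_email(user):
    """True if the user registered on/after 2018-08-05, made at least
    one edit within 24 hours of registration, and has confirmed their
    email address at any point during the experiment (hypothetical
    `user` dict fields: registered_at, edit_timestamps, email_confirmed)."""
    if user["registered_at"] < EXPERIMENT_START:
        return False
    window_end = user["registered_at"] + timedelta(hours=24)
    edited_in_window = any(t <= window_end for t in user["edit_timestamps"])
    return edited_in_window and user["email_confirmed"]
```

The key change from the original criteria is that email confirmation is checked over the whole experiment period rather than within the first 24 hours, while the edit requirement still uses the 24-hour window anchored to registration time.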
@LZia (WMF): I think those are great ideas, especially the 2nd one. Maybe we could also extend the editing deadline to 48 hours or something like that? That aside, I think we should extend the experiment to about 2 weeks anyway. I'd say we should aim to have at least 100 participants before we proceed with the matching. --RYazdanian (talk) 19:31, 8 August 2018 (UTC)
OK, I'll send out emails according to the above points. Since it requires some work, I'll do so starting tomorrow. Today, I'm going to continue what I've been doing the last two days. --Bmansurov (WMF) (talk) 21:14, 8 August 2018 (UTC)

@Bmansurov (WMF): We may be able to relax the 1-edit-in-the-first-24-hour-period constraint further. Can you plot the distribution of time to first edit for people who have registered an account in the past week/month? --LZia (WMF) (talk) 19:20, 8 August 2018 (UTC)

@LZia (WMF): Here's some data about time to first edit: https://phabricator.wikimedia.org/T190776#4489859 --Bmansurov (WMF) (talk) 21:01, 8 August 2018 (UTC)
@Bmansurov (WMF): This is very helpful. Thanks. Given this, can you extend the period you consider for edit activity to 3 (or even 4) days after opening an account? Please also extend the account confirmation period to a week after opening the account (or if there is a way to see when an account was confirmed, we can look at time to email confirmation as well and have a more educated estimate, but I'm fine with a rough one, too.) --LZia (WMF) (talk) 23:22, 8 August 2018 (UTC)
@LZia (WMF): I'm not taking the account confirmation date into account now; I'm looking at the registration date only. I'll look at everyone who has registered since 08/05/2018 and send an email if they have made an edit within 3 (or 4) days of registration. How should account confirmation fit in here? --Bmansurov (WMF) (talk) 15:46, 9 August 2018 (UTC)

@RYazdanian: please update the daily stats with the number of responses we have received up to that point (roughly, it doesn't have to be to the hour), so we have that number out there as well. Thanks! --LZia (WMF) (talk) 23:15, 9 August 2018 (UTC)

What do we ask the participants of the questionnaire?[edit]

@RYazdanian: Can you check whether you can produce statistics on how many times someone (a cookie?) reaches the questionnaire but doesn't participate? I re-reviewed the questionnaire and I /think/ the question we're asking may not be very clear to the users. Do you think it's worth spelling out, in front of every "Question x", the exact question the user has to respond to? Something along the lines of "Question x: Review the list of articles in sets A and B and tell us which set of articles is closer to the kind of articles you would want to contribute to. If neither set captures your interests, please choose Neither in your response."

I think we have to make it super clear to people what to do. Think about the above for a bit; I bet you can come up with a better statement of the exact question we want users to answer. (I won't be in front of the computer for long today PT; let's assume that we can converge by Sunday morning PT.) --LZia (WMF) (talk) 17:07, 11 August 2018 (UTC)

@LZia (WMF): Sounds like a good idea. I believe I could count the total number of requests made to the page and compare it with the number of responses that we have to get an approximation; I'll let you know. I've also added that clarifying text to each question.
Meanwhile, we now have over 170 responses, thanks to the massive number of people that Baha sent emails to yesterday. This (re-)raises the question: when do we stop this phase and go into the matching?
--RYazdanian (talk) 17:22, 11 August 2018 (UTC)

Just a heads up[edit]

Hi, I've got an invitation to participate in the research, but the link that is supposed to point to the privacy statement leads to a non-existent page (at least content-wise): https://foundation.wikimedia.org/wiki/Elicit_New_Editor_Interests_Survey_Privacy_Statement%E2%80%8B (at least as rendered by protonmail.com). That was a red flag for me, and combined with the non-HTTPS, IP-only address with an exotic port number, it was not something I would click right away. You know, scams, viruses and such. However, I've done some web searching and found this page. Everything seems legit, and I will try to participate as best I can, but maybe the link and the other issues have something to do with the number of people who respond, a thing you have discussed above. Just letting you know. All the best InsomniHat (talk) 19:40, 12 August 2018 (UTC) P.S. Just clicked the survey link and got a "not found" error. InsomniHat (talk) 20:03, 12 August 2018 (UTC) P.P.S. I have fixed the survey link manually and it works, but the clickable link in the protonmail webmail leads to a 404. Maybe other mail clients also render it wrong for some reason. Here it is, for reference: http://34.245.220.212:5000/%E2%80%8B The same goes for the privacy statement: I fixed it manually, but the clickable link is broken in protonmail. Hope that helps InsomniHat (talk) 20:03, 12 August 2018 (UTC)

@InsomniHat: Thank you for your note, and our apologies that you had to go through multiple steps before you could get here or participate. We put an invisible space between the URLs and the "." at the end of each sentence. Gmail clients handled it fine, but a percentage of clients didn't handle it correctly. We fixed this issue a few hours ago, so the new emails sent should have the correct URLs. We decided not to re-email everyone with the fixed link, to avoid sending people too many emails. Thank you again, and I'm happy that you will participate in the study. Regards. --LZia (WMF) (talk) 04:26, 14 August 2018 (UTC)

Matching survey[edit]

@RYazdanian: (No rush) when you get a chance, please update the survey with a few pieces of information: which fields were made required, the categorical options given to users (the range), and any changes to the text of the survey we implemented before pushing it out. Thanks! --LZia (WMF) (talk) 22:24, 23 August 2018 (UTC)

Let's iterate on the matching experiment[edit]

@RYazdanian: As discussed in various threads, we are ready to conclude that the second stage of the experiment, which required editors to get back to us via the survey and let us know about the quality of their matches, failed (7 responses so far). This is of course sad, but it's fair to say that we have learned quite a bit in this stage, and we should start brainstorming about the next iteration and how to get it right (or more right ;) there. I'll start by capturing a few types of responses we received after we sent the reminder email to users (remember that in each category we had 2-3 emails, not many):

  • A subset of the users contacted us saying that they have contacted the other user but haven't heard back.
  • A subset said that the first match they contacted was not a good fit, and they decided not to follow up with the second match.
  • A subset contacted us saying that their account was blocked and they couldn't take action.
  • A subset said that they contacted the other editor and the editor was not nice.

I had an in-depth conversation about this with Jonathan, and another with Bob, and here is where I am at the moment. Please be aware that this needs your extensive input and is by no means final:

We generally think that acting on a matching task was complicated for editors. So, let's drop the idea of matching for now and think about another model. Suppose we ask a fresh set of newcomers to fill out the questionnaire. For this set, we give them the list of the top 20 articles we would recommend based on their responses, and we ask them to rank the articles. On our end, we can also rank these articles by highest average similarity or some other metric. We believe the ranking task is much simpler: no coordination with someone else is needed, and it's less time-consuming. We need to make sure that what goes into the ranking task is really diverse, though, and we should perhaps select this list semi-manually using a taxonomy of Wikipedia articles.
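For concreteness, our own ranking on our end could be as simple as sorting candidates by similarity to an interest vector derived from the responses. Cosine similarity and the vector representation below are assumptions for illustration, not the project's actual metric:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_articles(user_vec, article_vecs):
    """Rank candidate articles by similarity to the user's interest
    vector (inferred from questionnaire responses); best match first."""
    scored = [(title, cosine(user_vec, vec))
              for title, vec in article_vecs.items()]
    return [title for title, _ in sorted(scored, key=lambda p: p[1],
                                         reverse=True)]
```

Comparing this machine ranking against the user's hand ranking of the same 20 articles would then be a straightforward rank-correlation measurement, with no coordination between participants required.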

@SalimJah: @Bmansurov (WMF): @Jmorgan (WMF): @Cervisiarius: @MMiller (WMF): FYI.


@LZia (WMF): Yeah it's kinda sad, but at the same time it's quite interesting to learn what's needed to engage newcomers to stay and become active editors. In a way, it was an interesting failure.

We certainly should change the recommendation type to recommending articles to edit, and I also think Jonathan's idea of giving them small tasks, such as adding links to articles, is a great way to measure how much we actually retain them by giving them tasks in line with their interests, especially now that we have plenty of time. To keep this time advantage, however, we need to start the experiment soon: before the end of October would be ideal.

Now, let's discuss some of the details:

  • I think the idea of having a pre-set, limited list of articles that we ask users to rank, and that we then also rank ourselves, both using the questionnaire results and randomly, is easy for us to implement but also unrealistic. What this experiment would measure is how well responses to our questionnaire can rank a small set of articles, but in reality we would want to be able to recommend almost any article, which requires ranking a much larger set. In addition, our offline testing shows that our method significantly beats random recommendation, so there's pretty much no competition there, and we don't need to test that online.
    • However, there's one possibility: if this questionnaire's answers are not meant to be used directly, but rather as input to another system, it is possible to go with a limited set of articles (that also need contributions) for now, arguing that we're just testing the effectiveness of the questionnaire at engaging people to make edits. I still think we could do more than that, though.
  • The other option, of course, is to recommend articles directly. This is good since it's realistic and performs well offline, although I still have to test its recommendation diversity. However, there's a catch: in order to clean up the document space and reduce noise in our questionnaire, we filtered out a large number of documents, most importantly list pages (e.g. List of North American birds) and stub articles. As a result, our article space (about 300K articles) is considerably smaller than the whole space, which means we're naturally missing many articles one could edit. This filtering is very important to the quality of the questionnaire, but it does complicate proper article recommendation.
    • One idea that could fix this would be to avoid recommending articles directly, and instead to search for existing users whose recent edits match this user's questionnaire responses. Their latest edited articles could then be recommended as an indirect recommendation. This, however, would require us to re-run the entire pipeline (effectively changing the questionnaire) every now and then, to make sure the "recent" edits are really recent.
    • Another idea would be to use the categories of recommended articles to find similar articles in need of contributions. Or to just have a separate article similarity ranking for that purpose.

Your input is welcome regarding both of the options above.

Another question we have to tackle is how to devise the control condition. One idea would be to recommend 5 articles using their questionnaire responses and 5 at random, and then measure users' satisfaction with, and later interactions with, either set. This is only compatible with the article-recommendation idea, since in the ranking idea the set of articles they rank stays the same.
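Assembling that mixed list could look something like the sketch below; the function and label names are hypothetical, and the shuffle is there so users cannot tell which items are the personalized ones:

```python
import random

def build_recommendation_list(personalized, random_pool, k=5, seed=None):
    """Mix k questionnaire-based and k random articles, shuffled.
    Returns (title, source) pairs; the hidden source label is what
    we would use later to compare satisfaction between the two sets."""
    rng = random.Random(seed)
    # Avoid recommending the same article under both conditions.
    candidates = [a for a in random_pool if a not in personalized]
    items = [(a, "personalized") for a in personalized[:k]]
    items += [(a, "random") for a in rng.sample(candidates, k)]
    rng.shuffle(items)
    return items
```

Keeping the `source` label server-side only would let us score clicks, edits, and satisfaction ratings per condition without revealing the experimental design to the participant.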

One final point: in order to increase user satisfaction, I think it would be a good idea to have two preference levels: instead of choosing set A or set B, users would say 'Greatly prefer A over B' or 'Slightly prefer A over B', etc. This increases the number of possible responses and helps them specify their interests more accurately.

--RYazdanian (talk) 21:01, 15 September 2018 (UTC)