Talk:Community Liaisons/Product Surveys
Requirements
In determining which system to run the survey with, we reviewed candidates against the following criteria:
- Multiple language support
- Supports segmentation (by tenure, edit count, etc.)
- Open source
- Allows contribution
- Ranks/tallies ideas
- Must hold many ideas
- Robust against gaming
- Proven scaling and support (who fixes it when it goes down?)
- Compliant with the WMF privacy policy ideals (though it is important to remind participants that they are going to another system)
Candidates
We reviewed the following systems against these criteria:
Who chose this?
Who chose this frankly idiotic way of gathering feedback? No way of prioritizing requests. No links to get more info about each tool. No indication of what would be improved for each tool. No way to enter what should be improved for each tool. Just click, click, click. If you're going to use this technique for the larger survey, stop wasting money and just write the tool names on slips of paper and pull them out of a hat. --NeilN (talk) 23:22, 4 December 2014 (UTC)
- Hi NeilN - well, I suppose your feedback is clear in that this is frustrating for you to use! There are definitely some limitations to this tool, so let me know if I am getting your feedback right: you don't like that there is no hyperlinking and that there isn't enough space to clarify aspects of the gadgets and tools? As for the clicking, that's an algorithmic system designed to rank and weight the ideas against one another in a way that supports better data with large groups of people. I see you revert vandalism with Twinkle as per your en.wp profile, so my guess is that you would probably like to see that improved/expanded. I was responsible for choosing the tool, but I'm open to productive ideas on possible alternative data-retaining systems that get us to the same result. --Rdicerb (WMF) (talk) 00:06, 5 December 2014 (UTC)
- @Rdicerb (WMF): It's not frustrating, it's too simplistic. I'm given a series of choices between two options. I have no idea if you're doing this "Survivor style" where the "loser" gets kicked down a notch for my session or if the two options are always random. I have no idea what you're offering to do with each tool. I might use Twinkle a lot but if all you're thinking to do is this, I might opt for more useful proposed changes in contribution history (heck, right now I'd vote for a stable contrib tool). I have no way of telling you what I want to see changed in various tools. We all realize you have limited resources. You need to get the best data you can to help decide how to allocate these resources. All this gives you is a bunch of uncontrolled numbers (what if the survey taker quits after the first question?)
- Weighting the ideas is a good idea and one I would expect to see implemented by straight ranking (here are ten options, rank them from one to ten) or by allocating points (allocate these fifty points however you see fit). I would also expect a brief description of what changes we could see for each tool and a place where we could enter our wishlist for each tool. User wishlists could be used to update "what changes we could see for each tool" on an ongoing basis. If you were really ambitious, implement a modified SecurePoll used in the Arbcom elections where editors could log in, view, and change their weights if they saw a new change for a tool that they really wanted. --NeilN (talk) 03:42, 5 December 2014 (UTC)
- The entire process here is prioritizing requests: Do you choose this tool over that one, or the other way around? The two options are always random, and the results are combined (using a formula very similar to the logic problems that run something like "If Alice is my mother's granddaughter, but I am not Alice's mother, then who am I?") to figure out your personal ranked list. It isn't necessary for you to be confronted with a list of two dozen options on one screen for the database to end up with a ranked list. If the survey taker quits after the first question, then that's fine: that short ranked list (of two items) gets recorded and counted (to determine the overall priority of those two items only).
- There is no further detail about exactly what improvements could/should/would be made, because the goal here is very limited: We want to know which tools are more interesting to editors. Providing details about exactly what improvements could be made would probably require one person to spend several months on making the list—and 90% of it would likely be time wasted, because 90% of these tools won't be improved any time soon. The likely result is not "This tool was #1, so the devs will go do whatever occurs to them about this tool". The likely result is "Here are the top two or three. Let's talk about how to improve these, and then see what we can achieve together". Whatamidoing (WMF) (talk) 14:28, 5 December 2014 (UTC)
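To make the ranking logic described above more concrete, here is a minimal Python sketch. It assumes a simple win-counting approach and made-up answers; the actual All Our Ideas model (described in the paper linked later in this discussion) is more sophisticated.

    # Minimal sketch: turn one respondent's pairwise answers into a personal ranked list.
    # Win-counting is an assumption for illustration, not the actual All Our Ideas model.
    from collections import Counter

    # Each answer is (winner, loser); "I can't decide" answers are simply skipped.
    answers = [("Twinkle", "HotCat"), ("Twinkle", "Reflinks"), ("HotCat", "Reflinks")]

    wins = Counter()
    appearances = Counter()
    for winner, loser in answers:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1

    # Rank by the share of contests each idea won for this respondent.
    ranked = sorted(appearances, key=lambda idea: wins[idea] / appearances[idea], reverse=True)
    print(ranked)  # ['Twinkle', 'HotCat', 'Reflinks']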
- Whatamidoing (WMF), pardon my French but once again, the WMF is doing things half-assed. It seems you're asking editors to choose what is their favorite tool, not the one that they think needs the most improvement. And your assertion about the resources needed to make improvement lists once again shows the WMF has little clue about how to engage its communities of editors. Why would a WMF staffer do all this work? Each tool has a discussion page. Simply post a note there asking for a concise list of needed improvements and get editors to do the work. If no or little feedback is forthcoming then that also gives you an indication of the level of interest in improving that tool. So, a couple hours to post notices and maybe a couple hours for each tool to organize the feedback. --NeilN (talk) 15:58, 5 December 2014 (UTC)
- Re: the survey/system chosen: I'm personally not a fan of the interface (I'd prefer a straight ranking system, too), but some people do like this pairwise comparison system because of its simple interface. There's a thread on wikimedia-l with 2 posts giving positive feedback (and more on IRC and via private email). As I said elsewhere, it's definitely not a perfect system, and certainly won't suit everyone, but getting feedback from 10,000–100,000+ people on anything is fraught with complications, conundrums, and compromises. :-( We're trying out this tool, and are learning its pros and cons. Like much in life, it has many of both. (I like the list of features you suggest for an ideal survey/feedback/wishlist-collation tool. But developing that would take time away from everything else... :/ )
- Re: improving the existing tools: One of the potential end-results, is to convert a gadget or labs-tool into an extension, thereby making it more robust, and giving better access to it at all our other wikis, and more thorough internationalisation - e.g. mw:Extension:MassMessage was an overhaul of the Edwardsbot concept, and mw:Extension:GlobalCssJs was an overhaul of the Synchbot concept.
- Alternatively, the survey could help us (all) determine which tools would most benefit from the smaller/local improvements, as you suggest. Or point in other directions entirely.
- HTH. Quiddity (WMF) (talk) 22:18, 5 December 2014 (UTC)
- Neil, I don't know what your experience is, but it's my experience that when no feedback is forthcoming about a tool, it means only that the maintainer is inactive, not that people don't want to use it. Also, as Quiddity says, the outcome need not involve changing the tool from the editor's perspective. It could mean expanding the number of people who have access to it, or re-writing it so that it's more performant. Whatamidoing (WMF) (talk) 04:18, 8 December 2014 (UTC)
- TL;DR this whole thing, but I will say that since this "tool" for gathering data has no limit on the number of times you can "vote" and offers the same options over and over, it is not all that difficult for one user to write a small JavaScript snippet to click for them and sway the votes in favor of what they want instead of the community. That fact alone makes this type of survey entirely useless in my humble opinion. I personally would prefer to see a list of all the options where users can either rearrange them in order of importance to them or assign importance using drop boxes or radio buttons or something. I also think that clicking on the one-liner for each item should offer a summary of the details of what that particular tool is and what the WMF would be doing to or with it in the lower half of the screen in a div or iframe of some sort. I like the initiative of the foundation to do this, I just think your prototype feedback-gathering system is seriously flawed and needs to be fixed. :) {{U|Technical 13}} (e • t • c) 15:20, 11 December 2014 (UTC)
- See the main page: finding out whether someone can "cheat" with this is among the goals of the pilot ;) I don't think it's how many times you vote which matters. If you prefer Tool A, and will always click on its button when it shows up, it doesn't matter if you do so 1, 3, 100 or 1000 times, since you're still giving it 100% of your preference. From the explanation of the scoring on OAI: The score of an idea is the estimated chance that it will win against a randomly chosen idea. For example, a score of 100 means the idea is predicted to win every time and a score of 0 means the idea is predicted to lose every time. --Elitre (WMF) (talk) 11:47, 12 December 2014 (UTC)
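As an illustration of that scoring definition, here is a rough Python sketch (the pairwise win probabilities are invented numbers, and this is not the All Our Ideas code): an idea's score is 100 times its average estimated chance of beating each other idea, i.e. of beating a randomly chosen one.

    # Sketch of the OAI score definition quoted above: score = estimated chance (0-100)
    # that an idea beats a randomly chosen other idea. Probabilities are made up.
    win_prob = {("A", "B"): 0.9, ("A", "C"): 0.7, ("B", "C"): 0.4}
    ideas = ["A", "B", "C"]

    def p_beats(x, y):
        # Probability that idea x beats idea y, using symmetry for the missing direction.
        return win_prob[(x, y)] if (x, y) in win_prob else 1.0 - win_prob[(y, x)]

    def score(x):
        others = [y for y in ideas if y != x]
        return 100.0 * sum(p_beats(x, y) for y in others) / len(others)

    for idea in ideas:
        print(idea, round(score(idea)))  # A 80, B 25, C 45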
- Surely it does matter how many times a respondent votes: the more votes giving Tool A total preference, the more its total score will tend toward 100%, regardless of whether the votes come from one user or many. You didn't need a pilot study to work that out. A strange way to run a survey! Noyster (talk) 12:43, 15 December 2014 (UTC)
- It's not counting votes. The total score for "Tool A" does not matter. What matters is the score for "Person A". The underlying calculation is "What is your personal preference?", rather than "What are the preferences related to Tool A?". I've linked the paper below, if you want to read through 20 pages of mathematics about how it works. Whatamidoing (WMF) (talk) 18:40, 16 December 2014 (UTC)
- Elitre, the survey is anonymous, you don't have to log in to vote, so how can you know whether I voted 1 time or 1000 times for one idea? Fram (talk) 14:38, 16 December 2014 (UTC)
- You can read more about the methodology here if you want, but the oversimplified answer is that it's really, really easy to figure out whether those 500 responses were from the same source or from 500 different sources. "Source" means an identifiable computer/browser/IP address, not a human; it cannot distinguish between one person answering 500 questions and 500 people answering one question each on the same computer (for example, if you took a computer to a public park and asked random passersby to answer the question).
- To answer each possible pair, you would have to answer more than 1,500 questions. During this process, your "one idea" would only come up about 40 times. So to answer "1000 times for one idea", you would probably have to sit through about 25 different three-hour-long sessions (be sure to clear cookies, clear cache, and change your IP in between each, or, better yet, switch to a completely different computer in a different location). This would probably be the equivalent of three weeks of full-time work—and even then, your "1000 times for one idea" would only be counted as 25 people/25 computer sessions, not as 1,000.
- The thing that may be confusing is that it's not counting votes. You can't "vote 1000 times for one idea", even if you spent all week clicking buttons. It is calculating the probability that you prefer item #1 over item #2, the probability that you prefer item #1 over item #3 (etc.), and then separately calculating the probability that I prefer item #1 over item #2, the probability that I prefer item #1 over item #3 (etc.), and then taking all of these individual preferences and calculating the probability that a hypothetical "average" person would prefer item #1 over item #2, item #1 over item #3, etc. The more pairs you respond to, the clearer an idea it has of your preferences, but answering more does not actually give you more say in the final results. Your responses are still just one person's (or, at least, one computer session's) responses. Whatamidoing (WMF) (talk) 18:16, 16 December 2014 (UTC)
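A small Python sketch of the point that answering more pairs does not buy more weight (the aggregation below is an assumed simplification for illustration; the model in the All Our Ideas paper is a Bayesian estimate, not a plain average): each session is first reduced to its own preference probability, and only then averaged with the other sessions.

    # Sketch: why clicking "A over B" 1000 times does not multiply one session's weight.
    # Assumed simplification: each session is reduced to one preference probability first.
    sessions = {
        "session_1": [("A", "B")] * 1000,       # one person clicking A over B 1000 times
        "session_2": [("B", "A")],              # another person clicking B over A once
        "session_3": [("A", "B"), ("B", "A")],  # a third person who is split 50/50
    }

    def pref_a_over_b(answers):
        relevant = [pair for pair in answers if set(pair) == {"A", "B"}]
        return sum(1 for winner, _ in relevant if winner == "A") / len(relevant)

    per_session = [pref_a_over_b(ans) for ans in sessions.values()]
    print(per_session)                          # [1.0, 0.0, 0.5]
    print(sum(per_session) / len(per_session))  # 0.5 -- the 1000 clicks still count once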
- Your math is way off though. With 40 issues, you have 39 + 38 + 37 + ... + 1 pairs, or some 800, not 1,500 questions (every pair is only one question of course, and the order of the issues in a pair is not relevant!). Fram (talk) 13:23, 17 December 2014 (UTC)
- You're right; I counted A vs B as being different from B vs A, which it isn't. So it should be 780, or a week and a half of full-time work. Whatamidoing (WMF) (talk) 02:00, 18 December 2014 (UTC)
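For anyone checking the arithmetic, the number of unordered pairs among n ideas is n(n-1)/2:

    # Unordered pairs among n ideas: n * (n - 1) / 2
    from math import comb
    print(comb(40, 2))  # 780 questions to cover every pair of the 40 ideas once
    print(comb(23, 2))  # 253 pairs for the original 23 ideas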
Re: handling translations with this external system
What system? Who translated and how? --Nemo 15:08, 5 December 2014 (UTC)
- It's mentioned above, AllOurIdeas. Translations need to be added manually by the person handling the current survey. As you can see here, and as the page explains, the language involved for the pilot is Spanish. HTH. --Elitre (WMF) (talk) 15:50, 5 December 2014 (UTC)
- Is there any way to add to the translations? I saw none when I did the survey and, I must say, some translations are painful. Thanks. --Ganímedes (talk) 00:03, 12 December 2014 (UTC)
- You are certainly welcome to edit, like you did, Community_Engagement_(Product)/Product_Surveys/es and Community_Engagement_(Product)/Product_Surveys/Text_Translation/es. It will be useful for those who read these pages on wiki, although this tool has the limitation that it doesn't allow us to fix the translations we already entered in the tool, from what I hear. Thanks a lot for your help, --Elitre (WMF) (talk) 12:01, 12 December 2014 (UTC) PS: You can also keep an eye on Community Engagement (Product)/Product Surveys/Ideas so that if for any reason someone posts a suggestion in Spanish which could be improved, you could edit it there and your text would be the one we add to the survey.
Re: tables (i.e. copied from Excel)
Excel?!?!?!?!?!? Please don't advertise proprietary software in Wikimedia-published material! --Nemo 15:16, 5 December 2014 (UTC)
- Sorry, I copied the description from https://tools.wmflabs.org/hay/directory/ - Unfortunately the text cannot be changed once the survey is running (one of many frustrating limitations). I would tweak a number of things, if we could turn back time... :-/ Quiddity (WMF) (talk) 21:04, 5 December 2014 (UTC)
- Do you mean the Tab2Wiki tool, Nemo? The tool actually allows you to take something in Excel and turn it into a wiki table, as shown here. --Rdicerb (WMF) (talk) 02:43, 8 December 2014 (UTC)
- Maybe he's talking about the use of non-open-source software... --Ganímedes (talk) 10:15, 12 December 2014 (UTC)
- Probably; I think I fixed that where I could. If someone could translate "any spreadsheet application" into Spanish, I'd fix the other occurrences on wiki at least. Thank you. --Elitre (WMF) (talk) 12:07, 12 December 2014 (UTC)
Never-ending series of comparisons
It seems like the number of pairwise choices presented to me is infinite. I just keep clicking and clicking and clicking and clicking, not knowing whether an end to my participation actually exists or if I should just stop clicking because I'm bored with picking one over the other. I hope there is actually a defined end to the survey because it doesn't seem like you'd get good input otherwise. It would be helpful if respondents knew how many pairs they had left to evaluate and/or how long it takes to complete the survey. Ca2james (talk) 21:13, 7 December 2014 (UTC)
- You can see the complete list and view current results here - you may have already seen this, Ca2james, but I wanted to draw your attention to it if you have not. I'm going through currently added ideas and will be posting them tomorrow. The algorithm weights them so that they have a chance (even if added on the last day, so people can continue submitting ideas). --Rdicerb (WMF) (talk) 02:46, 8 December 2014 (UTC)
- It would have been much easier if the poll were a sortable list. It would take like two minutes to order by preference. --NaBUru38 (talk) 19:58, 13 December 2014 (UTC)
- I agree with NaBUru38 and Ca2james. It is torture to do this survey. Thanks. --Ganímedes (talk) 10:46, 23 December 2014 (UTC)
The number of "All Our Ideas" seems to be changing
If I recall correctly, at the time I took this survey (and quit after a rather irritating number of clicks as being rather useless; if I choose "neither of the two choices" then I don't want to have those choices again and again and again, I just don't want them) there were 23 ideas to choose from; now there are 40. While it is nice that suggestions get added (though how, when, ... is unclear; the suggestions seem to go to some black hole, and it isn't indicated anywhere what happens to the ideas or where one can follow the discussion about them), it means that my vote is basically useless now, as I may have voted for things which are now IMO much lower priority (preferring some of the new choices), and have not voted for things I perhaps really like (because frankly, why would I go back to a survey I have already taken?). Why didn't you create a two-step system, first one to gather suggestions, and then a second to vote on those (with a better system than used here)? As it stands, I have no trust at all that this survey has any value whatsoever. Most of the above criticisms (e.g. the lack of explanation of what these tools do) also apply of course. Oh, and using a MDY format for the "added on" date is quite confusing (and parochial); why not use YMD instead?
As it stands, this looks like yet another "we consulted the community" excuse-building effort, and not a genuine attempt to have a useful and fair survey. I see above that the WMF claims that the tool is used to "figure out your personal ranked list.", which is quite amazing considering that one doesn't have to log in to use the survey. So, it's IP-based or what? That's not a "personal ranked list", that's an invitation for abuse. Fram (talk) 14:35, 16 December 2014 (UTC)
- Your response (not really a "vote") is not useless, even if you answer only one pair. Your answers indicate your preference for the items that you answered. I linked their research paper above, where they talk a bit about how they impute probable answers to items that you were not asked. To give a simple example, imagine that, in the early days of the survey, item #1 has the highest probability of being selected. In the middle of the survey, item #50 is added. If item #50 is being chosen over item #1, then item #50 is at the top of the list, and item #1 becomes second, even though many fewer people were asked about item #50. (This has actually happened in this survey. A few of the newly added suggestions are proving to be quite popular.)
- Suggestions are submitted through the survey, sent to Rachel every few days, and added (if relevant) by her in batches. There is no discussion about them, because the survey software is not a discussion platform. The list is posted at Community Engagement (Product)/Product Surveys/Ideas, along with her categorization. Anything that meets the criteria is added. Duplicates, vaporware, and off-topic comments are not. If you think she's made a clear error (for example, if something is marked as not being a specific (single and already existing) tool or gadget, and you know exactly which gadget is being talked about), then feel free to post a comment here (e.g., with a link to the gadget that you believe the suggester had in mind).
- We did talk about a two-step system. Its main failing is that it rejects good ideas that aren't submitted by some artificial deadline. Based on my experience with RFCs and policy discussions, Wikipedians don't like that sort of bureaucratic system at all. Whatamidoing (WMF) (talk) 18:32, 16 December 2014 (UTC)
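To illustrate the point above about newly added items, here is a small Python sketch (the counts are invented, and a plain win rate stands in for the actual probability model): an item added late, with far fewer contests, can still rank first if it keeps winning the contests it does get.

    # Sketch: a late-added item can outrank an early one despite far fewer contests.
    # Counts are invented; the real model estimates probabilities, not raw win rates.
    contests = {
        "item_1":  {"wins": 800, "total": 1000},  # in the survey from the start
        "item_50": {"wins": 9,   "total": 10},    # added in the middle of the survey
    }

    estimated_win_rate = {name: c["wins"] / c["total"] for name, c in contests.items()}
    ranking = sorted(estimated_win_rate, key=estimated_win_rate.get, reverse=True)
    print(ranking)  # ['item_50', 'item_1'] -- fewer contests, but a higher win rate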
- The present set-up may perhaps be justifiable as a pilot study, but when you move on to a full survey please do a meaningful one. (1) The same options should definitely be presented to all respondents: if this is "bureaucracy" we'll risk that. (2) There are now far too many options for this paired-comparison system to provide anything close to a respondent's preference ranking without a quite inordinate amount of clicking. Consider presenting each respondent with the whole list of options - with similar options consolidated - and ask for a ranking of their top 5 or top 10. (3) Respondents should be provided with a much better explanation of the survey, to inform them how it works, how the results will be reported and used, and what to expect before they begin. If possible please consult specialists in survey methodology before proceeding. Noyster (talk) 12:18, 17 December 2014 (UTC)
- It may not be useless to you, it certainly feels useless to me. And "There is no discussion about them, because the survey software is not a discussion platform." is an extremely poor argument. The discussion needs to happen before they are added to the survey (or rejected); not having a discussion has nothing to do with the survey software but is a choice by you (WMF). You are free to defend that choice, but please not with a "computer says no" cop-out. You don't want a two-step system because "it rejects good ideas that aren't submitted by some artificial deadline." Now your system rejects good ideas that don't meet an artificial criterion (only tools and gadgets, not things that e.g. only exist in Flow / VE / whatever and could be used in wikitext) and will still reject any idea submitted too late (the survey runs for two weeks, which means it will end today, but the latest question to be added only appeared yesterday and had only 29 completed questions so far; at least 6 ideas are still in "reviewing" mode, so won't meet your arbitrary deadline (2-week survey, remember) anyway).
- Please, just have a two-step survey of which concrete improvements are most wanted, no matter if they are tools, gadgets, bug fixes, or brand new ideas. It may take more time and work, but it will be a lot more useful than this very limited and strangely set-up survey (with dubious results, see below). Fram (talk) 14:06, 17 December 2014 (UTC)
- The question that the WMF is asking is not "Anything goes! What do you want?" The question being asked is, "Within this particular category, what do you want?" The WMF is asking this specific question because the WMF is willing to provide new resources to do this specific type of (fairly well-understood and moderately scoped) work. It is not asking for "anything goes!" because it is not necessarily willing to provide resources for any idea that anyone might have. For example, the best product idea might involve more resources than are available. And even if they asked, "Anything at all! What do you want?", there would still need to be limits, because we would probably get submissions like "Ban this editor", "De-sysop everyone at this wiki", and "Pay me to write articles" (all of which have been seriously suggested in previous discussions). There will always be an "artificial criterion" at some level.
- Off-topic ideas are being e-mailed to relevant staff, so it's not like the idea is lost completely. It's just not relevant to the current question. Whatamidoing (WMF) (talk) 19:48, 23 December 2014 (UTC)
Results don't seem to be correct
The intermediate results of the survey seem quite unreliable, judging from the number of "Completed contests". (Each line below gives the tool name, its number of completed contests, and the date it was added, day/month.)
- Citation bot - 234 - 4/12
- Reflinks - 1406 - 3/12
- CopyVios - 1388 - 3/12
- Cite4Wiki - 236 - 4/12
- revisionjumper - 1286 - 3/12
- Page stats - 1385 - 3/12
- Checklinks - 217 - 11/12
- Twinkle - 1353 - 3/12
- Syntax highlighter - 380 - 5/12
- Dab Solver - 229 - 11/12
- Citation expander gadget - 235 - 11/12
- HotCat - 1297 - 3/12
- Vada - 234 - 6/12
- Edit counter - 1383 - 3/12
- AWB - 191 - 11/12
- wikiblame/xtools' blame - 29 - 16/12
- PrepBio - 1302 - 3/12
- FlickrFree - 1296 - 3/12
- NavPopups - 1259 - 3/12
- QuickIntersection - 1293 - 3/12
- URL decoder - 1330 - 3/12
- Peer reviewer program - 224 - 7/12
- Tab2Wiki - 1270 - 3/12
- ProofreadPage - 201 - 4/12
- Defaultsummaries - 1349 - 3/12
- WikiEditor - 346 - 6/12
- Rangeblock calculator - 192 - 4/12
- ImageAnnotator - 1315 - 3/12
- CrossCats - 1269 - 3/12
- GLAMourus - 1285 - 3/12
- derivativeFX - 1211 - 3/12
- Copy to Commons via - 240 - 5/12
- JSL - 1234 - 3/12
- metadata - 1254 - 3/12
- dictionaryLookupHover - 1255 - 3/12
- RegexMenuFramework - 1200 - 3/12
- GregU/dashes.js - 227 - 4/12
- Please see the bot here - 241 - 5/12
- Rechtschreibpruefung - 1258 - 3/12
- Match and split tool - 190 - 4/12
This means that all ideas added 3 December have had 1,200 to 1,406 answers, while all ideas added 4 December or later have 380 or fewer. Now, on December 3 there were 525 votes, and on December 4 another 4,735, for a total of 5,260 votes. Since then, there have been (16,862 - 5,260 =) 11,602 further votes.
The group of ideas with 1,200+ votes has 23 issues, which means that at that early time (3-4 December), each would appear about once every 11 questions. To get 1,000 votes each, there should have been about 11,000 questions before the new ones were added, not 5,000. (The additional 200+ votes came after the expansion of the number of issues, of course.)
So either the date of addition of the ideas is wrong, or the graph of votes over time is wrong, or the total number of completed contests per issue is wrong. Doesn't give much confidence that the query is reliable of course either way. Fram (talk) 13:52, 17 December 2014 (UTC)
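A back-of-the-envelope Python sketch of the consistency check above, using only figures quoted in this thread and assuming that each recorded vote is one answered pair and that a "completed contest" is one idea's appearance in one answered pair:

    # Rough check of the figures quoted above (all numbers are taken from this thread).
    initial_ideas = 23
    contests_per_initial_idea = 1200   # low end of the 1200-1406 range in the list above
    votes_by_dec_4 = 525 + 4735        # votes recorded on 3 and 4 December

    # Each answered pair involves two ideas, so it adds two completed contests in total.
    questions_needed = initial_ideas * contests_per_initial_idea / 2
    print(questions_needed)            # 13800.0 answered pairs needed for 1200 contests each

    # Expected contests per idea from the votes actually recorded by 4 December:
    expected_contests = votes_by_dec_4 * 2 / initial_ideas
    print(round(expected_contests))    # ~457, far below the 1200+ shown in the results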
- It looks like you have failed to account for the 'I can't decide' option. There are nine different ways to respond to each pair, but only two of those methods are recorded on that list. Whatamidoing (WMF) (talk) 02:05, 18 December 2014 (UTC)
- ??? In what way does that change anything in the above analysis? It only makes the error more egregious, not less. Please give a bit more explanation instead of a rather dismissive non-answer, and describe a scenario in which the figures given above make sense. Your reply doesn't help me one bit to understand this. Fram (talk) 10:39, 18 December 2014 (UTC)
- Fram, is your complaint basically "There are slightly more than twice as many 'completed contests' as there are 'votes'"? Whatamidoing (WMF) (talk) 19:23, 18 December 2014 (UTC)
- At the end of 4 December, there were (if the dates the issues were added and the numbers of votes cast are correct) some 23,000 completed contests for the 23 initial issues, or some 11,500 completed "battles". But only some 5,200 votes were cast at that time. I can't see from your answer whether you really understand the problem correctly, or whether you think I have not taken the "one vote gets counted twice as a completed contest" issue into account. Rest assured, I have. Perhaps it's a coincidence that the results are nearly exactly double what could be expected (4 times the number of votes, instead of twice), perhaps it's indicative of the problem; that's for people with access to the data to decide. But from what I can see, the numbers are wrong, and without a good explanation or correction you might just as well throw the whole survey away (because, if the numbers are so wrong, then why would we trust the percentages?) Fram (talk) 07:28, 19 December 2014 (UTC)
- The percentages that are shown online are only estimates, so they're definitely "wrong" at some level. The real ones require a lot of computational effort, and will be available soon (maybe next week, although I don't know when Rachel will have time to get them written up and posted).
- By the way, I believe that the "votes" number is "validated votes", not "votes cast". They watch for certain patterns that are associated with trying to game the results, and invalidate them. Whatamidoing (WMF) (talk) 18:31, 19 December 2014 (UTC)
- I love it that you show that they are only estimates and so on, and four minutes later a colleague comes along and tells us that these are the results, which address a number of comments but not this one. Great... I hope further reactions about this will be forthcoming, or I'll have to describe this survey as utterly unreliable at every juncture where the WMF would try to use it to justify any choice they will make. Just speaking from experience here. Fram (talk) 07:54, 22 December 2014 (UTC)
- Hi @Fram:, I'm checking your concerns with the All Our Ideas team and hope to have some feedback for you soon. I'm not certain about their workload before the holiday, so it may take a little time. Again, their working paper (also linked by Whatamidoing (WMF)) might be a bit TL;DR, but section 3 explaining Pairwise collection and analysis may be of interest. Also, if you happen to know of a system that automatically includes and shows stats and data in real time as an alternative, I'm open to hearing about it. --Rdicerb (WMF) (talk) 03:02, 23 December 2014 (UTC)
- Comparing the paper to the actual survey, I didn't have the feeling that it was in any way "adaptive". Even the most basic thing, like no longer showing me options after I have repeatedly said that I don't know anything about them, or after I have voted them both down ("don't like either option" or however it was stated), is not adaptive, it is off-putting. "I told you that I don't like this option, why do you keep presenting it?" Anyway, nothing in the paper explains the serious anomalies in the figures I highlighted. I don't need or want real-time stats and data, that is not my concern (would be nice, but is not relevant for the current issue). The problems I present are about data that was nearly two weeks old and pretty static. Fram (talk) 07:51, 23 December 2014 (UTC)
- I also noticed the issue with showing you two options for which you chose "don't like either" - I'll add that to the feedback - while there may be a reasonable explanation, I couldn't begin to guess what it is. As for the static information, they should be able to see archived versions of results. I'll keep digging in. --Rdicerb (WMF) (talk) 18:53, 23 December 2014 (UTC)
- I understand why, if you say that you don't like A or B, they keep asking you whether you like A better or worse than C, D, E, etc. (If they don't ask, then they can't determine whether you also dislike C, D, E, etc., just as much as you dislike A.) Their goal is to put everything in an order of preference, from most favorite to least favorite, not to record "yes, this one" or "no, not that one".
- Once you've said that you don't know anything about A, I can see two possible values in the question: If "C" is your most favorite option, then you don't really need to know much, or even anything, about A to be able to answer that you like C better than anything else. Also, one of the main advantages of pairwise comparison is that it reduces the opportunity for "gaming" the votes. You can't upvote your favorite by systematically downvoting everything else (or not voting on other options, depending on the system). Not only that, on a large survey (they've run a couple with more than 200 items for comparison), you can't even guarantee that you'll get a chance to vote on the one that you're promoting, unless you're willing to answer dozens or hundreds of questions first. AOI has additional safeguards against gaming the vote beyond the inherent advantage of pairwise comparison, and I believe that continuing to ask about options that you claim not to understand may be part of their anti-abuse strategies. The alternative could be quite problematic, especially in small surveys. If you could systematically remove most options from your survey (e.g., by declaring that you don't understand them), then the result is less-accurate rankings and an increased likelihood of being able to vote repeatedly on the one that you claim to understand (i.e., the one that you are trying to upvote). Whatamidoing (WMF) (talk) 19:33, 23 December 2014 (UTC)
- Thank you for putting it so succinctly, Whatamidoing (WMF) - I definitely see that value. :) I'm wondering about where Fram says he has gotten the same pairing more than once after initially choosing the "don't like either" option - it's possible it's an anti-gaming mechanism (if noted there, I would need to re-read the paper in depth). --Rdicerb (WMF) (talk) 21:45, 23 December 2014 (UTC)
- Choices are delivered randomly, which means that it's possible to get the same question twice, and if you answer enough times, it's even likely. Whatamidoing (WMF) (talk) 18:11, 24 December 2014 (UTC)
I haven't looked at the source code, but the obvious guess is that contests can be derived from the transitivity of rankings; that is, ranking A over B and then ranking B over C means 6 contests.
Still, the sharp clustering of the number of contests (all ideas before Dec. 3 seem to have around 1200-1400 contests, and all ideas afterwards 300-400 contests) is somewhat surprising. --Tgr (talk) 21:46, 2 January 2015 (UTC)
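A speculative Python sketch of that guess (not confirmed behaviour of All Our Ideas): if contests were derived from the transitive closure of a respondent's answers, two answers such as A over B and B over C would yield three ordered relations, i.e. six per-idea contest entries.

    # Speculative sketch of the transitivity guess above; not confirmed OAI behaviour.
    def transitive_closure(pairs):
        closure = set(pairs)
        changed = True
        while changed:
            changed = False
            for a, b in list(closure):
                for c, d in list(closure):
                    if b == c and a != d and (a, d) not in closure:
                        closure.add((a, d))
                        changed = True
        return closure

    answers = [("A", "B"), ("B", "C")]     # two actual answers
    derived = transitive_closure(answers)  # adds the implied ("A", "C")
    print(len(derived))                    # 3 ranked pairs ...
    print(2 * len(derived))                # ... counted once per participating idea = 6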
Checking on results (both survey and system)
Hello all, the survey for en.wp is now closed, and you can review and check out metrics and results here. The team may not be able to answer all questions about this, but can reach out to the All Our Ideas team if necessary. I've otherwise turned off the ability to respond to the question we asked, as the two-week pilot period for the English question is up. The es.wp question is still active here.
I'm creating a deck for the results for both en.wp and es.wp (which has another few days of polling, until the 22nd) which will be publicly accessible. I will post here and around when I've started it.
I know that some users didn't like the pair-ranking method of gathering data and there are questions concerning the accuracy of the results. Anyone is welcome to review the credentials of those who created the survey at this link. Noyster, I cringed a bit when I saw that you wrote "If possible please consult specialists in survey methodology before proceeding" - I had! :) I'm personally not a sociologist or a game theorist, but I trust the intelligence and integrity of those who created this system. I've spoken with a few users who know and understand survey systems, and am happy to consider another system that fits the above criteria. I might put a call out to other areas of the communities in order to find a comparative system. If you know of such a system, I welcome a message either here or on my Meta Talk page.
There are several issues that are a deterrent to actually using the system to begin with, such as the lack of hyperlinking ability and some strict character limits. It was really difficult to be clear about what we were asking in 100 characters for the main question, for example. I'm also making a comprehensive list to send to the All Our Ideas folks to see if we might be able to move forward with this. --Rdicerb (WMF) (talk) 18:35, 19 December 2014 (UTC)
- It's encouraging that our comments are being noted. You've had plenty of apposite comments about the pilot survey from NeilN, Fram, Technical 13 and others as well as me. I'd much prefer some form of ranking exercise where all the options are there on the page, but whether we move to this or stick with paired comparisons, the key to making this survey meaningful is maximum attention to compiling the list of options before the survey begins.
- The options should be limited in number (twenty at most), fixed throughout the survey period, and all descriptions should be comparable in terms of:
- Length
- Comprehensibility to the general WP editor
- Accessibility (i.e. all with hyperlinks or none with hyperlinks)
- Specificity
- And like I said before, all respondents should be presented with a clear succinct explanation before launching into the survey.
- I look forward to seeing proposals for the next phase. Noyster (talk) 12:10, 20 December 2014 (UTC)
- @Rdicerb (WMF):, you start your post with "you can review and check out metrics". I did, in the section right above it. Any comments on that? If the results are not correct, then any tweaks of the intro, questions, ... won't help you of course. Fram (talk) 07:56, 22 December 2014 (UTC)
Future plans
[edit]"After this, we'll review to see if this is a helpful and accurate way of gathering ideas." - Anyone not from the WMF think this was remotely the best way of gathering ideas? --NeilN (talk) 05:44, 17 January 2015 (UTC)
- Did this review ever happen? Please point to the results of the review. Rogol Domedonfors (talk) 11:49, 21 June 2015 (UTC)
So...
@Rdicerb (WMF) and Whatamidoing (WMF): It's been over three months since the survey closed. Is the WMF planning to do anything with the results? --NeilN (talk) 23:46, 28 March 2015 (UTC)
- Hi @NeilN: - thank you so much for the nudge, and please accept my apologies for not following up on this yet. I would like to make advancements with the results, but timing and staffing to work on these initiatives are being worked out at the moment. What the Community Liaison team would like is to have a conversation around how, specifically, the tools that were highest ranked in the survey could be improved - ideas were ranked, but the specifics have not been described. I'm hoping that this conversation can start in the next couple of months, so we can then make a request to Product/Engineering for resources to support these tool improvements. Beyond that, it would be lovely to have a broader survey with more communities to give users more influence over the product roadmap. It's a bit hard for me to be specific with timeframes at the moment as there are a lot of moving pieces (for example, the Community Liaisons are no longer in the Product Department; we've joined with most of the other community-based teams into the new Community Engagement dept, and that transition is taking a significant amount of my time), but that's what we're looking at.
- I showed the survey results to Analytics, and they were well received from that perspective. It was easy for Leila on that team to see that several people were trying to "game" the system with multiple survey attempts (unsuccessfully). Happy to show those results if it would be helpful, but at the end of the day, the actual goal is to move towards improvements of these tools.
- So, long story short: we need CL support to have those conversations about what improvements the tools in the results need, then engineering support to make it happen, then more ongoing support for more product initiatives that the communities are requesting. No specifics on timeline yet. Let me know if that's helpful, and if not, what information *would* be helpful. Cheers :) --Rdicerb (WMF) (talk) 07:25, 30 March 2015 (UTC)
- @Rdicerb (WMF): Thanks for the update. I'll be checking in again in a couple months. --NeilN (talk) 17:45, 30 March 2015 (UTC)
- I wanted to follow up, NeilN, so I'm basically copying what I just wrote over on Lila's talk page just now: The good news is that with the Call to Action, the announcement of the Community Tech Team, and pending additions to Community Liaisons team specifically to support conversations around this, things are moving into place to act on the results. We're almost into May, and it's my expectation that in less than 2 months you'll see some more conversation around here. Do let me know if there are questions or concerns. Cheers, Rdicerb (WMF) (talk) 05:13, 22 April 2015 (UTC)
- "less than 2 months" from the date of this post would indicate mid-June, say some time in the next fortnight. @Rdicerb (WMF): are you still on track to revive this discussion before then, or give us your assessment of the results and your decisions, or where you will be doing things as a result, or whether there will be another survey? Just a reminder. Rogol Domedonfors (talk) 15:29, 6 June 2015 (UTC)
- @Rdicerb (WMF) and Whatamidoing (WMF): it is now two months since the latest of the various times by which you expected to resume discussions here. While that was not a promise, nonetheless you have raised expectations and you need to acknowledge that. Please tell us what your current plans are for telling the community what will result from the investment of volunteer time and donor money in this project. If nothing useful is going to result, then just say so now, and consider the extent to which you need to apologise to the community for the waste of time and money. Otherwise please make a clear and specific commitment to the community about what the next steps will be and when. Let me say now that vague statements about expecting something to happen at some future time or waiting for someone else to do something else first are not going to be sufficient. You have had six months to finish up this work. Anything other than one of the two alternatives I mention is going to be read by the community -- whether you like it or not -- as a clear statement that you do not know, or do not care, what you are doing here. In other words, either demonstrate success or admit failure. No more excuses, no more delays, no more vagueness, no more expectations: act, and act swiftly. Rogol Domedonfors (talk) 12:01, 21 June 2015 (UTC)
- Thank you for providing the update today. Rogol Domedonfors (talk) 17:24, 13 August 2015 (UTC)
An update on one of the requested tools (RevisionJumper)
An update from the EU Hackathon presentations in Lyon yesterday: DerHexer updated his RevisionJumper tool this weekend; it was one of the top tool requests for enwiki during the product survey pilot a few months ago. As there hasn't been an opportunity to have conversations about what, specifically, community members wish to see improved within the tool, I don't know how this changes the prioritization within that particular list. He would be the best person to describe which bugs he has fixed, and I don't know that it changes any community wishes, but I wanted to note it here. Thank you! --Rdicerb (WMF) (talk) 03:57, 27 May 2015 (UTC)