Grants talk:IdeaLab/Multidimensional EPR (Earned Public Reputation)


How to earn positive reputation on Wikipedia?

Some of the concerns about this idea involve recognition of positive behaviors, so in this topic I'm attempting to explain the symmetric relationship between contributions and the reputation of the contributor. However, since this is a symmetric relationship, it is important to know how articles are evaluated, and I don't know enough about that. In some areas I may be suggesting changes to how articles could be evaluated, but I'm writing in ignorance.

In a simple case, consider impartiality. If an article gets a lot of corrections or reversions for NPOV violations, then the article itself should be regarded as suspect, simply because it is dealing with a controversial subject. An article that gets fewer such reactions should be regarded as higher on the impartiality metric, and the people who have contributed to that highly impartial article could receive higher impartiality scores in proportion to their contributions to the article. Going further, if someone makes many contributions to a low-impartiality article, but none of that contributor's contributions trigger NPOV concerns, then that person could receive even more recognition for being impartial.
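To make that concrete, here is a rough sketch in Python of how such an impartiality credit might be computed. Everything in it (the input names, the 0.5 threshold, the bonus factor) is a hypothetical assumption for illustration, not an existing Wikipedia mechanism:

```python
# Hypothetical sketch: impartiality credit for contributors to a single article.
# All inputs and thresholds are illustrative assumptions, not real MediaWiki data.

def article_impartiality(npov_reversions: int, total_edits: int) -> float:
    """Articles with few NPOV-related reversions per edit score closer to 1.0."""
    if total_edits == 0:
        return 0.0
    return max(0.0, 1.0 - npov_reversions / total_edits)

def contributor_impartiality_credit(article_score: float,
                                    contribution_share: float,
                                    own_npov_flags: int) -> float:
    """Credit is proportional to the contributor's share of the article. Contributing
    heavily to a low-impartiality (controversial) article without triggering any
    NPOV concerns earns extra recognition, as suggested above."""
    credit = article_score * contribution_share
    if own_npov_flags == 0 and article_score < 0.5:
        credit += 0.5 * contribution_share  # bonus for staying neutral on contested ground
    return credit
```

Under those made-up numbers, a contributor with a 40% share of a contested article (article score 0.3) and no NPOV flags of their own would get 0.3 × 0.4 + 0.5 × 0.4 = 0.32, rather than the 0.12 from the article score alone.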

Accuracy is a trickier one to assess, but if someone provides links to highly reputable sources, then that is grounds to raise that person's reliability or accuracy metric. I can think of various other examples, such as responsiveness or stability, but I'm not really that familiar with what matters to Wikipedia from the inside. The general principle should be to recognize positive contributions in ways that are relatively automatic and not intrusive. In contrast, negative evaluations are likely to be more obvious because they are related to perceived problems. Shanen (talk) 04:12, 29 July 2018 (UTC)

More thoughts on this topic, in response to another IdeaLab submission about rating articles. Unfortunately that idea was rather tentative, but it still provoked a chain of reasoning that can be applied here. I added a long suggestion along the same lines (which follow below), but now I can't find that idea again to link it here. I don't know whether the author retracted it for rewriting or I just can't figure out how to find it. Rather than try to remember and rehash the full justification from that reaction, here I'm just going to focus on a possible algorithm by which articles can be evaluated on the positive side and how that data can be fed back into the reputation of the contributors. The focus on positive reputation is because the problems that cause negative evaluations to flow to the contributors are easier to see, so it's the positive side that needs more help.
To make this clear, I think it will be better to relabel things a bit. I think MEPR is a better label for contrasting with single-dimensional metrics (including the idea I just referred to about a 5-star rating of articles). Further, I think the two ratings need to be explicitly distinguished, so I'm going to say MEPR-A for the reputation of articles, while MEPR-C will be the reputation of contributors. Remember that these should be mostly symmetric concepts with similar dimensions. (However, that is semantic similarity, and it might be appropriate to use slightly different labels for the same thing as seen from the two perspectives. This is actually similar to some of the cases below where it may be better to use a negative attribute like "biased" instead of the negated attribute "not neutral".)
I suggest that all readers be given an easy and convenient option to rate articles along the same dimensions as the EPR (AKA MEPR). The readers who choose to do this would be rewarded with some extra metadata about the article after they answer the questions (if they are interested in looking at it). This is just one suggestion about how to do it:
Step 1: Ask readers for feedback. There would be a notice at the top of the article encouraging readers to rate the article at the bottom.
Step 2: At the bottom would be a little box asking the first of two or three questions. The first question would offer several dimensions by which to evaluate the article, asking the reader to pick the dimension that is most important or relevant for this article. (Not too many at one time, even if there are a lot of dimensions.) After enough responses have been collected, it will be clear which dimension is the most important; then you stop asking about that dimension, take it off the list of candidates, and start looking for the #2 dimension. For articles with lots of readers, you can work down to lower-ranked dimensions. For articles with few readers, you may deliberately avoid the popular dimensions in order to collect data on the lower-ranked ones. (Maybe even skip the first question?)
Step 3: The second question would ask for a rating on the dimension the reader picked, either positive or negative. For validation purposes, I think it should actually ask the question in reverse some of the time. Using the example mentioned above, instead of asking the reader to rate how neutral the article is, the validation version of the question would ask how biased the article seems. If most of the readers agree the article is high on the neutral question, then most of the readers who get the opposite question should agree that it is low on the biased question.
Step 4: The third question is really about one dimension, but it is so important that perhaps it should always be asked. The tentative label is "Satisfaction", and the question might be "Did this article provide the information you sought?" (Most of these ratings should be on Likert scales, which usually have 3 to 5 levels.) This dimension is also different in that, under these conditions, the other dimensions should be biased positively, whereas this one may tend to be distributed around zero. (My theoretical explanation is a bit complicated, but it has to do with the length of each article.)
Step 5: The reader who answered the questions would get some data about the MEPR-A of the article as rated by other people. I would suggest showing just the dimension that reader had picked, but with an option to view the entire MEPR-A. (There should probably be a restriction that readers can only rate each article once, or once over some time period.)
Step 6: Having measured the MEPR-A, each dimension is now reflected back into the MEPR-C of the people who contributed to the article. Primary contributors (perhaps at least 20% of the writing or 30% of the edits?) would get the full weight of the MEPR-A score for each dimension. Lesser contributors would get fractional scores.
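To make Steps 2 through 6 more concrete, here is a minimal Python sketch of the data flow. The dimension names, the -2..+2 Likert range, the reversed-question flip, and the 20% threshold are all illustrative assumptions drawn from the steps above, not an existing feature:

```python
from collections import defaultdict

# Illustrative dimensions paired with their negated "validation" forms (Step 3).
DIMENSIONS = {"neutral": "biased", "accurate": "inaccurate", "substantiated": "unsubstantiated"}

class ArticleRatings:
    """Collects reader answers (Steps 2-4) and aggregates them into a MEPR-A."""

    def __init__(self):
        self.scores = defaultdict(list)   # dimension -> list of Likert values (-2..+2)
        self.satisfaction = []            # Step 4: the always-asked dimension

    def record_answer(self, dimension: str, answer: int, reversed_form: bool, satisfied: int):
        # If the reversed (negated) form of the question was shown, flip the sign
        # so all stored values end up on the positive-dimension scale.
        self.scores[dimension].append(-answer if reversed_form else answer)
        self.satisfaction.append(satisfied)

    def mepr_a(self) -> dict:
        """Step 5: per-dimension averages that can be shown back to the reader."""
        return {d: sum(v) / len(v) for d, v in self.scores.items() if v}

def reflect_to_contributors(mepr_a: dict, text_shares: dict) -> dict:
    """Step 6: primary contributors (here assumed to mean >= 20% of the text) get the
    full MEPR-A score per dimension; lesser contributors get a proportional fraction."""
    mepr_c = {}
    for user, share in text_shares.items():
        weight = 1.0 if share >= 0.20 else share / 0.20
        mepr_c[user] = {dim: score * weight for dim, score in mepr_a.items()}
    return mepr_c
```

So, for example, with text_shares = {"A": 0.5, "B": 0.1}, contributor A would receive the full per-dimension averages while B would receive half of them.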
Yes, I admit that this may seem a bit complicated, but I think it could address and even capture the fundamental asymmetry between positive and negative evaluations. Shanen (talk) 21:40, 29 July 2018 (UTC)
Two more thoughts about this. Since this input for positive MEPR-C is indirect and fundamentally different from the problem-based input for negative MEPR-C, that justifies weighting them differently. I actually think the data should be viewed using logarithmic scales, but the numbers underneath can be accumulated with linear weights. For example, someone might write a large part of an article that earns a high MEPR-A score on the accuracy dimension; that contributor deserves a positive reputation for that work, and it should not be totally negated by one mistake, especially if the mistake was quickly recognized and corrected.
Also, the positive reputation of an article should be a long-lasting attribute that doesn't age as quickly as the data about a mistake. Actually, perhaps the positive MEPR-A of an article should remain high as long as readers continue to confirm that rating, though the credit reflected to the contributors may change as newer contributors supplant some of the older ones. Shanen (talk) 21:58, 29 July 2018 (UTC)
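Here is a small sketch of how that weighting and aging might be combined: linear accumulation underneath, logarithmic compression only for display, and a much shorter half-life for negative events than for positive ratings that readers keep confirming. The half-life numbers are pure assumptions for illustration:

```python
import math

# Hypothetical half-lives: mistakes fade faster than repeatedly confirmed positive ratings.
POSITIVE_HALF_LIFE_DAYS = 365.0
NEGATIVE_HALF_LIFE_DAYS = 30.0

def decayed(value: float, age_days: float, half_life_days: float) -> float:
    return value * 0.5 ** (age_days / half_life_days)

def dimension_score(positive_events, negative_events, now_days: float) -> float:
    """Events are (value, timestamp_in_days) pairs. Accumulation is linear, so a
    single quickly corrected mistake cannot wipe out a large body of positive credit."""
    pos = sum(decayed(v, now_days - t, POSITIVE_HALF_LIFE_DAYS) for v, t in positive_events)
    neg = sum(decayed(v, now_days - t, NEGATIVE_HALF_LIFE_DAYS) for v, t in negative_events)
    return pos - neg

def display_score(raw: float) -> float:
    """View on a logarithmic scale, as suggested above, to compress large totals."""
    return math.copysign(math.log1p(abs(raw)), raw)
```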

Will EPR become a kind of popularity contest?

This is an attempt to respond to the concerns of Yetisyny and Ouranista. I feel their concerns may reflect a lack of clarity in my presentation. The evaluations that generate EPR should not be tied to the contributor as a person, but based only on the contributions themselves. In positive cases, it will actually be better if the goodness is detected in automatic ways, even if the benefit to EPR is fuzzy and slightly questionable. The system should be generous in sharing credit for good work.

I think there is actually more of a risk on the other side, with negative reputation associated with problems. Problems are more likely to trigger attempts to find out "who done it", and possibly even to look at other contributions from that person in order to aggressively attack their EPR. In the worst case, someone might deliberately go after people who are perceived as having a "bad" position on some issue, which would be a massive violation of NPOV. I think the privacy of contributors needs to be protected even when you are complaining about a serious factual error. The report should be focused on the mistake, not on who made it, and the damage to EPR should not be visible to the person who discovered the mistake.

However, the person who lost "accuracy" reputation for the mistake should certainly be able to see what happened. Not for the sake of fighting over the facts, though there are cases where that might be appropriate, but for the sake of understanding how it happened and how not to do it again. Shanen (talk) 04:23, 29 July 2018 (UTC)

  • Discussion My background with wikis is five years' experience in Wikia and Wikipedia editing, currently serving as bureaucrat and admin on several Wikia wikis and as a contributor to a few Wikipedia topics, so I've seen both good and not-so-good editing and lots of interpersonal dynamics. My objection could be based on misunderstanding, but with all due respect, your explanation didn't clarify the idea for me. Is it the editor who is getting this EPR score, or the article itself? If the editor is getting the score, it totally relates to them. As an education professional, I've seen so-called impartial metrics used properly and improperly, even when the design is good; and as a user, such a score wouldn't mean much to me. I don't go to Wikipedia for die-hard encyclopedic information. I go to Wikipedia for user-contributed information. I understand it may or may not be the most accurate. If Wikipedia wants to become an encyclopedia, it should hire experts to write its articles. Otherwise, it should continue to accept edits from people from all walks of life with expertise at all levels and allow other editors to police the articles for spam and content accuracy, as usual. Finally, if the goal is to evaluate the quality of edits, that sort of evaluation is going to intimidate many individuals from becoming editors, and isn't getting more people to edit the goal here? Inclusiveness? I thought so. At any rate, I appreciate the effort you put into creating the idea and vouching for it. I just don't think it holds merit for Wikipedia. Ouranista (talk) 03:20, 30 July 2018 (UTC)

Discussion from endorsements section

Hi folks, I've moved opposition and discussion (with context) from the endorsements section. By convention in IdeaLab, opposition is not listed there because it typically results in things like extended discussion, clarification, and changes to the idea. Ideas are not Requests for Comment à la English Wikipedia, because they are works in progress as opposed to finalized proposals. Please contact me on my talk page if you have further questions. I JethroBT (WMF) (talk) 22:47, 29 July 2018 (UTC)

  • Oppose No, no, no. This goes against Wikipedia's principle of good faith, and the qualification "not having earned public reputation" impugns the person's good name. --Havang(nl) (talk) 12:33, 28 July 2018 (UTC)
  • Oppose I understand the objective purpose; however, such a "moniker" will inevitably reflect only an editor's popularity with users, not their editing skills, about which non-editors know little to nothing, being privy only to the end product and not the effort it takes to be an editor. Even if only editors could vote on other editors, it would still become a cliquish game. That's social science at work. Ouranista (talk) 14:50, 28 July 2018 (UTC)
    • What I am REALLY trying to do is understand the two opposing positions from Havang(nl) and Ouranista, but I don't know of a better mechanism to ask them. In both cases, it sounds to me as though their objections are based on my failure to explain the idea clearly.
If I actually do understand the concern of Havang(nl), then he is arguing that "not having earned public reputation" would be a distinct status, separate from anonymous contributions by a contributor who is not logged in. In that case, I think we are discussing the value of contributions that are credited to specific users versus anonymous contributions. If he is just saying that the zero point should be calibrated differently, then I think that's quite reasonable, but I can't understand the basis of his "No, no, no" vehemence.
If I actually understand the concern of Ouranista, then I think that is a more serious concern, but I'm not sure how it would apply in the Wikipedia context. In most cases you would be reacting to the published contribution without seeing who the source was. For example (again picking the easy case of accuracy), if you are pointing out a mistake, then you should be required to provide evidence about the accurate facts, and perhaps you should have to look at the citations or evidence provided with the mistake, but there doesn't seem to be any reason why Wikipedia would have to tell you who did it--though your correction would then reflect negatively upon the "accuracy" dimension of that contributor. In the case of a more debatable dimension, such as impartiality, you can still report a violation of NPOV without knowing who did it (and your report might deserve discounting unless you have positive impartiality in your own EPR).
By the way, I also tried to find other mechanisms to ask them more directly, even going so far as to click on "Help", but I didn't get anywhere... Though I am not a newbie to Wikipedia in chronological terms, and even though I am something of a tool phreak, the Wikipedia tools mostly evade my comprehension. Shanen (talk) 22:23, 28 July 2018 (UTC)
  • Support It seems like a good idea, but it also needs more work. I think it would do more good than harm and would help improve the quality of the articles as well as contributions to Wikipedia, encouraging more good-faith edits and discouraging vandalism and other bad behavior.
However, there are various details that are left out, and possible compliance issues with Wikipedia policies, as those who do not support this idea have noted. So I think this idea basically needs more people to contribute to it and help refine and improve it and solve its problems. But overall it is a good idea, and we should try to build on it and improve it instead of just discarding it without even seriously considering it, as some here are advocating.
Yes there are some serious flaws with this idea that others have pointed out, but I think this idea could be improved from its present form to fix them. I have some experience over at Wikia.com where on many wikis people can earn various “badges” for different amounts and types of contributions and it sort of becomes like a game to try and improve your stats. But if the system is carefully designed, the incentive structure could reward good behavior and punish bad behavior in such a way that people would improve articles more and vandalize articles less. If it is badly designed this would just devolve into a popularity contest ruled by cliquish behavior and then either lead to groupthink or edit wars. But I have enough faith in the Wikipedia community to believe that Wikipedians would implement this as carefully and in as well-thought-out a way as possible, if an idea like this were implemented.
Anyway, if the EPR thing were implemented in such a way that people trying to maximize their EPR score would, ipso facto, have to make a large number of positive contributions to Wikipedia and almost no negative ones in the process, this would certainly encourage good behavior and discourage bad behavior. And some people would do this simply to see how high a score they could get, and would not need any more of a reward than that. For people with a low score, though, we would need to analyze the reasons why their scores are low and whether they actually did anything wrong or whether the EPR algorithm is at fault. And assuming good faith, we would give them the benefit of the doubt unless there is evidence to the contrary. So I guess my support is not for this idea in its present form, but I think this idea holds some promise if it is improved upon, and it should not simply be discarded because of the flaws it presently has. Yetisyny (talk) 23:25, 28 July 2018 (UTC)
All these arguments prove to me that it's no, no, no for me. Content has to be judged, not contributors. --Havang(nl) (talk) 07:37, 29 July 2018 (UTC)
Thank you, Havang(nl), for clarifying how to annotate here, though I think you mostly haven't actually understood the idea very well. I have added a topic in the "Discussion" section that may make it clearer, but so far you have not managed to clarify the basis of your emotional reaction. Perhaps you could respond more clearly or persuasively in that topic? Right now I feel I am supposed to answer your question, but if my previous response was not responsive to your concerns, then I evidently can't figure out what is bothering you. Shanen (talk) 07:46, 29 July 2018 (UTC)
Wikipedia is not about persons; it's about articles. More knowledge about personal reputation may lead to less healthy Wikipedia discussions. There is already the autopatrolled user qualification, which is more neutral than "Earned Public Reputation". Prejudgments exist. I personally have been attacked on WP:Fr for "not being French". I decided therefore to add (nl) to my username. Again, let's just stick to the WP principle of good faith.--Havang(nl) (talk) 08:01, 29 July 2018 (UTC)
You, Havang(nl), have yet to say anything that is actually relevant to my idea, though I have asked you for clarification several times. At this point I am mostly curious about why you are apparently following the idea so closely. Perhaps there is some reason you feel personally threatened by the idea of EPR, even if you don't understand it? Again, I ask you to look at the clarification in Discussion, where both of the topics address your apparent misconception, but from different perspectives on the most plausible misunderstandings. On the other hand, if you have nothing constructive to say, then perhaps you should say nothing? Shanen (talk) 09:22, 29 July 2018 (UTC)[reply]
Content should be judged without considering who uploaded it. I feel this EPR is against a democratic Wikipedia. --Havang(nl) (talk) 09:26, 29 July 2018 (UTC)
That is NOT what EPR is about, no matter how many times you, Havang(nl), repeat the false statement. It is apparently based upon your initial impression or upon something you prefer to believe, or perhaps your own fear of accountability for your public behaviors. Or whatever. However, at this point I certainly feel that your EPR should indicate something about your lack of intellectual integrity, and in that case I might have seen by a glance at your EPR that your original comment was probably not sincere and I would not have wasted the time trying to find out what you were thinking. Now that I understand what sort of person you are, I cannot imagine trusting you enough to respond to any further comment you might make, but EPR would have nipped this entire discussion in the bud. If I didn't ignore you, then you would have ignored me. But thanks anyway for making my point so well. (No, I'm NOT saying your vote shouldn't count, but I am saying that you have completely failed to convince me that I should consider changing my vote to agree with yours, and in the process you have convinced me that you will NOT even honestly answer my questions. You have only changed my mind about attempting to communicate with you.) Shanen (talk) 17:05, 29 July 2018 (UTC)[reply]
Do you deny that EPR is a judgement about persons? "Comment on content, not on the contributor" is a basic Wikipedia rule.--Havang(nl) (talk) 19:19, 29 July 2018 (UTC)
Oppose Oppose "a lot of corrections or reversions for NPV violations" has nothing to do with a "controversial article" has nothing to do with a "neutral article". A neutral Article can also be on a controversial subject, making it a controversial article at the same time. "A lot of corrections or reversions for NPV violations" also happens on neutral articles, in their time of popularity, like presidential candidates' articles during elections. --Ne0Freedom (talk) 12:59, 31 July 2018 (UTC)[reply]

What is stopping this from devolving into something like what has been created in the People's Republic of China? Who decides what is positive and what is negative? Can this have a negative impact on editors who contribute in more controversial areas?--RightCowLeftCoast (talk) 01:28, 1 August 2018 (UTC)

You, RightCowLeftCoast, are raising an excellent point, though I am not sure if this is the best place to tackle the issue. Perhaps it should be a new topic or part of the anonymity topic I introduced below. (You can start a new topic by clicking on the "Add topic" tool at the top right of this Discussion tab.)
I think the best short answer I can formulate right now is that it is mostly a question of community consensus about who is supporting the principles of Wikipedia, such as NPOV and NOR. In general I think anyone should be free to contribute, but it does matter what sort of person the contributions are coming from. This remains a controversial topic within Wikipedia. Do you remember the discussions (perhaps two years ago?) about disclosure of financial involvement? Could someone help out with a link to the revised disclosure policies? Shanen (talk) 23:02, 6 August 2018 (UTC)
It is possible for different groups of editors to be editing from opposite sides of a topic, and thus reverting each other, yet both be working towards neutrality, and thus both receive a negative score in this proposal.
Perhaps the better question is not how many edits an editor has, but how many edits provide significant content (say, 500 bytes plus). If anyone would like to look at an editor's edit count, that data is there, but hidden in a myriad of graphs. At that edit count screen one can also see where the editor has most recently been active, where they have had the most activity, and what types of pages they edit (talk, Wikipedia talk, article, etc.).
This may just create another system that is gamed, either to get a better reputation or to destroy another editor's reputation. Neither is favorable to the various projects IMHO.--RightCowLeftCoast (talk) 23:49, 6 August 2018 (UTC)

Where are the boundaries of public behaviors?

This is a serious problem that has me questioning the feasibility of the proposal... It started by considering all of the readers as potential contributors to Wikipedia. That was essentially a unification resulting from the notion of trying to use reader reactions (in any form) to determine the MEPR-A (as the reflection of the MEPR-C). That implies that the reactions (or ratings) of the readers could be considered part of the public record, down to the level of which sections of the articles got the clicks... Doesn't that reek of an intrusive and even dangerous level of surveillance?

Opt-in might be part of one solution approach. In that approach, the identity of the person who performed the public behaviors would be masked by default. Or perhaps it could even be conditional masking? My MEPR-C would be visible only under an anonymous tag (e.g. "User 11,237,398") unless my MEPR-C was high enough, in which case I'd allow it to be visible in association with my handle (or even my name). Actually, I'm such a weirdo that I might prefer to leave my handle visible no matter what my MEPR-C is, while other people would leave it set to never reveal their handle, no matter how large their MEPR-C grows. ("That User 231,411,079 sure seems to be a great guy.")
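A tiny sketch of that conditional masking, where the handle is shown only if the owner opts in (optionally only once the score passes a threshold the owner chose); the field names and the formatting of the anonymous tag are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MeprProfile:
    user_id: int
    handle: str
    mepr_c: float                              # some overall or per-dimension score
    reveal_handle: bool = False                # the owner's opt-in choice
    reveal_threshold: Optional[float] = None   # only reveal once MEPR-C reaches this

def public_label(profile: MeprProfile) -> str:
    """Default is an anonymous tag like 'User 11,237,398'; the handle appears only
    if the owner opted in and any threshold they set has been reached."""
    if profile.reveal_handle and (profile.reveal_threshold is None
                                  or profile.mepr_c >= profile.reveal_threshold):
        return profile.handle
    return f"User {profile.user_id:,}"
```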

I think the entry points for this topic must be in Wikipedia's policies towards personal accountability. I know that there have been many discussions of various forms of vandalism and how to protect articles, limit the damage of recognized vandals, and even prevent malicious edits. Therefore I should start by trying to find those discussions, and I'm heading off to do so. However, if you want to add pointers or your thoughts on this topic, thank you, and I'll check back here later on. Right now I regard it as a fresh can of worms, and the only solution I can see really bothers me. The kids might feel differently; they might see it as inevitable or even natural, but I don't like the idea of living in a complete surveillance society. Personally, I don't think Wikipedia should go with that flow. Shanen (talk) 20:20, 30 July 2018 (UTC)

Using the MEPR-A data like bread crumbs?

Not sure if this is as natural a solution approach as it seems... More likely I'm flying a giant kite of delusion about a grand solution, but... The problem being addressed here is getting more positive data about articles to offset, and hopefully even overwhelm, the negative data associated with specific problems (caused by recognized contributors who are hopefully learning from their mistakes). There should be a good reason for readers of articles to express their opinions, hopefully mostly positive ones. Perhaps allowing the use of each of their quick reactions as a kind of bread crumb for that article would motivate some readers?

From this bread-crumb perspective, the opinions should be regarded as sensitive personal information, which is also natural and proper, but the person who submitted the opinions would be able to review them to see the related articles that they had reacted to. (It would be nice if there were also a personal annotation system, but that would be a much more complicated project, probably outside the scope of Wikipedia, and it also lacks internal justification, whereas the reactions to articles are intrinsically valuable to Wikipedia, even in the form of anonymized demographic totals.)

From the privacy perspective, the data could be used (in several ways) as part of the MEPR-C of contributors, but the default should be to protect privacy by anonymizing the identity of the person behind each MEPR-C. In other words, you could see that "User 27,342,123" had a certain MEPR-C if you had some reason, but there would be no visible link between that reputation and any person unless the person decided to share his MEPR-C in public. For most people it would be another kind of private information, but hopefully useful feedback on how the world (of Wikipedia people) perceives them.

It seems like I had better recap and refine the MEPR-A part of this idea. My prior suggestion was that there be a note at the top of the article asking for feedback at the bottom, and that at the bottom of the article there would be a small interactive questionnaire asking two (or three) specific questions. Readers could easily hit the <End> key whenever they are ready to record their opinion, with the hope being that many of them will do so when they have finished reading. The accumulated (hopefully) positive recognition of the article would then be reflected into the MEPR-C of each contributor. I suggested that the credit (in the form of higher MEPR-C ratings in each dimension, as selected by readers) would be apportioned based on relative contributions, with major contributors (or editors) getting the full amount and other contributors getting fractionally less. However, I realized that this could be refined for large articles. If the reader has clicked on any section in the ToC of the article, then the questionnaire result could be concentrated into a local MEPR-A, and the credit could also be concentrated on the specific contributors who worked on that section (or sections). Shanen (talk) 21:12, 31 July 2018 (UTC)
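As a sketch of that last refinement — concentrating a reader's rating on the sections they actually visited, and then on the contributors to those sections — with the data structures assumed purely for illustration:

```python
from collections import defaultdict

def apportion_section_credit(rating: float,
                             visited_sections: list,
                             section_shares: dict) -> dict:
    """visited_sections: section titles the reader reached through the ToC (an empty
    list means the rating applies to the whole article). section_shares maps each
    section title to {contributor: fraction of that section's text}. Returns the
    per-contributor credit for this one rating."""
    targets = visited_sections or list(section_shares)
    if not targets:
        return {}
    credit = defaultdict(float)
    per_section = rating / len(targets)
    for section in targets:
        for user, share in section_shares.get(section, {}).items():
            credit[user] += per_section * share
    return dict(credit)
```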

Can the anonymous identity be used as a benchmark?

Consider the amalgamation of all anonymous activity, both ratings of articles and contributions to them. That would define a particular MEPR-C which, simply on the basis of the amount of data, would be rather stable. At the same time, the MEPR-C(Anonymous) could be compared with the MEPR-C values of various demographic groups of identified people, or even with subsets of itself based on such characteristics as the country of the IP address or reading and contributing to certain categories of articles. Kind of a minor application, but it would be interesting to see the effect of that single variable of anonymity versus identification, even when the identity is generally not visible.
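A small sketch of that benchmark comparison, assuming per-user MEPR-C dictionaries are already available somewhere (the dimension names and grouping are made up):

```python
def group_average(profiles: list) -> dict:
    """profiles: per-user dicts like {'accurate': 0.7, 'impartial': 0.4}.
    Returns the dimension-wise mean for the group."""
    totals, counts = {}, {}
    for p in profiles:
        for dim, value in p.items():
            totals[dim] = totals.get(dim, 0.0) + value
            counts[dim] = counts.get(dim, 0) + 1
    return {dim: totals[dim] / counts[dim] for dim in totals}

def compare_to_anonymous(group_profiles: list, anonymous_profiles: list) -> dict:
    """Per-dimension difference between a demographic group and the MEPR-C(Anonymous)
    baseline built from all anonymous activity."""
    group = group_average(group_profiles)
    anon = group_average(anonymous_profiles)
    return {dim: group[dim] - anon.get(dim, 0.0) for dim in group}
```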

However, it mostly seems that no one else has been interested in this idea and its ramifications since the initial "burst" of interest when the idea was first submitted. Shanen (talk) 01:29, 2 August 2018 (UTC)

Other criteria of reputation?

This topic is based on this Grants-tab contribution from Yoconst (talk): A reputation, usefulness, or accuracy metric could allow better organization of the community. Tools are already implemented (thanking contributors for specific contributions), and an automatic system for counting whether someone has resolved controversies, added sources, made constructive criticism, etc. could be readily implemented. Yoconst (talk) 14:00, 7 August 2018 (UTC)

My initial reaction is that the metric of counting sources is relatively easy to track from the article, both for positive and for negative cases (if a source gets removed or replaced). In dimensional terms, I think this dimension of MEPR-A might be called "substantiated" for articles with many sources versus "unsubstantiated" for articles that have few sources. As reflected in MEPR-C, the label might be "substantial" versus "opinionated". Or maybe "proven" and "unproven" for both sides?
However, I'm not sure how to measure the other two in an automatic fashion. I think that resolving controversies is important, but I'm not sure how to recognize it in the resulting article or who would then decide who got the credit. The dimension might be "flexible" versus "rigid", both for the article and for the person. Comments on Talk tabs can be measured, but I think it would be difficult to figure out which ones affected the article in a "constructive" way versus the less useful comments that might be described as "nonconstructive". Shanen (talk) 08:28, 11 August 2018 (UTC)
Reflecting more on this comment, I think it is about three primary dimensions: proven (versus unproven), accurate (versus inaccurate), and impartial (versus biased). As reflected in MEPR-A, the positive side of "proven" is an article with many sources and the negative side is an article with too few, while for MEPR-C the positive side goes to the contributors who provided the sources and the negative side might be reserved for people who actually provide bad references. For "accurate", I think articles are already measured on this, and there should hopefully be almost no cases of negative MEPR-A on this dimension; for contributors it would probably be easiest to share the positive side more or less globally based on relative contribution, whereas the negative side is easily focused on the specific contributors who made mistakes. The third dimension is more difficult, but so is NPOV in general. Again, the goal is for all articles to be positive here. Shanen (talk) 00:38, 12 August 2018 (UTC)
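For the "proven" dimension, which is the easiest of the three to automate, here is a sketch of counting added and removed references per edit; the <ref> regex and the idea of summing the deltas per contributor are assumptions for illustration, not an existing tool:

```python
import re

REF_PATTERN = re.compile(r"<ref[\s>/]")   # crude assumption: count <ref ...> tags in wikitext

def ref_count(wikitext: str) -> int:
    return len(REF_PATTERN.findall(wikitext))

def substantiation_delta(old_wikitext: str, new_wikitext: str) -> int:
    """Positive when an edit adds sources, negative when it removes them. Summed per
    contributor over time, this could feed the 'proven versus unproven' dimension."""
    return ref_count(new_wikitext) - ref_count(old_wikitext)
```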