Article validation possible problems

From Meta, a Wikimedia project coordination wiki

The article validation feature is going live fairly soon on en:.

Many people who hear about the feature immediately come up with possible problems with the data and how bad-faith article raters may try to abuse the feature.

The plan at this stage is to gather raw data and not use it for anything except analysis afterwards to see what needs to go into place for the next cycle, so we need not do anything at all as yet. However, listing ideas here will help us look for possible problems.

List your problem as a ==section header== and people can discuss it underneath.

Vote spamming

The obvious one. Create accounts and make lots of high votes for your version. Add as many IPs as you can as well - David Gerard 23:39, 21 May 2005 (UTC)

  • Votes are to be public so far, much as edits are. This means sockpuppetry can be looked for, although IP checks are difficult going back more than a short time - David Gerard 23:39, 21 May 2005 (UTC)
  • Versions of articles are rated, i.e. individual edits are given a rating, which feeds a score for the editor. Each editor's score in turn determines the weight given to the ratings they cast. Sockpuppets and vandals by nature will have low scores, while appreciated editors will have higher ones. (Problems: controversial, since voters must agree ahead of time on a scoring system. Server load?) -VS

Systemic Bias

Users might not understand the meaning of the different topics. We are seeing this in the German featured article candidacy page, where people submit their favourite articles based on "I like this topic" rather than "I like this article". -- Mathias Schindler 06:23, 22 May 2005 (UTC)

  • Highly featured topics will get a lot of attention and many votes. More specific articles will get few voters, often the authors of the article itself. Hence the vote is potentially biased for a while. This will improve as time goes by, but might remain a long-term issue on some topics. Anthere 06:58, 22 May 2005 (UTC)
  • There are bound to be problems with the "usual suspects", namely anything related to sex/gender, religion and politics. Many people are not exactly happy with NPOV information on these, and are bound to vote negatively on, for example, homosexuality, because "the bible says" or "the moral majority says" or "'science' says" (or who/whatever says) that homosexuality is bad, and since the article does not clearly state so, it must be bad, too. There's also the problem group of people who think that something has to be wrong because they once heard otherwise (and I am not talking about reputable sources here); again, mostly a problem of the "usual suspects", because the gap between factually correct knowledge and willingness to voice an opinion is probably nowhere as big as with the "big three". Article validation is going to be a great feature, but there has to be some sort of safety net regarding those problems. No IP-voting, and maybe mandatory comments for some topics might help. -- AlexR 09:49, 22 May 2005 (UTC)
IP voting is useful mostly as the opinions of our readers, who greatly outnumber active editors. We may choose not to act on IP votes the same way that we act on logged-in votes, for example - David Gerard 12:13, 26 May 2005 (UTC)
  • These are good points and they are all reasons why we need to identify every voter and make each user's voting history clear. Then any decisions made via the vote can take into account any systematic bias on the part of the voters. For example, if I was using verification tallies to determine what material to put on a DVD, I would weigh each voter's score based on his overall probability of voting yes/no, so that voters who always voted one way would tend to be left out of my count (after all, if you're telling me to keep everything, you're not actually telling me anything at all when I have no ability to keep everything). I do not agree that we should make comments mandatory, but rather we should by policy never use the raw count, and voters should be aware that various things (lack of comment, anon contribution) are highly likely to make their vote not count. We should try to include every user in the counting process, especially the ones we are likely to ignore, so that no one will forget that there are many classes of votes which will go unused for various reasons. --24.165.233.150 22:25, 22 May 2005 (UTC)
  • What about the special case of users who have edited the very article they want to vote on, pro or con? Imo, one should not be able to vote on an article s/he has edited. This may also apply backwards, meaning that a user's previous votes would be cancelled once they have edited the page. The obvious point is that editors have a strong bias, either positive or negative (when they disagree and their edits are undone). This is especially true for controversial topics, but not only for those.

--Denispir 11:09, 21 October 2008 (UTC)

  • I agree about the importance of weighting votes. IP votes and votes by editors of the article in question may be given less weight, as may votes without comments. Also, the 'value ranking' of an article, whether it results from votes or any other ranking method, may itself be ranked as more or less trustworthy. A higher overall number of votes would, for instance, raise this trust value.

--Denispir 11:09, 21 October 2008 (UTC)
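
One way to read the suggestion above of weighting each vote by the voter's overall probability of voting yes or no is sketched below. The linear weighting rule, the default for voters without history and all names are assumptions made for illustration; other rules would fit the same idea.

```python
def vote_weight(past_votes, vote):
    """Weight a yes/no vote by how unusual it is for this voter.

    past_votes -- list of booleans, the voter's earlier votes (True = yes)
    vote       -- the vote being counted now (True = yes)
    """
    if not past_votes:
        return 0.5  # hypothetical default for voters with no history
    p_yes = sum(past_votes) / len(past_votes)
    # A yes from someone who always says yes carries no information.
    return (1.0 - p_yes) if vote else p_yes


def weighted_tally(votes, histories):
    """Return (weighted yes total, weighted no total) for one article.

    votes     -- dict mapping voter to their vote on this article
    histories -- dict mapping voter to their list of past votes
    """
    yes_total = 0.0
    no_total = 0.0
    for voter, vote in votes.items():
        weight = vote_weight(histories.get(voter, []), vote)
        if vote:
            yes_total += weight
        else:
            no_total += weight
    return yes_total, no_total
```

Under this rule a voter who has only ever voted yes contributes nothing with another yes, which matches the "keep everything" example. Extra factors such as anonymous votes or votes by the article's own editors could be multiplied in the same way.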

Version skew

Edits are constantly being made to articles. When someone rates a page, they are rating a specific version, which is rapidly (depending on the page) superseded by a subsequent version. It's never clear whether a recent edit has completely messed up a page, fixed all of its problems, or tweaked it in an insignificant way. Depending on the application, this could be a problem. -- Beland 02:09, 25 May 2005 (UTC)

I agree. This proposal will undermine the dynamic nature of Wikipedia. --Peter Farago 06:45, 12 Jun 2005 (UTC)

My solution to this would be to only allow votes for the current version of an article. If an article reaches the validation threshold before getting edited, that version becomes stable, and any new version will need to be validated from scratch. If an article is edited before the votes reach the threshold, a percentage of the votes would get transferred to the new version, based on the size of the edit, to allow popular articles to get validated. A small spelling correction would allow the article to retain, say, 60% of its votes, whereas a new paragraph would effectively restart the vote. If an edit is completely reverted, the article gets its old votes back as well (in other words, if the current version of an article matches a version in its history precisely, it gets the votes from that version if they are higher than its own). If the numbers are tweaked right, I think this should be enough to (almost) eliminate the problem. See my own proposal for more info. Risk 01:35, 22 November 2005 (UTC)
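
A minimal sketch of the vote transfer just described, under stated assumptions: edit size is measured in changed characters, the retained share falls off linearly, and the cutoff of 500 characters is invented for illustration. Only the 60% figure comes from the comment above.

```python
MAX_RETAIN = 0.6    # share kept after a tiny edit (the "say, 60%" above)
RESTART_SIZE = 500  # hypothetical: edits this large, in changed characters, keep nothing


def carried_votes(old_votes, changed_chars):
    """Votes transferred from the previous version to the edited one."""
    share = MAX_RETAIN * max(0.0, 1.0 - changed_chars / RESTART_SIZE)
    return old_votes * share


def votes_after_edit(history, new_text, changed_chars):
    """Starting votes for a new version.

    history -- list of (text, votes) pairs, oldest first
    """
    previous_votes = history[-1][1] if history else 0.0
    votes = carried_votes(previous_votes, changed_chars)
    # A full revert gets the old version's votes back if they are higher.
    for text, old_votes in history:
        if text == new_text:
            votes = max(votes, old_votes)
    return votes
```

Comparing full texts is the simplest way to detect an exact revert; a real implementation would more likely compare stored hashes of each revision.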

Quality of opinions offered

Depending on who fills out validation forms and how careful they are, the data that is collected could be measuring uninformed or hastily made opinions. Example 1: Many people rate an article as accurate, even though it is inaccurate, because it aligns with a popular misconception. Example 2: People don't take the time to double-check references; the references are instead rated on how impressive they sound. -- Beland 02:14, 25 May 2005 (UTC)

  • Hopefully the numbers will balance out. Also, I expect low accuracy numbers will mainly act as a pointer to fact-check an article or to improve its references - David Gerard 12:13, 26 May 2005 (UTC)

Newer edits with fewer votes

I also thought of another problem today. Let's say we have a decent and stable article on topic X, and it gets good votes over a rather long period of time. Now somebody edits this article, be it to fix typos or as a complete rewrite, and the article becomes a lot better. Since that would be a new version, however, it would necessarily at first have significantly fewer votes. My guess is that people would rather go for the version with many good votes than the one with few of them, and therefore, in this case, for a worse version.

Now, for typos and the like, the solution might be that edits marked as minor don't count as a new instance of the article; that would necessarily have to include reverts of edits which are not so minor after all; we all have seen instances where trolls tried to sneak in edits as minor which were not. Still, to start a new vote for a new version when all that was corrected was a typo does not seem to make any sense at all.

As for larger edits, that problem might be solved by giving percentage ratings rather than mere counts - the older version with, say, 50 votes for it and 5 votes against it would therefore have a worse rating than the newer, better one with, say, 20 votes for and 1 against; which is as it should be. -- AlexR 13:19, 25 May 2005 (UTC)
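
The arithmetic behind the percentage comparison can be checked in a few lines. The function name is made up for this illustration.

```python
def approval(votes_for, votes_against):
    """Share of positive votes, or None if nobody has voted yet."""
    total = votes_for + votes_against
    if total == 0:
        return None
    return votes_for / total


older = approval(50, 5)  # 50 / 55, roughly 0.909
newer = approval(20, 1)  # 20 / 21, roughly 0.952
```

By raw count the older version leads 50 to 20; by percentage the newer version leads at about 95% against about 91%.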

With my comments on "version skew" above, I agree that handling this question is tricky. There is a certain number of votes below which the tally is unreliable. But if version A has 100 ratings and version B has 200 ratings, whatever the average rating is for either is probably a good indicator. However, there is a huge sampling bias simply from taking these votes during different time periods.
I guess solutions to the version-rating problem fall into classes - attribute ratings only to the version the reader saw, or allow ratings to "leak" across versions. You might allow one version to "influence" the rating of the next version up or down. The amount of influence should be small, at least after the threshold number of votes have been accumulated, but it might be a good default. The degree of "influence" might be adjusted by measuring the percentage of text that has changed. Of course, you can completely change an article by judicious insertion of the word "not", which would defeat this technique, but hey. -- Beland 02:50, 28 May 2005 (UTC)
I don't see how any version is ever going to get 100 ratings. Wiki pages are edited far too quickly for that.

I have proposed a mathematical solution for this elsewhere:

Semi-automation. Also, there will be a software aid: "Stable" versions are generally just that: stable; they haven't been edited in a while. An automated mechanism could go through the history and mark revisions with a tentative measure:

k * e^(-c*age) * e^(d*[time before next edit])

where c is the decay rate (more recent versions should be preferred) and d is the stability unit. Both c and d could be inferred statistically per article, since some articles are edited a lot (high d) and others not so much (low d), and some articles are about current events (high c), and some are about dead events (low c). Revisions with a high such measure are candidates for the next "stable" version, to be selected by community approval.

The choice of mathematical function (e) is based on information theory. It makes the semi-automation based on the average number of characters changed in an article per unit of time. Kevin Baastalk 18:21, 8 April 2006 (UTC)
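
As a rough sketch, the measure above could be computed over a revision history like this. The values of k, c and d are placeholders rather than fitted constants, times are assumed to be in days, and treating the newest revision's "time before next edit" as the time elapsed up to now is an assumption of this example.

```python
import math


def stability_measure(age, time_before_next_edit, k=1.0, c=0.01, d=0.05):
    """k * e^(-c*age) * e^(d*[time before next edit])"""
    return k * math.exp(-c * age) * math.exp(d * time_before_next_edit)


def score_history(saved_at, now, k=1.0, c=0.01, d=0.05):
    """Score every revision; saved_at is a list of save times, oldest first."""
    scores = []
    for i, t in enumerate(saved_at):
        # The newest revision has no next edit yet, so measure up to now.
        next_edit = saved_at[i + 1] if i + 1 < len(saved_at) else now
        scores.append(stability_measure(now - t, next_edit - t, k, c, d))
    return scores


# Four revisions saved on days 0, 1, 40 and 41, evaluated on day 50.
scores = score_history([0, 1, 40, 41], now=50)
best = scores.index(max(scores))
```

Here the revision saved on day 1, which stood unedited for 39 days, scores highest and would be the candidate put forward for community approval.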

Privacy notice

Is the information we are collecting personally identifiable? Will it be published only in aggregate form? Will anyone on the web be able to see each person's answers? Should the validation page have a little note explaining this, or a link to an explanation? What about the existing en:Wikipedia:Privacy policy? -- Beland 04:15, 25 May 2005 (UTC)

  • A rating is an action on the Wikipedia database just like an edit, so all ratings are visible to all. This should be covered by the existing privacy policy, but "Note: all ratings are visible to all users" might be worth noting so as not to surprise people - David Gerard 17:30, 25 May 2005 (UTC)

Retaliatory voting

(moved from Article validation feature) (Submitted as MediaZilla #977.) Allowing both positive and negative feedback on any given article leaves the system prone to malicious negative feedback, with editors who have personal grudges against each other modding down all of their opponents' edits. This fault is *inherent* in the system as currently constructed. Allowing only positive feedback, in the form of a "yes" vote, would fix this problem. Articles would be assumed-invalid until voted good, as opposed to assumed-neutral until voted one way or the other. (Yes, I do consider this a bug.) Grendelkhan 06:24, 2 Dec 2004 (UTC)

  • Michael Snow suggests that "weeding out the highest and lowest percentiles of an article's ratings would help restrict the tactics of self-promoters and POV warriors if we do get such a system going." - David Gerard 12:13, 25 Feb 2005 (UTC)
  • Based on the little tiny bit of this I've seen, I'd like to suggest that using "median" instead of "mean" would be a good way to reduce the effects of self-promoters and retaliatory negative votes. This is something else that we may want to make configurable per wiki... --Aerik 11:05, 5 Jun 2005 (UTC)
  • Only allowing "yes" votes makes sense. After all, if the page is messed up, we want people to fix it, not vote "no".
I agree with this, but it would cause one problem: given enough time, every article would get validated. If an article is left alone long enough, it might get validated because of the number of random validation votes (if the article isn't edited, or validation points are carried over after an edit, somehow). Because of this, there should probably be a small daily decrease of the current amount of validation, or the threshold for validation should be increased slightly. This should probably be based on the popularity of an article too. Risk 00:47, 22 November 2005 (UTC)
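
To illustrate the two suggestions above (dropping the highest and lowest percentiles, and using the median instead of the mean), here is a small sketch with invented ratings. The trim fraction and the function name are assumptions.

```python
from statistics import mean, median


def trimmed_mean(ratings, trim=0.1):
    """Mean after dropping the lowest and highest `trim` share of ratings."""
    ordered = sorted(ratings)
    cut = int(len(ordered) * trim)
    kept = ordered[cut:len(ordered) - cut]
    return mean(kept)


# Eight honest ratings of 4 or 5, plus two retaliatory ratings of 1.
ratings = [4, 4, 5, 4, 4, 4, 5, 4, 1, 1]

plain = mean(ratings)                   # 3.6, dragged down by the two 1s
middle = median(ratings)                # 4.0
trimmed = trimmed_mean(ratings, 0.2)    # 4, both 1s and both 5s are dropped
```

Both robust measures ignore the two retaliatory votes here, though neither helps once the retaliators make up a large share of all voters.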

Versions with templates aren't stable

Someone pointed out (on a mailing list or IRC) that an article version that includes templates will change with the templates, but the rating will be the same regardless. Hopefully this will not be a major effect - templates masquerading as article text are strongly discouraged on en: at least - but I can't think of a way around this one - David Gerard 16:41, 1 Jun 2005 (UTC)

Possibly having the history rendering of a page try to pull in the template version immediately before the page date/time. This won't be perfect, but it'll be more accurate than always using the current version of the template - David Gerard 22:11, 20 November 2005 (UTC)
I may be misunderstanding the problem, but wouldn't it be possible to validate templates the same as articles, and simply use the latest validated version of the template that the article links to (and disregard the template if it hasn't been validated yet)? If the template is essential to the page, editors could simply hold a kind of validation drive to get the template validated. Risk 00:17, 22 November 2005 (UTC)
Since editors use a template in the version it was at when the article was saved, saving that with the article (e.g. the version numbers of any templates with the saved article version, or trying to ascertain the most recent template version at the time of pulling the revision) would be the more elegant technical solution. A reader won't be able to tell if a given box is a template or a part of the article text or what, they'll be rating the entire construct - David Gerard 15:39, 28 November 2005 (UTC)
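
Finding the template version that was current when an article revision was saved could look roughly like this. The data layout and function name are hypothetical and do not reflect how MediaWiki stores revisions.

```python
from bisect import bisect_right


def template_revision_at(template_history, page_saved_at):
    """Latest template revision saved at or before the page revision.

    template_history -- list of (saved_at, revision_id), sorted by saved_at
    page_saved_at    -- save time of the article revision being rendered
    """
    save_times = [saved_at for saved_at, _ in template_history]
    position = bisect_right(save_times, page_saved_at)
    if position == 0:
        return None  # the template did not exist yet at that time
    return template_history[position - 1][1]


history = [(10, "t1"), (20, "t2"), (30, "t3")]
shown = template_revision_at(history, 25)  # "t2"
```

Storing the resolved template revision ids alongside each saved article revision would avoid repeating this lookup on every view of the history.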

Magnus' current code has mitigated this somewhat [1]: "When you click on the "declare this stable" link, it now caches the source *with replaced templates*, meaning the stable page will look the same even if the templates it uses were changed or deleted." Note that this does not apply to every rated version - David Gerard 12:57, 21 December 2005 (UTC)

Amazon/IMDB review rating experience

Comment on Slashdot:

That's actually a terrible idea. A highly rated page would only mean that many find it agreeable, rather than that the content is good and accurate.

Want an example? I'll give you two: Amazon Customer Reviews. These reviews are rated, but almost always -not- based on their quality, but on the opinion the one giving the rating, has on the product (book, CD, DVD, video game). So if a review is perceived negative on the product, and a majority of people like that product, the review will receive negative ratings, regardless of the quality of the review itself!

And the second example is IMDB movie reviews. Same thing as with the Amazon customer reviews.

  • Yes, for sure. As Mathias Schindler writes above, articles will first be rated on the interest, importance, attractiveness or whichever other appealing feature of the topic, and only then perhaps on the quality of the article. This bias seems very difficult to avoid, even with clear information and warnings to the users.

--Denispir 11:33, 21 October 2008 (UTC)

Vote gangs

(posted by geni to wikien-l [2])

  • Validation wars, with forums ganging up to vote for/against certain articles. Evidence (no way should it be rated that high): http://newgrounds.com/portal/view/276616
  • Trolls will follow around users they don't like, rating their articles to zero.
  • An equivalent of schoolwatch will turn up and start rating all of a certain type of article as excellent.
  • The Anglo-American spelling war will use article validation as a new battlefield.
  • We will discover that our articles on Pokemon and exploding whales are our most valued.
  • At least one arbcom case will result from the switching on of this feature.

So all in all business as usual and nothing to worry about (well, no more than there normally is). People will use it to cause trouble, but unless we lock the database people are always going to be able to cause trouble.

Usability

From FiverAlpha
Jump to: navigation, search
Revision of 2005-11-29 13:53:53
User \ Topic    (Strangeness)   (Tpol)  (NPOV)  (stupid)        (funny)
80.161.48.96    3       1       2       3       5
See the validation statistics for "Main Page" here
Show my validations

or

From FiverAlpha
Jump to: navigation, search
1 - 3
Revision        Strangeness     Tpol    NPOV    stupid  funny
2005-11-29 13:53:53 (details)   3.0 (1) 1.0 (1) 2.0 (1) 3.0 (1) 5.0 (1)
2005-11-09 14:53:03 (details)   5.0 (1) 1.0 (1) 4.0 (1) 2.0 (1) 2.0 (1)
2005-10-09 02:18:52 (details)                   4.0 (1)         
1 - 3
Show my validations

And now you tell me what this means. Sorry, this incomprehensible number salad is just making the usability of Wikipedia for normal readers worse instead of better. And since there are no special pages to list articles according to validation criteria, it doesn't even offer the chance to use it to improve Wikipedia by finding bad articles (or did I miss a link somewhere?). In sum: Validation failed. Please don't switch it on until it is significantly improved and actually usable. --elian

Number soup

This form of validation isn't going to work, because too much effort is required for each vote. There are too many numbers involved, most of which are meaningless. (Why would you ever rate something 1/5 instead of reverting it? What would a 2/5 mean? Etc.) All the formulas (discarding certain percentiles, rating users, etc.) would just encourage people to play games with validation instead of writing an encyclopedia.

My suggestion, which builds on others: you can choose to give the newest version of an article +1 point. Reverts get the points of the earlier version. A version is validated after the number of points crosses a threshold like 5 or 10.
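
A minimal sketch of this +1 scheme, assuming points are stored per distinct article text so that a revert automatically picks up the earlier version's points. The class, its method names and the threshold of 5 are illustrative only.

```python
THRESHOLD = 5  # the suggestion above mentions "a threshold like 5 or 10"


class Article:
    def __init__(self, text):
        self.versions = [text]   # full edit history, oldest first
        self.points = {text: 0}  # points per distinct text

    def edit(self, new_text):
        """Save a new version; a revert reuses the old text's points."""
        self.versions.append(new_text)
        self.points.setdefault(new_text, 0)

    def endorse(self):
        """Give the newest version +1 point."""
        self.points[self.versions[-1]] += 1

    def is_validated(self):
        return self.points[self.versions[-1]] >= THRESHOLD


page = Article("good text")
for _ in range(4):
    page.endorse()
page.edit("vandalised text")   # new text starts at 0 points
page.edit("good text")         # revert: back to 4 points
page.endorse()                 # fifth point crosses the threshold
```

Keying by the full text keeps the example short; a real system would more plausibly key by a hash of the revision.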

I don't think it's true that "every article would eventually end up validated"; pages get edited quite frequently, after all. If one version of a page stays and 10 people say it's good, without editing it, spread out over time -- then that's a good article.

It sounds like the current version of validation is going in; that's sad. Hopefully, when it falls on its face, people will want to try other things like this proposal, and not give up on validation altogether or try to make creeping incremental fixes to that version. Rspeer 17:34, 20 December 2005 (UTC)