User:Agtfjott/Article quality and user creditability

This is work in progress. Please contact the author if you have any questions. Further mainanence will be at rallar.

Article quality and user creditability is an automatic moderation system which tries to estimate quality and creditability measures for articles written by unknown users. It has some similarities to karma systems but does more closely follow a model of plain old work and payment. The articles in question are Wikipedia-articles undergoing editing, often with several collaborative editors. A user is any user editing such an article. The model consists of two parts, an article part saying something about quality of articles and an other part saying something about creditability of the users.

The measures for article quality and user creditability are then used to make a decisions when there should be a stable version or if the new version should be a unstable version of the page. An anonymous user reading a page will then be handed the last stable version until he tries to edit the page, then he will be informed about the existence of a possible newer unstable version. A logged in user will be assumed to know about this distinction and will be handed the last version, no matter if it is stable or not. An anonymous user can choose to access the unstable version directly through an additional tab.

The model

When a user contributes with manual labour to an article in a good way, the user gains creditability. The kind of work the user does is not really important, but different kinds of work on an article at different stages can be transferred into different amounts of credits. When a user has made an incorrect/trollish contribution and is reverted, he will lose his credit for the erroneous edit he made, and also get an additional penalty. The penalty (fine) should be set at a level that ensures a regular troll to keep on loosing.

c^{n}=\sum _{\Delta j_{n}\in I}|p_{j_{n}}*\log(p_{j_{n}})|\,

simplistic credits per iteration

An article also has its own quality measure or score given as the amount of information contained in its individual parts. This means that each word has a value depending on how frequent it is used in texts, e.g. the probability of words like an, he or the are significantly lower than radio or novel. This is described as Shannons entropy or measure of uncertainty.

s^{n}=\sum _{i_{n}\in I}-p_{i_{n}}*\log(p_{i_{n}})\,

absolute score for a version

Inspecting our previous function we see that it is modeled after Shannons entropy as the sum of all the changes in the measure of uncertainty.

It is also possible to use a history score for the evaluation of an article. This uses the articles complete history up to some point in time and can be defined similar to the user credits.

Automatic dirty marking

When a contributor wants to edit an article his credits are compared to the articles score, and if he has enough credits he is allowed to start editing the article. As the article is edited is slowly gain quality. At first the article reaches a level where newbies and anonymous users is not allowed to directly edit the article any more. To continue directly editing the article the user has to get enough credits, usually by editing less prominent articles.

q^{n}=s^{n}-f(c^{n})\,

single user mode

The value $q^{n}$ describes wetter the article should be dirty marked or not, and the function $f$ could be something as simple as a linear function.

If an user without enough credit edits the article the page becomes unstable. If this happens anonymous users visiting the page will be handed the stable version. A logged in user will see two articles at this point, both the stable one and the unstable one. As a rule of thumb this should happen around an article size of about 5 KB.

During RC-patrol the patrolling entity does not by default simply mark the edit as patrolled, instead they cast a vote on the quality of the edit. They can only cast a single vote and if they get any second thought they can go back and cast a new vote. The effect of the vote is calculated according to the credits of the patrolling entity, weighted against the quality of the article. When sufficient votes are given the patrolled version locks in a state, either as accepted or refused. If the quality is high, then more credits are necessary to lock the version then if the quality is low. A patrolling entity with enough credits should be able to lock a specific version by casting a single vote.

q^{n}=s^{n}-\sum _{j_{n}}f(c^{n})\,

multiple users mode

The decision is left to a group of users. It is assumed none has more to say than others except for the credits earned.

A variation is to use two limits, one to accept an edit and one to refuse the edit. This introduces versions which can be in three states, accepted, refused or indeterminate. If an admin question the outcome he can set the state to indeterminate and cast his own vote. If he has not enough credits the version stays indeterminate until one or more users casts additionally votes. Any admin should although be able to put the version in indeterminate state. Note that a unstable version by definition is in an indeterminate state.

There should be no cost in voting, but the votes should be weighted against the credits. This is not really credits in this respect but the users reputation.

A pretty easy visualization in the history could be to use colored exclamation marks. Edits from a user with enough credits will create a stable or accepted «green» version, a rejected version is a «red» version while an unstable version is a «yellow» version. The last green version is the one stable and served to anonymous users.

Enforced collaboration, guilds and vouched users

Even further into the future the the article's quality level rises even higher and at some point even marks edits from a long time contributor as dirty and always makes unstable versions. When this happens the user can either join others to be able to edit the article or he can let other users patrol his edits. If one single additional user is not enough to check out the version then additional users can check it out to collectively get above the necessary credit limit.

At this point the system starts to enforce explicit or implicit collaboration. Typically an article which passes about 10-15 KB should start to enforce collaborative work at every article. Note that this is a rule of thumb, the actual limit is given by the amount of information there is in the article.

A small scale approach to collaboration can be vouchers, whereby one user says "I believe in this user". One single user is then given an additional credit to be able to work on a specific article or category. This voucher is like an automatic vote when the contributor change the article and creates a new version. The user contributing the voucher gets credits and fines with a constant factor, just like the holder of the voucher. A very tempting visualization of a voucher would be to use something like a user box, with a reference to who has given the voucher and for which purpose.

c'^{n}=c^{n}+\sum _{k}c_{k}^{n}\,

user with vouchers

A much larger scale approach would be to form a collaborating guild. In this group the users share vouchers in a pool to get sufficient amount of credits to be able to edit articles of high quality. The account is assigned to a project or category to limit the usefulness to form a troll-group to go around and troll in articles. All members in a guild should vote for a newcomer before he can successfully join and if the user is ejected from the group he can not leave with his earned credits. Any penalty for malicious edits goes to the group account, including additional group fines but excluding any user specific fines. When someone leaves the guild he gets a even share (?) of what is gained through the work. (I'm not sure about how to divide the gains and pains in such a group. There are also some indication that the fine can lead to an instability.)

It is not meaningful to process individual contributions from one and the same user. All edits done by one and the same user in a continuous fashion should be merged. Otherwise the rather strange situation arise that a user could be able to give himself a fine. When a guild is formed the users in the guild should be merged in a similar fashion.

A very effective countermeasure when a guild develop bad behavior is forced dismissal of the group, and seizure of the credits locked by the group.

Credits and fines

The actual assignment of the credits and fines can be done quite simple. Assume contributor A introduces the words a, b, c, d and f. All of the words are entered in a table and compared to a list of known non-interesting words. The words c and d is on this list and is removed. Later contributor B comes and removes f but keeps a and b. The user A should then be given credits for a and b but a fine for f. Most of the time the situation is much simpler and everything is kept or everything is reverted.

User credits accumulates from the first version up to the last $n$ version, and if we assume the credits to be $\alpha ^{n}$ (our previous $c^{n}$ ) and fines to be $\beta ^{n+1}$

c^{n}=c^{n-1}+\alpha ^{n}-\beta ^{n+1}\,

The set $\alpha ^{n}$ is the set of added words at version $n$ , while $\beta ^{n+1}$ is the set of removed words at the next version $n+1$ (the reverted words). This is a bit troublesome as the reverted words are observed in the future and should be connected to contributions for a specific user. A simple optimization, which also seems highly logical, is to limit fines to users who introduces concepts initially or removes the concept finally. This is easily noted contrary to trying to track who says what in an article.

The user creditability for a contributor should decrement if he continues to make contributions which gets reverted, leading to $\alpha <\beta \,$ or preferably $\alpha \ll \beta \,$ which makes sure trolls always looses. It is possible to deduce how big the difference should be by statistical analysis. This could be done by picking marginally bad trolls and comparing them with marginally good contributors. Still note that the difference between $\alpha$ and $\beta$ leads to the strange situation where a troll might continuously revert good contributions. It does not although seem to be a very viable way of trolling as it will lead to blocking of the troll.

It is possible to make variations of this, including attempts to track changes further back in time and trying to estimate the importance of the changes. Assigning of weights to individual words could be done at an overall level and adjusted at a category level. Note that such category adjustments are difficult to do. This is a posteriori adjustments in the model.

It is important to note that various actions should earn various amount of credits. This is to avoid the situation where a deletionist looses compared to an inclusionist. Admins and bureaucrats might be given credits up front to lift them to sufficient levels, while a newbie should not be given credits up front to block the use of swarms of sockpuppets as an option for trolling. Some types of users might also get less credits or non at all like a bot. Sometimes a contributor is known in an field and can be assigned credits accordingly. Such credits should be limited to a specific field like a project, a category or an article. This is a priori adjustments in the model.

It seems likely that if an admin or bureaucrat gets fined, and then gets below some level, the users credentials should be considered revoked.

New users

A newbie starts at zero credits, and this also includes anonymous users which is reset to zero credits for each new session. This leads to a slight problem as they will not be able to edit any page without generating an unstable version. This can be counteracted by adding a small constant in the test for creating the unstable version. It seems like the problem is more of a feature then a real problem as it enforces RC-patrols to vote for the change.

To create an article costs a small amount, and a newbie does not have the credits for this. This could be a good idea but it can also be bad as important articles wont be seeded. If a newly created article is by default unstable the problem can be avoided altogether. This is a very promising variation from the present solution which blocks users from creating new articles.

If there is a cost of creating an article then this cost is returned when the article goes stable.

Note that a situation whereby an user is blocked from editing as when the semi-protection is in use, will make it necessary to increase the blockage of article creation to the same level.

Bounded credits and scores

The credits and scores could be bounded as to avoid a situation where an article is no more editable or an user becomes some kind of super user. This could be an important constraint. I have not consider this options very much. It could come into use if someone fails to cooperate and has enough credits to avoid collaborative work.

Given different methods of assigning vouchers, wetter they should lock credits or not, it can be necessary to limit the maximum credits any users should hold to avoid loan sharks to control a large number of articles. One possible way to block this is to give no credits back to those who vouched for someone. Another option is to limit the maximum amount of credits.

Article or category specific credits

There is also possible to extend the users credits accounts to make article or category specific credits. One implementation of this is to add full credits to an article or category, then lessen the credits by a factor or constant as links are traversed to other articles or categories. Together this will give a high impact on the edited article or category while trailing off in nearby articles or categories.

Given two nodes a and b both connected through c and through d and e, then the connection sharing between a and b will be $\xi +\xi ^{2}$ or a bit more general

\sum _{k\in K}\xi ^{d_{k}}

As an optimization the local credits can be calculated from its closest neighbors only, a distant credit calculated from its closest categories and a general overall credit in addition to both. This will maintain specific credits for articles and categories where the user develop a bad habit or seems to have excellent contributions. It seems likely that local adaptions should develop slower than the overall credits. (It is possible to argue both for slower and faster growth) Such credits could be modeled with fines like the overall credits.

An arbcom review could use this to enforce a block within a specific article or category.

Craftsmanship value

Value of reputation

Anonymous reader factor

Anonymous users reading an article will try to fix problems at some rate. Because of this an anonymous reader will add a small quality factor to an article each time it is read. The factor can probably be approximated by the inverse of the fix it rate for such anonymous readers.

Historical article score

Use of a historical article score

Extended mathematical model

Slightly reformulated and extended whereby the previous history is taken into account

c^{n}=\sum _{\Delta j_{n}\in I}-p_{i}*\log(p_{i})\,

credits per iteration

h^{n}=\sum _{\Delta i_{n}\in I}|p_{i}*\log(p_{i})|\,

differential score per iteration

s^{n}=\sum _{i_{n}\in I}-p_{i}*\log(p_{i})\,

absolute score for a version

{\begin{bmatrix}c^{n}\\h^{n}\end{bmatrix}}={\begin{bmatrix}c^{n-1}\\h^{n-1}\end{bmatrix}}+{\begin{bmatrix}\alpha &k_{1}\\k_{2}&\beta \end{bmatrix}}{\begin{bmatrix}c^{n}\\h^{n}\end{bmatrix}}

Credits the user has earned
Fines the user gets for malicious edits
Accumulating differential score of the article (history score)
Absolute score of the article
Loop back
- A reputation value added to the credits coming from the absolute score of the article
- A craftsmanship value added to the article score coming from the user credits
Web of trust
- Guild factor within some context, for example listed todos on a project page or a category
- Voucher factor whereby one or more users says a specific user may have higher credits on some pages

Note that this only uses one overall score for the user. This can be extended to either a overall value and local adaptions for each article, adding weight to nearby articles through linkage weight, or it could only use such weighting. The first leads to a situation where a user gets some reputation anyhow in a new field, while the last one forces the user to earn a new reputation in every new field. perhaps a new user should start with the last model but at some point start to gradually implement the first. It could also be implemented in such a way that a sysop uses the overall model while a common user uses local adaptions.

Implementation

There will be an user warning if he can't edit an article because he has to few credits, telling him he will create a unstable version or that he can leave a note at the discussion page or be part of a guild
There will be a link listing nearest guild by following the category links upwards to a broader category or a project page
A special page to inspect user credits, perhaps as very coarse scale on recent changes, page history and user contributions (note that credits can be modified by category)
A special page to inspect article score, perhaps as very coarse scale on recent changes and page history

Known problems

Guilds can lead to groupthink

References

Not exactly references, clean it up later.

The model

Automatic dirty marking

Enforced collaboration, guilds and vouched users

Credits and fines

New users

Bounded credits and scores

Article or category specific credits

Craftsmanship value

Value of reputation

Anonymous reader factor

Historical article score

Extended mathematical model

Implementation

Known problems

See also

References