Wikistats/Measuring Article Quality/Operationalisation for wikistats

From Meta, a Wikimedia project coordination wiki

This article focusses on operationalisation for wikistats. See also the conceptual discussion on measuring article quality.

Complications

Templates are a complicating factor. Many articles are partially built from templates. When template text is not included in the evaluation, a measure such as the ratio of section headers to article text size might be completely wrong. Expanding templates before assessing article quality is a very costly operation in terms of server resources (CPU, memory) in an offline job like wikistats. Turning wikistats into an online job that accesses the MySQL tables directly is completely unfeasible: the XML dump job for the English Wikipedia, which is probably already highly optimised, already runs for a week. This is something of a dilemma. It might change when a dedicated wikistats server becomes available (this is currently being discussed).

What can practically be measured?

Please only add items to this list when you have a clear notion of how to operationalise them. As far as wikistats is concerned, interfacing with Perl is a requirement. Please add other suggestions at the bottom of this page.

Note 1: To begin with, only namespace 0 records will be measured (proper articles). Certain types of articles could be skipped, like disambiguation and redirect pages.

Note 2: Scales need not be linear. See below at 'How to present the results?'.

Note 3: Most indicators will have to be related to the length of the article body text. On average, longer articles would be expected to have more sections, images, references, categories, etc.

Note 4: The factors below still need more detailed operationalisation.

Content

  • Text size (most articles which are not redirects should at least exceed a certain minimum size before they can be called satisfactory). Above a minimum (stub) size it is hard to tell what size is right. There may even be too much text. Detecting stub templates will not do, as these were not widely used in the early days, which would hamper historic comparisons.
  • Number of images

Meta data

  • Number of references
  • Number of categories
  • Number of interwiki links (no relation to article size here)
  • Number of section headers
  • Number of external links
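
As a rough illustration of how the counts in this list could be taken from raw wikitext, here is a minimal sketch. It is written in Python for readability only (wikistats itself requires Perl). The regular expressions, the names `PATTERNS`, `count_indicators` and `per_1000_chars`, and the choice to express counts per 1,000 characters (cf. Note 3) are all assumptions made for this example. Templates are not expanded, and the patterns only cover simple English-language markup.

```python
import re

# Illustrative patterns only (assumptions, not wikistats code). Real wikitext
# has many more variants: templates, localised namespace names, HTML headings.
PATTERNS = {
    # Opening <ref> or <ref name=...>; does not match </ref> or <references/>
    "references": re.compile(r"<ref[\s>]", re.IGNORECASE),
    "categories": re.compile(r"\[\[\s*Category\s*:", re.IGNORECASE),
    # Naive: any lowercase 2-3 letter prefix is treated as a language code
    "interwiki_links": re.compile(r"\[\[\s*[a-z]{2,3}(?:-[a-z]+)*\s*:[^\]]*\]\]"),
    # == Header == up to ====== Header ======, alone on a line
    "section_headers": re.compile(r"^(={2,6})[^=\n].*?\1[ \t]*$", re.MULTILINE),
    # Bracketed external links only; bare URLs are not counted
    "external_links": re.compile(r"(?<!\[)\[https?://[^\s\]]+"),
}


def count_indicators(wikitext):
    """Return a dict with the raw count of each indicator in the wikitext."""
    return {name: len(p.findall(wikitext)) for name, p in PATTERNS.items()}


def per_1000_chars(counts, wikitext):
    """Express counts relative to article size (occurrences per 1,000 characters)."""
    size = max(len(wikitext), 1)
    return {name: 1000.0 * n / size for name, n in counts.items()}
```

Interwiki links are listed above as unrelated to article size, so a real implementation would probably report the raw count for that indicator rather than the density.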

Reliability

  • Number of unique registered authors overall for this article
  • Ratio of unique registered authors versus unique contributing IP addresses (related to number of unique anonymous editors, but not the same) in recent days.
  • Is the latest edit done by a registered user?
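
The three reliability indicators above could be derived from an article's revision history roughly as follows. This is a sketch in Python for illustration only (wikistats itself requires Perl). The input format (a list of `(timestamp, user)` tuples, oldest first), the 30-day default for "recent", and the IPv4-only test for anonymous editors are assumptions made for this example.

```python
import re
from datetime import datetime, timedelta

# Assumption: anonymous edits are recorded under an IPv4 address as user name.
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def is_anonymous(user):
    """Treat a user name that looks like an IPv4 address as an anonymous editor."""
    return bool(IPV4.match(user))


def reliability_indicators(revisions, now, recent_days=30):
    """revisions: list of (timestamp, user) tuples, oldest first."""
    registered_all = {u for _, u in revisions if not is_anonymous(u)}

    # Restrict the registered/IP ratio to edits inside the recent window
    cutoff = now - timedelta(days=recent_days)
    recent = [u for t, u in revisions if t >= cutoff]
    recent_registered = {u for u in recent if not is_anonymous(u)}
    recent_ips = {u for u in recent if is_anonymous(u)}
    # The ratio is undefined when no IP address edited recently
    ratio = len(recent_registered) / len(recent_ips) if recent_ips else None

    last_user = revisions[-1][1] if revisions else None
    return {
        "unique_registered_authors": len(registered_all),
        "recent_registered_to_ip_ratio": ratio,
        "latest_edit_by_registered": last_user is not None
        and not is_anonymous(last_user),
    }
```

Counting unique IP addresses is not the same as counting unique anonymous editors, as noted in the list: one person may edit from several addresses, and one address may be shared.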

Language

See below for initial thoughts on spelling, grammar and readability assessments.

How to present the results?

Wikistats reports might show indices per measurement using a point system, presented per language in a table with one row per month and one column per measurement. For example, an article gets points for including references: minimum points means no references, maximum points means just about enough references. Intermediate values signify not enough or too many references (again, to be operationalised in detail).

Probably all indices should be normalised on a scale from, say, 0 to 100. An aggregate column would present the weighted average of all indices. Of course some quality indicators should be kept out of the equation for months where they were not yet applicable (e.g. tags for images, categories and references were not part of the syntax long ago). Ideally a user could override how indices are combined into a weighted average. Customised parameters could be stored for reuse in a cookie.
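
A minimal sketch of such an aggregate column, in Python for illustration only (wikistats itself requires Perl): each index is assumed to be already normalised to 0-100, and an index that was not applicable in a given month is passed as `None` and left out of both the sum and the weight total. The function name `aggregate_index`, the default weight of 1.0 and the example weights are hypothetical.

```python
def aggregate_index(indices, weights):
    """Weighted average of 0-100 indices; None marks 'not applicable this month'."""
    total = 0.0
    weight_sum = 0.0
    for name, value in indices.items():
        if value is None:
            continue  # keep this indicator out of the equation entirely
        w = weights.get(name, 1.0)  # assumed default weight
        total += w * value
        weight_sum += w
    # No applicable indicator at all: no aggregate can be reported
    return total / weight_sum if weight_sum else None


# Hypothetical weights; a user-supplied dict would override these
weights = {"references": 3.0, "categories": 1.0, "images": 1.0}

# A month in which reference tags did not exist yet
early = aggregate_index({"references": None, "categories": 80, "images": 40}, weights)

# A later month in which all three indicators apply
later = aggregate_index({"references": 100, "categories": 80, "images": 40}, weights)
```

Dropping the weight together with the value means an early month is not penalised for lacking an indicator that could not yet exist.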

The scales need not be linear. For example: providing one category for an article is much better than none. Adding a 10th category may be less useful or even confusing.
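
One simple way to express a non-linear scale like this is a piecewise-linear mapping from a raw count to points. The sketch below is in Python for illustration only (wikistats itself requires Perl). The function name `scale_score` and the breakpoints in `CATEGORY_SCALE` are invented for this example and are not proposed values.

```python
def scale_score(value, breakpoints):
    """Map a raw count to points by linear interpolation between breakpoints.

    breakpoints: list of (count, points) pairs sorted by strictly increasing count.
    Values outside the listed range take the first or last points value.
    """
    if value <= breakpoints[0][0]:
        return float(breakpoints[0][1])
    for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:]):
        if value <= x1:
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)
    return float(breakpoints[-1][1])


# Hypothetical scale: the first category is worth a lot, a few more add a
# little, and beyond nine categories the score slowly drops again.
CATEGORY_SCALE = [(0, 0), (1, 60), (4, 100), (9, 100), (20, 50)]
```

With these made-up breakpoints, going from zero to one category earns 60 points, while going from nine to ten costs a few points.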

Weighting of indices will probably evolve over time (e.g. the existence of references is now valued much higher than a year ago). Wikistats will of course apply new weighting to the complete history.

Suggestions for measurements that are not yet enough operationalised

Rather practical

  • Percentage of binary files (mostly images) with proper licensing info
There seem to be bots that detect this automatically (probably in a language-dependent fashion).
  • Retrieve Google ranking for each article
Of course a page might be ranked high in Google because it has been exposed as utter crap. On average this will not be the case.

For millions of articles this process would be very slow; coordination with Google staff would be helpful.

  • Count spelling errors per 100 words
Suggested by Alterego. There are modules to assess spelling errors, but probably only for certain languages. For wikistats: can they be invoked from Perl? In the future, can we feed those modules with content from WiktionaryZ? Can a module deal with a mix of spelling conventions like US and British English (allowing both, as is the convention on Wikimedia, yet subtracting points for mixing both conventions in the same article)?
  • Assess proper grammatical form
Suggested by Alterego. To be elaborated on.
  • Assess readability
Suggested by Alterego. To be elaborated on.
Suggested by Soandos. Use standard readability measurements (perhaps an average of those available?).
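
As an example of one standard readability measurement, the sketch below computes the Flesch Reading Ease score, 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). It is written in Python for illustration only (wikistats itself requires Perl). The formula applies to English only, the syllable counter is a crude vowel-group heuristic chosen for this example, and the input is assumed to be plain text with wiki markup already stripped.

```python
import re


def count_syllables(word):
    """Very rough English heuristic: count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    if lowered.endswith("e") and count > 1:
        count -= 1  # crude adjustment for a silent final 'e'
    return max(count, 1)


def flesch_reading_ease(text):
    """Flesch Reading Ease for English plain text; higher means easier to read."""
    # Naive sentence split on ., ! and ? (abbreviations will be miscounted)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return None
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )
```

Other languages would need their own formula and syllable rules, which ties in with the language-dependence concern raised for spelling above.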

Very ambitious

  • Scan for images of pictures that are incorrectly attributed to famous artists (just joking)