Article count reform vote

From Meta, a Wikimedia project coordination wiki

See Article count reform

As of March 13th 2003, 20:00 GMT, no new options may be proposed for this voting round. We will now vote on which solution should be used. The voting system we use is average voting (also known as range voting). Simply put, you assign each option you care about a value from 1 to 6 (1 being very good, 6 very bad); the arithmetic mean for each option is calculated, and the best-rated option (the one with the lowest mean) wins. The deadline for determining the outcome of the vote is Monday, March 17th, 20:00 GMT.

Duplicate votes and invalid votes (such as 7 or -2) are not counted.
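
The tallying rule described above can be sketched as follows. This is only an illustration, not the procedure actually used for this vote: the ballot format (voter, option, score) is hypothetical, and treating a "duplicate" as a second vote by the same voter on the same option, with only the first one kept, is an assumption.

```python
from collections import defaultdict

def tally(ballots):
    """Tally an average (range) vote.

    ballots: iterable of (voter, option, score) tuples (hypothetical format).
    Returns (mean score per option, winning option).
    """
    seen = set()                 # (voter, option) pairs already counted
    scores = defaultdict(list)
    for voter, option, score in ballots:
        if score not in range(1, 7):    # invalid vote, e.g. 7 or -2
            continue
        if (voter, option) in seen:     # duplicate vote (assumption: keep the first)
            continue
        seen.add((voter, option))
        scores[option].append(score)
    means = {opt: sum(vals) / len(vals) for opt, vals in scores.items()}
    # 1 is "very good", so the lowest mean is the best rating.
    winner = min(means, key=means.get) if means else None
    return means, winner
```

An option that nobody rates gets no mean at all in this sketch, which matches "each option you care about": abstaining does not pull an option's average up or down.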

Proposed solutions

VOTING IS NOW CLOSED. DO NOT ADD ANY VOTES BELOW.

Note that some of the solutions below can be combined; specifically, an article size criterion can be combined with one or more extra restriction(s) on what constitutes an article.

Article size

Please note: None of the article count proposals (including non-zero) count articles that contain nothing but whitespace (blanks, tabs, newlines), and likewise none of them count redirects.
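
The fixed-size proposals in this section differ only in their threshold, so they can all be sketched with one function. This is a rough illustration under stated assumptions: the function name is hypothetical, size is measured in UTF-8 bytes after stripping surrounding whitespace, and a redirect is recognised by a leading "#REDIRECT".

```python
def is_counted(text, threshold=0):
    """Return True if a page counts as an article under a size threshold.

    threshold=0 corresponds to the "non-zero" proposal; 5, 20, 100, 250
    and 500 correspond to the other fixed-size proposals.
    """
    body = text.strip()
    if not body:
        # Nothing but whitespace: never counted.
        return False
    if body.upper().startswith("#REDIRECT"):
        # Redirects are never counted (assumed redirect syntax).
        return False
    # Assumption: size means bytes of UTF-8 encoded text.
    return len(body.encode("utf-8")) > threshold
```

Measuring in bytes rather than characters matters for the language-compactness concerns raised further down, since the same number of bytes holds very different amounts of text in different scripts.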

Non-zero: An article is counted if its page is greater than 0 bytes in size.

Pro:

  • Very simple, straightforward definition. It may count some stub articles, but in general, even stubs should be considered articles -- or removed. Blanked articles are not counted.

Contra:

  • Perhaps the threshold should be slightly higher, to avoid counting recently added nonsense pages.
    • Retort: this is a very small noise fluctuation (mostly less than 10%), and best handled by blanking, removing, or turning them into articles as they are found.
  • Bot-generated articles are counted, which may be considered a problem if the count is supposed to reflect actual human effort.
    • Retort: but if it's meant to reflect content, such articles should be counted. And human effort can be seen in edit counts.
  • Articles that have been incorrectly blanked (e.g. left with a single space or a single newline) will show up in the count.
    • Trailing whitespace is removed on save, so such a page would be zero bytes long.

Votes for this option:



5 bytes: An article is counted if it is greater than 5 bytes in size no matter what.

Pro:

  • Eliminates essentially blanked articles inadvertently left with blank spaces or lines.
    • This is a non-problem, see above.

Votes for this option:


20 bytes: An article is counted if it is greater than 20 bytes in size.

Pro:

  • Could solve the problem of nonsense articles being counted.


Contra:

  • 20 bytes is shorter |<-- than this line, not a significant improvement in thresholding.
  • Easy to manipulate
  • Too big; 20 bytes of Kanji can be enough for an article.

Votes for this option:


100 bytes: An article is counted if it is greater than 100 bytes in size.

Pro:

  • The smallest stubs (one line or 2 short lines) are no longer counted

Contra:

  • Some consider small stubs legitimate articles too.

Votes for this option:


250 bytes: An article is counted if it is greater than 250 bytes in size.

Pro:

  • Stub articles are no longer counted.

Contra:

  • Some legitimate articles might always be this small.
    • Retort: That's questionable; perhaps we should merge content that will be a permanent stub.
  • Some crap articles, including data-dumped stubs, are bigger than this size.

Votes for this option:


500 bytes: An article is counted if it is greater than 500 bytes in size.

Pro:

  • Stub articles are no longer counted.

Contra:

  • The threshold is too high.
  • The number of articles on the English Wikipedia would fall below 100,000.
    • Retort: Delaying the implementation may solve the problem.
    • Retort: If the number of usable, information-containing articles really is less than 100,000, so be it; we don't need to force lower thresholds to get nice, meaningless numbers.

Votes for this option:


Dynamic: For example, calculate the average size of stub articles, then use it as the threshold.

Pro:

  • Instead of an arbitrary number, use a number that is at least based on statistical data.

Contra:

  • Determining what is a stub may be a problem (see below).
    • Retort: Use the <stub> tag or flag.
  • The threshold varies over time.
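
A minimal sketch of the dynamic proposal, assuming the sizes of known stub articles are already available as a list of byte counts (how stubs are identified is left open, as noted above). The function names are hypothetical.

```python
from statistics import mean

def dynamic_threshold(stub_sizes):
    """Average size in bytes of the pages currently identified as stubs."""
    return mean(stub_sizes)

def count_articles(page_sizes, threshold):
    """Count pages strictly larger than the threshold."""
    return sum(1 for size in page_sizes if size > threshold)
```

Because the threshold is recomputed from current data, the same page can move in or out of the count as the set of stubs changes, which is the "varies over time" objection.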

Votes for this option:


Compaction of language: There seems to be an issue with the compactness of different written languages. If we choose to define articles by a certain size, e.g. 250 bytes, then perhaps there should be a scaling factor for each language. To do this we could take a passage in English of a certain size, say 2500 bytes, and get a native speaker of each language to translate it into their language. If in Japanese (for example) the translation is 1500 bytes, then the criterion for an article in Japanese should be 150 bytes instead of 250 bytes. [Or, an idea that just came to me: find a very common text, such as War and Peace, a Dickens novel, or even the Bible, and use that as the base comparison.]

  • The second method seems better than the first, since a translation will be different (I think generally larger) than an original text. Also it is more objective.
  • Note: could use the translations file for basic language-verbosity info to a 1st approximation
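
The scaling arithmetic in this proposal can be written out as a short sketch. The function name and parameters are hypothetical; the numbers in the usage note are the ones from the example above.

```python
def scaled_threshold(base_threshold, reference_size, translated_size):
    """Scale a byte threshold by how compact a language is.

    base_threshold:  threshold for the reference language, in bytes
    reference_size:  size of the reference passage, in bytes
    translated_size: size of the same passage in the target language
    """
    return base_threshold * translated_size / reference_size
```

With the figures given above, `scaled_threshold(250, 2500, 1500)` gives 150.0 bytes for Japanese.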

Votes for this option:

Further restrictions

No further restrictions: Only the above size criterion should be used.

Pro:

  • Simplicity.

Contra:

  • Article definition not very accurate.
  • We still must not count redirects, user pages, talk pages, etc.

Votes for this option:


Comma: Only an article that includes a comma is counted.

Pro:

  • Compatible with the current system
  • History of article statistics doesn't break
  • Normal articles in English must contain a comma.

Contra:

  • Unfair, because some languages, notably Japanese, don't use commas much.
    • Retort: Those languages could be excluded from the comma rule.
  • A bizarre criterion that can be thwarted by adding a comma to every article that lacks one.
    • Retort: No more bizarre than trying to define an article by its size or the number of edits. Every criterion can be thwarted.

Votes for this option:


Language-dependent punctuation: Only an article that includes particular punctuation dependent on language is counted. (e.g. 、 or 。 in Japanese)

Pro:

  • The English Wikipedia remains untouched.
  • Most languages use certain punctuation.

Contra:

  • Requires an internal decision for each language
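
The comma rule and its language-dependent variant can be sketched with one lookup table. The table itself is the "internal decision for each language" mentioned above; the entries here, including the Japanese marks, are illustrative assumptions and not an agreed list.

```python
# Hypothetical per-language punctuation table (illustrative values only).
REQUIRED_PUNCTUATION = {
    "en": [","],
    "ja": ["\u3001", "\u3002"],  # ideographic comma and full stop (assumed)
}

def has_required_punctuation(text, lang):
    """Return True if the text contains any mark required for its language.

    Assumption: languages without an entry fall back to the plain comma
    rule, which corresponds to the current system.
    """
    marks = REQUIRED_PUNCTUATION.get(lang, [","])
    return any(mark in text for mark in marks)
```

With only the "en" entry, this reduces to the plain comma proposal.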

Votes for this option:


Link: Only pages with at least one link (existing or broken) are counted.

Pro:

  • Would remove unchecked newbie pages, as well as some arguably non-encyclopedic content, while keeping in most legitimate articles.

Contra:

  • Might still lose some legitimate articles.
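
A rough sketch of the link test, assuming that "link" means an internal wiki link written in double square brackets. Whether the target page exists is deliberately not checked, since broken links count too. The names are hypothetical.

```python
import re

# Assumed link syntax: [[Target]] or [[Target|label]], with a non-empty body.
WIKILINK = re.compile(r"\[\[[^\[\]]+\]\]")

def has_link(text):
    """Return True if the page text contains at least one internal link."""
    return WIKILINK.search(text) is not None
```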

Votes for this option:


Stub flag: Stub articles are flagged in some unique way so that an alternative count can be provided that does not include stubs. For example, by linking to them from en:find or fix a stub (or equivalent). Other similar flags could be necessary (to exclude various lists, which are not stubs), but they can be implemented the same way.

Pro:

  • More accurate definition of "article", stub criterion is provided by humans, not by some arbitrary byte size.
  • On en:wikipedia we already link stubs in this way

Contra:

  • Extra effort.
    • Not really, we already link to "This article is a stub" from many stub pages. This information is stored already (try "What links here" on the stub page), it just needs to be standardized.
  • There are many stubs that contain useful information
    • So it might be useful to provide both counts.
  • More confusing meta information in articles for new editors
    • It's already there.
  • Defining what qualifies for a stub flag would be a whole new debate
    • We already do this.
  • (similar to the <ARTICLE>-tag further down)
    • No, not at all, we already do flag stubs, we just don't use the info.

Votes for this option:


Minimum edits: An article is counted only if a certain number of edits has been made (e.g. 2).

Pro:

  • Bot articles no longer counted.

Contra:

  • Some people want bot articles to be counted
    • Suggestion: we could offer a variety of counts, rather than just one
  • An article may be perfect even though it has only been edited once (e.g. moved from Nupedia, etc.).
  • We may need another round of voting to set the required # of edits

Votes for this option:


Minimum contributors: An article is counted only if a certain number of contributors have edited it (e.g. 3).

Pro:

  • Bot articles no longer counted.
  • By agreeing on the number of contributors needed, we can assume a "certain" degree of quality for the articles (this is an average, of course), meaning the article has been read, re-thought, re-modeled, etc by different people.

Contra:

  • Some people want bot articles to be counted
    • Suggestion: we could offer a variety of counts, rather than just one
  • An article may be perfect even though it has only been edited once (e.g. moved from Nupedia, etc.).
  • We may need another round of voting to set the required # of contributors
  • Is more a 'quality' measure.
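
The minimum-edits and minimum-contributors rules can both be sketched as checks on a page's revision history. The history format, a list of (author, timestamp) pairs, is hypothetical, and the default minimums are just the examples given in the proposals.

```python
def meets_edit_minimum(revisions, min_edits=2):
    """True if the page has been saved at least min_edits times."""
    return len(revisions) >= min_edits

def meets_contributor_minimum(revisions, min_contributors=3):
    """True if at least min_contributors distinct authors edited the page."""
    authors = {author for author, _timestamp in revisions}
    return len(authors) >= min_contributors
```

A page created by a bot and never touched again has one revision and one author, so it fails both checks, which is the intended effect of these proposals.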

Votes for this option:


Two paragraphs: An article is counted only when it is two paragraphs long or more.

Pro:

  • Encourages people to break up paragraphs appropriately, which is good style

Contra:

  • The English encyclopaedia contains many one-paragraph articles.
    • Retort: but how many of these are stubs that we don't want to count?
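
A possible reading of the two-paragraph rule, assuming paragraphs are separated by one or more blank lines in the page source. The function names are hypothetical.

```python
import re

def paragraph_count(text):
    """Count blocks of text separated by blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return sum(1 for block in blocks if block.strip())

def has_two_paragraphs(text):
    """True if the page is two paragraphs long or more."""
    return paragraph_count(text) >= 2
```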


Votes for this option:


<ARTICLE> Tag: A tag is added to all entries that can be considered articles.

Pro:

  • Small articles will be included
  • Lists can be excluded, if so desired.


Contra:

  • Some people may put the tag on things that others may not consider articles.
  • Extra effort.
  • More confusing meta information in articles for new editors
    • Retort: Not very complicated or hard to understand

Votes for this option:


Independent systems for each Wikipedia

The choice is decided by each Wikipedia. Most could end up with the same system.

Votes for this option:


Divide the size of the database by a certain byte size to determine the number of articles.

Pro:

  • Simple to calculate
  • More difficult to manipulate the number of articles.

Contra:

  • Estimation lacks accuracy, especially because it includes talk pages.
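
As a sketch, this estimate is a single integer division. The function name and the figures in the usage note are hypothetical and chosen only to show the arithmetic.

```python
def estimate_article_count(database_bytes, bytes_per_article):
    """Estimate the article count as database size over a nominal article size."""
    if bytes_per_article <= 0:
        raise ValueError("bytes_per_article must be positive")
    return database_bytes // bytes_per_article
```

For instance, a 1,000,000-byte database with a nominal article size of 2,000 bytes yields an estimate of 500 articles, regardless of how many of those bytes belong to talk pages.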

Votes for this option: