This page is part of the Proceedings of Wikimania 2005, Frankfurt, Germany.


Too Many Cooks Don't Spoil the Broth

  • Author(s): Andreas Brändle
  • License: Contact the author
  • Slides: ?
  • Video: ?
  • Note: Presentation, 30 minutes

About the author: Andreas Brändle is a student of media and communication studies at the University of Zurich, Switzerland. He is writing his master's thesis on Wikipedia.


Does collaboration in wiki systems produce good quality? Supporters of wiki systems assume that the number of authors and their collaborative work lead to good Wikipedia articles. This study tests that hypothesis with methods from empirical research and media studies, examining the conditions under which a wiki system produces good quality. To this end, a sample of 450 articles from the German Wikipedia was examined with a content analysis. The study addresses the following questions:
  • What is the critical mass of users and authors in a wiki system?
  • What effect do the number of edits, authors and discussion edits, the traffic, the age of the article and the number of backlinks have on quality?
  • What influence does vandalism have on quality?
  • Do anonymous authors contribute in a positive or negative way?
  • What is the connection between the relevance of a topic and the quality of the article?
  • Are there quality differences between topic categories?

Furthermore, the study proposes a way of measuring quality based on theories from media and communication science.


The relevance of wiki systems is increasing. Wikipedia, the other Wikimedia projects and wiki systems in general are growing exponentially in popularity. Yet we still know little about the quality of wikis, the conditions that produce quality and the processes that maintain it. This research project is an attempt to fill this gap from the perspective of communication and media studies. It draws on theories of quality in journalism[1] and of the attention economy[2]. Its aim is to identify some of the factors that produce good-quality work in a wiki system.

The topic-attention-quality model

Based on considerations from the attention economy[3] and inspired by ideas from the Wikipedia community[4], a basic model, the topic-attention-quality model, was developed. It treats quality as the dependent variable. The independent variables describe the topic and quantify the process of writing the articles: topic relevance and category, the number of versions, authors and page views, anonymity and vandalism. This model was the starting point for the empirical research and the hypotheses.

Fig. 1: The basic topic-attention-quality model

The model proposes three dimensions (topic, attention and quality) that are arranged along the writing process of an article. The first dimension, topic, comprises all the characteristics an article has independently of what its authors have written and of the resulting quality. Topics vary in relevance and category. For instance, the topic Berlin brings much higher relevance to the wiki process than Heukewalde (a small village in Thuringia, Germany). In addition, Berlin falls into the category geography, which differs sharply from categories such as mathematics or literature.

The topic characteristics influence the second dimension, attention. It comprises all the variables that quantify the attention wiki contributors and users have brought to an article, such as the number of versions, the number of unique authors, traffic or discussion intensity. The two variables vandalism and anonymity also play an important part in the wiki process and interact with the attention dimension.

Supporters of the wiki system assume that these variables and dimensions affect the third dimension, quality. For instance, they expect that the higher the number of authors, versions and page views, the better the quality of the article. This research tests this idea and measures how strongly topic and attention influence quality.


The research is based on a quantitative content analysis of a random sample of 450 articles from the German Wikipedia. Data were gathered on each article, its version history, its discussion page and that page's history, its authors and its traffic. The cut-off date for drawing the sample was October 19, 2004, because no user statistics were available after that date.

The variables

Topic relevance was measured with three variables: (1) the number of words the Brockhaus (the most important German encyclopedia) devotes to the topic; (2) the number of Google results; (3) the number of news values, a method from media studies for assessing how newsworthy a topic is to journalists. The articles were then classified into topic categories, subcategories and special categories such as technology, nerd stuff, sexuality, politics, sports or other.

For the attention dimension, the number of versions, authors, page views, discussion words, discussion versions, discussion authors, discussion page views and backlinks was counted. The age of an article is the time between its first version and the sampling cut-off date. The variables anonymity and vandalism were assessed from the version history.
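The attention variables can be derived mechanically from an article's version history. The following Python sketch illustrates this under stated assumptions: the history is a hypothetical list of `(date, author)` tuples, one per version, and anonymous edits are assumed to be recorded under an IP address, as MediaWiki does for logged-out users. The function names are illustrative and are not the study's actual tooling.

```python
from datetime import date
import ipaddress

CUTOFF = date(2004, 10, 19)  # sampling cut-off date stated in the text


def is_anonymous(author):
    """Assumption: anonymous edits are attributed to an IP address, not a user name."""
    try:
        ipaddress.ip_address(author)
        return True
    except ValueError:
        return False


def attention_variables(history, cutoff=CUTOFF):
    """Count simple attention variables for one article.

    history: hypothetical list of (edit_date, author) tuples, one per version.
    """
    dates = [d for d, _ in history]
    authors = [a for _, a in history]
    anonymous_edits = sum(is_anonymous(a) for a in authors)
    return {
        "versions": len(history),
        "unique_authors": len(set(authors)),
        # Age: days between the first version and the cut-off date
        "age_days": (cutoff - min(dates)).days,
        "anonymous_share": anonymous_edits / len(history),
    }
```

For a four-version history whose first edit is dated 10 January 2004, `age_days` comes out as 283.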

Quality was the most difficult set of variables to measure. What is quality? Wikipedians have been discussing the question since Wikipedia began in 2001, and communication and media studies have debated it far longer without reaching a clear answer. This study restricts the quality assessment to the content dimension. It combines approaches from content-analytic research on the quality of journalism with ideas from the Wikipedia community about what makes a good article.

Quality variables:

  • W-questions: Does the article answer the questions who, what, when, where and why?
  • Lead section: Does the article have a lead section that briefly places the topic among its sub- and super-ordinate concepts and gives an impression of its relevance and history?
  • Background section: Does the article have a background section that explains the topic's relationship with sub- and super-ordinate concepts, its relevance and its history?
  • Degree of cross-linking: Number of internal, external and interwiki links.
  • Article structure: Does the article have a lead section, a background section and a commentary section?
  • Transparency: Number of references in the text, references to book sources and external links.
  • Number of media objects: pictures, tables, charts and images.
  • Layout/structuring: Too many headings, too few headings, too many lists.
  • Comprehensibility: Number of non-linked foreign words and technical terms.
  • Size: number of words.
  • Objectivity: Are viewpoints marked as such? Are there also arguments that disagree with a stated viewpoint?
  • Diversity: Number of different viewpoints.
  • Balance: Balance of different viewpoints.
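Some of these variables, such as the degree of cross-linking, can be counted directly from an article's wikitext. The sketch below assumes MediaWiki link syntax: `[[Target]]` or `[[Target|label]]` for internal links, `[[xx:Target]]` with a lowercase language code for interwiki links, and `[http://example.org label]` for external links. The regular expressions are simplified assumptions (namespaced links such as `[[Kategorie:Stadt]]` and titles containing a colon are skipped, bare URLs are not counted) and are not the coding rules used in the study.

```python
import re

# Simplified patterns (assumptions, not the study's coding rules)
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")           # [[Target]] or [[Target|label]]
EXTERNAL = re.compile(r"\[(?:https?|ftp)://[^\s\]]+[^\]]*\]")  # [http://... label]
INTERWIKI_PREFIX = re.compile(r"^[a-z]{2,3}(?:-[a-z]+)?:")    # lowercase language code, e.g. en:


def cross_linking(wikitext):
    """Count internal, interwiki and external links in a wikitext string."""
    internal = interwiki = 0
    for target in LINK.findall(wikitext):
        if INTERWIKI_PREFIX.match(target):
            interwiki += 1
        elif ":" in target:
            continue  # namespaced link (category, image, ...): not counted here
        else:
            internal += 1
    external = len(EXTERNAL.findall(wikitext))
    return {"internal": internal, "interwiki": interwiki, "external": external}
```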

Data reduction with factor analysis

After the data had been gathered, a factor analysis was conducted. Factor analysis is a statistical technique that reduces a larger number of observed variables to a few unobserved variables, called factors. It yielded five factors:

  • Relevance: The three variables Brockhaus, Google results and news values combined into a single factor, relevance.
  • Interest: The amount of attention given to an article (variables: number of versions and unique authors, traffic, age, number of backlinks).
  • Controversy: The intensity with which an article is discussed (number of discussion words, discussion authors, discussion edits, discussion page traffic).
  • Richness: The main quality factor, comprising every quality variable except the neutrality variables (W-questions, lead section, background section, degree of cross-linking, article structure, transparency, number of media objects, layout/structuring, comprehensibility, size).
  • Neutrality: Comprises the neutrality variables (objectivity, diversity, balance).
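The text does not state which extraction or rotation method was used. Purely as an illustration, the Python sketch below uses one common variant, principal-component extraction from the correlation matrix with the Kaiser criterion (retain factors whose eigenvalue exceeds 1). It is applied to synthetic data in which three indicators, standing in for Brockhaus words, Google results and news values, are all driven by one latent relevance variable. The data are invented, not the study's.

```python
import numpy as np


def extract_factors(data):
    """Principal-component extraction on the correlation matrix.

    Keeps factors with eigenvalue > 1 (Kaiser criterion) and returns
    all eigenvalues (descending) plus the unrotated loadings.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    keep = eigenvalues > 1.0
    # Loading = eigenvector scaled by the square root of its eigenvalue
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    return eigenvalues, loadings


# Synthetic example: one latent "relevance" drives three noisy indicators
rng = np.random.default_rng(0)
n = 450
relevance = rng.standard_normal(n)
indicators = np.column_stack([relevance + 0.5 * rng.standard_normal(n) for _ in range(3)])
eigenvalues, loadings = extract_factors(indicators)
```

In this toy case a single factor is retained and all three indicators load strongly on it. Eigenvectors are defined only up to sign, so the loadings may come out negative.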

These factors were then substituted into the basic model. Fig. 2 shows the topic-attention-quality model after data reduction.

Fig. 2: The topic-attention-quality model after data reduction with factor analysis

The relations between these factors and variables, indicated by arrows in the figure, were examined with multiple regression analysis to uncover the dependencies.
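To make the reported figures concrete, the following Python sketch computes standardized regression coefficients (Beta) and R square by ordinary least squares on z-scored variables, using only NumPy. The variable names echo the factors above, but the data are synthetic and the sketch is not the study's analysis.

```python
import numpy as np


def standardized_regression(X, y):
    """Ordinary least squares on z-scored variables.

    X: (n, k) array of predictors, y: (n,) outcome.
    Returns the standardized coefficients (Beta) and R square.
    No intercept is needed: all variables have mean zero after z-scoring.
    """
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    residuals = yz - Xz @ betas
    r_squared = 1.0 - (residuals @ residuals) / (yz @ yz)
    return betas, r_squared


# Synthetic stand-ins for the factors (invented data)
rng = np.random.default_rng(1)
n = 450
relevance = rng.standard_normal(n)
interest = 0.7 * relevance + 0.7 * rng.standard_normal(n)
richness = 0.4 * relevance + 0.5 * interest + 0.6 * rng.standard_normal(n)

# Two predictors, as in a relevance + interest model
betas, r2 = standardized_regression(np.column_stack([relevance, interest]), richness)
# One predictor: Beta equals the Pearson correlation, R square equals Beta squared
beta_single, r2_single = standardized_regression(relevance[:, None], richness)
```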

First Results

The data analysis is still under way. Final results will be presented at Wikimania 2005. Some first results follow.


Topic relevance has a strong and highly significant impact on the factor interest (R square = 0.53, Beta = 0.73) and on the quality factor richness (R square = 0.56, Beta = 0.75), a moderately strong influence on controversy (R square = 0.1, Beta = 0.44) and only a very weak relationship with the quality factor neutrality (R square = 0.03, Beta = 0.17). This means that an article on a highly relevant topic is very likely to reach good quality (richness). Topic relevance, however, is not a reliable predictor of neutrality. In addition, more relevant topics attract more unique authors.
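Assuming each of these figures comes from a regression with topic relevance as the only predictor, the standardized coefficient equals the correlation coefficient, so R square is simply Beta squared. A quick check against the reported values for interest, richness and neutrality:

```latex
R^2 = \beta^2: \qquad 0.73^2 \approx 0.53, \qquad 0.75^2 \approx 0.56, \qquad 0.17^2 \approx 0.03
```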


The attention factor interest itself has a strong influence on the quality factor richness (R square = 0.54, Beta = 0.74). In a multiple regression model, topic relevance and interest together explain 64 percent of the variance in quality (R = 0.80, R square = 0.64), a very strong relationship. These two factors alone predict most of the quality.

Of the variables that make up the factor interest (number of versions, authors, page views, article age, backlinks), the number of authors is by far the most important predictor of quality. Together with the relevance factor, it also explains 64 percent of the variance in quality (R = 0.80, R square = 0.64).

Fig. 3: The relationship between the number of authors and the quality factor richness

Once the influence of interest and relevance is controlled for, the factor controversy, which measures how intensively an article is discussed, has no strong influence on either quality (richness) or neutrality. The same holds for vandalism, anonymity and topic category.
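The text does not specify how the influence of interest and relevance was removed. One standard way is the partial correlation: regress both variables on the control variables and correlate the residuals. The Python sketch below applies this to invented data in which controversy and richness are related only through relevance, so their raw correlation is sizeable while the partial correlation is close to zero. Names and data are illustrative assumptions.

```python
import numpy as np


def partial_correlation(x, y, controls):
    """Correlation of x and y after removing the linear influence of controls.

    Residual method: regress x and y on the controls (plus an intercept)
    and correlate the two residual series.
    """
    Z = np.column_stack([np.ones(len(x)), controls])
    res_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    res_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]


# Invented data: controversy depends only on relevance
rng = np.random.default_rng(2)
n = 450
relevance = rng.standard_normal(n)
interest = 0.7 * relevance + 0.7 * rng.standard_normal(n)
controversy = relevance + rng.standard_normal(n)
richness = relevance + interest + rng.standard_normal(n)

raw = np.corrcoef(controversy, richness)[0, 1]
partial = partial_correlation(controversy, richness, np.column_stack([relevance, interest]))
```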


Sunir Shah's presentation "Controversy and stability: How wikis have productive conflict" (Transwiki:Wikimania05/Presentation-SS1) will refer to this paper.