Wikimania05/Paper-BO1

This page is part of the Proceedings of Wikimania 2005, Frankfurt, Germany.

The role of feedback and NPOV on meme evolution in the wikisphere

About the author: Boud Roukema is a French cosmologist who obtained his PhD in 1993 at Mount Stromlo Observatory (Australia), made short visits to the astronomers at the Institut Teknologi Bandung (Indonesia), continued his research during postdoctoral visits at the Institut d'astrophysique de Paris, the University of Sussex (UK), the Observatory of Strasbourg, the National Astronomical Observatory of Japan, the Inter-University Centre for Astronomy and Astrophysics (India) and the Paris Observatory, and now does cosmology research and teaching at the Nicolaus Copernicus University in Toruń in Poland. Independently of his cosmology research, Boud is interested in human rights and the development of independent, free media where he lives and around our tiny planet. Home pages on some of the wikibooks: en:user:boud fr:utilisateur:boud pl:wikipedysta:boud.

Abstract

The wikisphere (wikis, especially the wikimedia wikis) demonstrates a method of reorganising information that is very different both from state-controlled media and from corporate media. In all models of information reorganisation, the massive amount of information potentially obtainable from over six billion people needs to be condensed by factors of a thousand, a million or even a billion if the aim is to give everybody a chance to contribute to knowledge. This high factor of information compression necessarily implies a very high risk of censorship - e.g. removing or hiding memes against the interests of the government or the market economy. Does the wikisphere enable undesirable (for the government or for the economy), but correct, memes to survive, and possibly even to become widely distributed? Can this be quantified? Three of the most obvious factors in the wikisphere which distinguish it from government and corporate media are very short time-scale positive and negative feedbacks (adding and removing information) and the neutral point of view (NPOV) principle which, in principle, enables the survival of memes which could otherwise be censored. The aim of this project is to understand something about meme evolution in the wikisphere by quantitatively measuring the roles of some of the positive and negative feedbacks and the NPOV, using the data present in the Wikimedia databases. A brief introduction to the concepts and ideas for methods was presented. The software to carry out the full analysis will be available as a GPL package. At the time of the August 2005 wikimania meeting, this project had the status of a project description and an open invitation for others to participate, wiki-style, in the project. The aim is to obtain quantitative, clear, GFDL results using GPL software, not just qualitative intuitions.


Résumé (abstract - French)

La wikisphère (les wikis, et en particulier les wikis du wikimedia) démontre une méthode de réorganisation de l'information qui est très différente à la fois de celle des médias controlés par l'État et de celle des médias privés. Dans tout modèle de la réorganisation de l'information, l'énorme quantité d'information potentiellement recueillable de plus de six milliards personnes doit être condensé par un facteur de mille, d'un million, voire d'un milliard si le but est de donner à tou-te-s une chance à contribuer au savoir. Ce facteur de compression d'information nécessairement conduit à un risque très élevé de censure - ex. enlevant ou cachant des mèmes qui s'opposent aux intérêts du gouvernment ou l'économie du marché. La wikisphère permette-t-elle aux mèmes qui sont peu désirés (pour le gouvernement ou pour l'économie), mais valables, de survivre, voire même d'être largement répandus ? Est-il possible de quantifier cette question ? Trois des facteurs de la wikisphère les plus évidents qui démontrent sa différence des modèles gouvernmentaux et corporatistes du média sont des feedbacks positifs et négatifs à très courtes échelles de temps (rajoutant et enlevant de l'information) et le mécanisme ou principe du point de vue neutre (NPOV), qui, en principe, permet la survie de mèmes qui autrement seraient censurés. Le but de ce projet est de pouvoir comprendre quelque chose de l'évolution des mèmes dans la wikisphère par la mesure quantitative des rôles de certains des feedbacks positifs et négatifs et du NPOV, utilisant les données présentes dans les bases de données wikimedia. Une courte introduction aux idées de bases et aux probables méthodes a éte présenté. Les logiciels pour faire l'analyse entière sera disponible en tant que paquet GPL. Lors de la conférence wikimania d'août 2005, ce projet avait le statut d'une description de projet et une invitation ouverte à la participation, style wiki, dans le projet. 
Le but est d'obtenir des résultats quantitatifs, clairs, protégés par le GFDL, utilisant des logiciels GPL, plutôt que seulement des intuitions qualitatives.

Introduction

The wikisphere (wikis, especially the wikimedia wikis) demonstrates a method of reorganising information that is very different both from state-controlled media and from corporate media.

In different systems of information reorganisation and distribution, governments, media organisations and intellectuals usually make some claim to represent the truth. Since the Enlightenment, the right of ordinary people to participate in information production and distribution has generally been considered a positive goal (or achievement). This goal is similar to the goal of freedom of speech. The internet, especially the world wide web, and more so, wikis, have made this goal appear closer to a practical possibility rather than a mere ideological slogan. The wikipedia continues in this tradition, and (as of July 14, 2005) states at the top of its main page its claim that anyone can edit it. While the claim is clearly false as of 2005, since most people do not (yet) have internet access and so cannot edit the wikipedia, what is relevant here is that it is considered a worthy goal, worth showing very prominently to readers.

However, any system in which everyone could participate would include contributions from around six to ten billion people (during the XXI century). In an encyclopedic information system which includes (for example) names of villages, their local histories, aspects of local cultures and languages, hobbies, biographies of individuals well-known to their communities, it is conceivable that literally everyone has some useful contribution to make, even if this only consists of small corrections. In a news distribution system which values every human being equally, the important events from the points of view of six to ten billion individuals would have to be collected and distributed.

1. Assertion: It is obvious that no human being can physically read six to ten billion news reports or articles in a day, or even in a lifetime.

The corollary of this is that any information distribution system which aims to give the reader information in a compact enough form to be absorbed in a reasonably short time (e.g. one hour), and which genuinely collects information from everybody (maybe using many indirect methods, but sourced to all six to ten billion of us), necessarily needs to condense the information by a factor of about a million (e.g. if there is just one word from everyone and the aim is a text of 6-10 thousand words) up to a billion (e.g. if everyone contributes a page or so and the end result is a page or so).

2. Corollary: This compression (or condensation) factor of about a million to a billion, from fully participative information collection to a human readable report or article, cannot be avoided unless the information from most people is ignored or not collected, or the end reader is assumed to gain superhuman powers.
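The orders of magnitude quoted in the assertion and corollary can be checked with a few lines of arithmetic. The following is a minimal sketch only: the function name `compression_factor` and its parameters are hypothetical, and the inputs are the illustrative figures from the text (six to ten billion contributors, one word or one page each), not measured data.

```python
# Toy arithmetic only: the population range and the word/page counts are the
# illustrative figures from the text, not measured data.

def compression_factor(contributors, units_each, output_units):
    """Ratio of the information collected to what the reader finally receives."""
    return contributors * units_each / output_units

# One word from everyone, condensed to a text of 6-10 thousand words.
words_low = compression_factor(6_000_000_000, 1, 6_000)
words_high = compression_factor(10_000_000_000, 1, 10_000)

# One page or so from everyone, condensed to a single page.
pages_low = compression_factor(6_000_000_000, 1, 1)
pages_high = compression_factor(10_000_000_000, 1, 1)

print(f"one word each, 6-10 thousand words out: {words_low:.0e} to {words_high:.0e}")
print(f"one page each, one page out: {pages_low:.0e} to {pages_high:.0e}")
```

The first case gives a factor of a million at both ends of the population range, and the second gives a factor of several billion, which is the range the corollary refers to.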

No matter how (politically) free or participative any information or news source claims to be in providing human readable reports of reality (including socio-political reality), it must necessarily carry out this enormous scale of compression. So the question is not whether compression is to be carried out, but how.

In the case of compression by use of a search engine on the web (such as Google), the compression is in some sense carried out differently by every reader, but it is still a form of compression. The choice to go to a wikipedia article on a given subject is a similar method of compression, by which the reader trusts a community of other people to have collectively compressed the information in a way which s/he considers acceptable or valid in some way as a front page to that subject. In both cases, the reader retains a number of links to a related set of information, typically (e.g.) ten times larger, which in turn links to more related information, so the compression is less constraining than in the case of conventional, printed government or corporate books or newspapers.

Whatever the method of compression, this extremely high factor of compression (from a million to a billion) necessarily implies a very high risk of censorship of important information, especially if the Enlightenment goals of respect of human rights for everybody are part of the goals of the information distribution system. Using the terminology of memes, the removing or hiding of memes which oppose the interests of the government or the market economy can easily happen during this compression.

It is probably uncontroversial to state that when doing this compression, both authoritarian and democratic government controlled media and other information systems necessarily filter out a lot of information which ordinary citizens feel should not be hidden.

Less well known is the empirical, quantitative research in the USA by Herman and Chomsky in which they show that the compression method of the corporate media typically involves several filters, which, unsurprisingly, tend to filter against information which is against the interest of the market economy.

Given likely technological developments, it is feasible that within a few decades, literally everyone could have practical access to the wikisphere (wikis and especially wikimedia wikis). There will necessarily be some forms of compression - e.g. to decide what goes on the wikipedia front pages of the two most used languages (Chinese, English). Probably the general culture of the wikisphere makes the relatively crude filters of government (orders from above or more subtle disapproval by superiors) and of corporations (risk of losing advertising revenue, need to quickly produce articles at low cost) less likely to occur in the wikisphere (except maybe in work-related, non-anonymous wikis).

In any case, the practically unlimited (at least, for text written by human beings) capacity of the wikisphere, together with its positive feedback and negative feedback mechanisms, allows and encourages compression methods very different from those of government and corporate filters. (In addition, apart from article deletions, the compression usually only adds to the amount of information available: the conscientious or sceptical reader can read in depth the work which went into constructing the article by going to the history and discussion pages.)

So does the compression in the wikisphere get anywhere near the possibility of compressing information in a way in which everybody's participation is considered in some neutral way? The wikipedia concept of neutral point of view (NPOV) is clearly intended to provide a mechanism in this direction. It should prevent certain negative feedback mechanisms (e.g. filters to hide information which is embarrassing for the dominant political powers at a given time/place) from hiding certain pieces of information, and it itself provides a positive feedback mechanism to article(s) on the topic, since contributors are forced to include a wider range of points of view on what happened, what the truth is, etc. However, it can also operate as a negative feedback mechanism against the spread of memes which cannot be traced back to any external source; in other words, it should limit the spread of rumours (which will remain locally stabilised in the history and discussion pages, rather than spreading through the article itself).

Here is an attempt to define this question in a way which may possibly be measurable:

3. Question: Could it be possible that use of the NPOV concept plus various other positive feedback and negative feedback mechanisms in the wikisphere enables dissident (but correct) points of view to survive in the wikisphere despite the compression factor of somewhere between (potentially) a million and a billion?

Is it possible to answer this question with some level of objectivity, preferably quantitatively? An obvious problem is that the point of view (POV) of any individual researcher (or group of researchers) on which dissident POVs are correct is, by definition, unlikely to be widely accepted by consensus - since if there were consensus, then these POVs would not be dissident POVs.

Method

Caveat: entry of raw observations into the system

Finding quantitative answers to these questions is clearly a very large task. A few caveats regarding the present proposal include the following.

Although NPOV and other feedback mechanisms may (as believed by many wikipedians) create a much more neutral method of compressing information than other methods, it is impossible for the selection system to be genuinely neutral (in the sense of giving a picture of potentially observable reality - what people see and observe) if the entry of raw (first-hand) observations into the system is strongly biased.

To take an extreme (but historical) example, if nobody reports the information that thousands of Jews are being put into and exterminated in concentration camps, then no amount of NPOV balancing of different points of view is going to reveal this information, since no one (who participates in the system) has access to the information.

On the other hand, the hope would be that if at least one person reports this first-hand information, then the feedback processes will lead to verification and then amplification of the information, despite the fact that it is initially considered absurd, unrealistic information. Alternatively, despite the vastly increased rate, quantity and diversity of information flow in Europe today compared to the situation of Europe in the late 1930s, could it be possible that the correct information provided by a single observer would be dampened by negative feedbacks, by people too conservative to believe the reality? This is the question to which it is hoped to provide some sort of quantitative answer, or at least a plan towards providing quantitative answers.

Nevertheless, the caveat remains: at least one person must provide the raw observations - and given the no original research policy of the wikipedia, these would have to enter the system, for example, via wikinews or via external websites. It is obvious that the minimum threshold for an observation to enter the system in Europe of 2005 is much lower than for Europe of the late 1930s - but this threshold clearly still exists, both in Europe and on a world-wide scale.

Models of meme distribution

Key elements modelling the distribution mechanisms (warning: the word distribution is used in this paper in a dynamical sense, which differs from its use in statistics):

The wikisphere as an alternative meme distribution model

The wikisphere is considered here to include wikis in general, and wikimedia wikis in particular. Raw observations (first-hand reports) will take different paths through various wikimedia wiki articles - some sort of mathematical model of typical paths could be useful for understanding them.

Some positive feedbacks in wikidom include:

  • (positive feedbacks which are generally felt to have a positive role, i.e. for going towards "truth") NPOV on wiki enables sensitivity to minorities with convincing arguments so that these exponentially grow despite being dissident points of view
  • (positive feedbacks which can often be felt to be something negative) conflicts can grow due to positive feedback - successively stronger attacks grow exponentially until either the system breaks down or a negative feedback becomes important to dampen it.

Some negative feedbacks in wikidom include:

  • NPOV provides a negative feedback loop to dampen edit wars

If these hypothesised positive/negative feedbacks really exist in wikidom (as would intuitively be expected), then these should be measurable from the statistics.
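To illustrate what "measurable from the statistics" could mean, here is a minimal toy sketch. It is not the analysis software proposed in this paper, and every name in it (`simulate`, `estimate_growth_rate`, `gain`, `damping`) is hypothetical. It assumes a single number x_t describing how widespread a meme is at step t (for example, a count of articles mentioning it), a positive feedback proportional to x_t, and a negative feedback that strengthens as x_t grows.

```python
import math

def simulate(x0, gain, damping, steps):
    """Toy model (an assumption, not from the paper):
    x[t+1] = x[t] + gain * x[t] - damping * x[t]**2
    gain    -> positive feedback (growth proportional to current spread)
    damping -> negative feedback (grows stronger as the meme spreads)
    """
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + gain * x - damping * x * x)
    return xs

def estimate_growth_rate(series):
    """Mean log-ratio between successive values (requires positive values).
    Clearly positive: net amplification; near zero: growth has been damped."""
    logs = [math.log(b / a) for a, b in zip(series, series[1:])]
    return sum(logs) / len(logs)

# Positive feedback only: the meme grows exponentially.
undamped = simulate(x0=1.0, gain=0.5, damping=0.0, steps=10)

# Positive plus negative feedback: growth saturates near gain / damping.
damped = simulate(x0=1.0, gain=0.5, damping=0.005, steps=100)

print("undamped growth rate per step:", estimate_growth_rate(undamped))
print("late damped growth rate per step:", estimate_growth_rate(damped[-10:]))
print("damped plateau:", damped[-1])
```

In this sketch the estimated rate stays at log(1.5) per step without damping and falls towards zero once the negative feedback dominates. Applying such an estimator to real edit histories would additionally require a definition of what counts as an occurrence of a meme, which this sketch does not attempt.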

Can this research itself be NPOV?

Is there an intrinsic methodological difficulty in this type of research, in that the researcher decides, according to his/her own POV, which dissident POVs are correct (or wrong) and are amplified (correctly or wrongly) by positive feedbacks, and which are conspiracy theories (or plain rubbish) and correctly (or wrongly) dampened by negative feedbacks? Possibly a reasonably objective measure such as absolute or relative numbers of people killed/affected by the memes could help here. (Cf. Herman and Chomsky, who in their analysis compared absolute and relative numbers of people killed in Cambodia and East Timor, and who killed them.)

Possible memes with measurable evolution and pos/neg feedbacks

Open research principles: GPL licensed software

In order to be consistent with the spirit of free knowledge and education, the full software for carrying out these analyses should be made available under free licenses such as the GPL. Carrying out traditional research, using closed source or other non-free software, would be inconsistent with the wiki ethos.

Related projects at wikimania 2005

Note that there are several related projects presented at this meeting, such as:

Data

The raw data are available at:

Results

This project is at present only a research plan. At the Wikimania 2005 conference, some people felt that quantitative measurement of this sort is too difficult and/or would be meaningless in practice, while others felt that this approach is a good idea.

Conclusions

It is hoped that in the future, some quantitative answers will be provided to the question: are the hypothesised feedbacks consistent with the empirical dynamics of the data? Some of the ideas for planning this research have been presented.