Research:Wikimedia France Research Award/nominated papers

Vote process

  • Each Wikimedian can vote for at most two papers. The voting period ends on March 11th, 2013.
  • After the poll closes, votes are tallied and papers are ranked (number one being the paper with the most votes).
  • The Condorcet method is used to find the winner, combining the jury's ranking and the Wikimedians' ranking. If there is a draw, the Wikimedians' ranking prevails.


Studying Cooperation and Conflict between Authors with history flow Visualizations

Studying Cooperation and Conflict between Authors with history flow Visualizations is a quantitative, data-visualization-oriented paper. It belongs to the first wave of works that provided foundations for much of the later research on Wikipedia.

See the full text here.

Summary

Fernanda Viegas, Martin Wattenberg, and Kushal Dave describe a visualization system they have built, called history flow, which they use to visualize changes made to Wikipedia articles. The authors suggest that their paper makes three distinct contributions:

  • History flow itself, which is able to reveal editing patterns in Wikipedia and provide context for editors.
  • Several examples of collaboration patterns that become visible using the visualization tool and contribute to the literature on Wikipedia.
  • Implications of these patterns for the design and governance of online social spaces.

The paper is largely an examination of Wikipedia, and its early parts give background on the site. It uses shortcomings in the design of Wikipedia to motivate the history flow visualization, which essentially depicts articles over time, with colors representing the authors who contributed the text in question. The interface is particularly good at representing major deletions and insertions.

The authors use a lightweight statistical analysis to reveal patterns of editing on Wikipedia. In particular, they examine vandalism, including mass deletion, the creation of phony redirects, and the addition of idiosyncratic copy, and show that it rarely stays on the site for more than a few minutes before being removed.

They also show a zig-zag pattern that represents negotiation over content, often in the form of edit wars, and they attempt to provide some basic data on the stability of Wikipedia and the average growth of articles. They suggest something that is now taken for granted by researchers of wikis: that studying Wikipedia may have important implications for other types of work.

Most of this summary originates in the one posted on acawiki.org.

Jury comments

The paper is important more for its path-breaking work on Wikipedia, a topic that now has its own track at CHI, than for the history flow visualization itself, which has not, for the most part, been widely deployed outside Wikipedia but which seems to hold promise in a variety of other contexts. The paper has been cited more than 400 times, mostly in the academic literature on Wikipedia.

Vote for this paper

Vote

  1. Zakiakhmad (talk) 09:37, 21 February 2013 (UTC)
  2. Bokken (talk) 10:50, 25 February 2013 (UTC)
  3. Benoilid (talk) 10:12, 21 March 2013 (UTC)


DBpedia: A Nucleus for a Web of Open Data

DBpedia: A Nucleus for a Web of Open Data was published in 2007 by Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives.

You can also watch a video of the paper presentation here.

See the full text here.

Summary

DBpedia (see the blog here) is a community effort to extract structured information from Wikipedia and to make this information available on the Web. The English version of the DBpedia knowledge base currently describes 3.77 million things, of which 2.35 million are classified in a consistent ontology, and there are also localized versions of DBpedia in 111 languages.

DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. The paper describes the extraction of the DBpedia datasets and how the resulting information is published on the Web for human and machine consumption. It then presents some emerging applications from the DBpedia community and shows how website authors can incorporate DBpedia content within their sites. Finally, it presents the current status of interlinking DBpedia with other open datasets on the Web and discusses how DBpedia could serve as a nucleus for an emerging Web of open data.
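To give a concrete feel for the kind of query described above, here is a minimal sketch that sends a SPARQL query to DBpedia's public endpoint from Python. The endpoint URL and the dbo:/dbr: vocabulary are present-day DBpedia conventions used for illustration; they are not taken from the 2007 paper.

```python
# Minimal sketch: querying the public DBpedia SPARQL endpoint from Python.
# The endpoint URL and the dbo:/dbr: vocabulary are current DBpedia
# conventions used for illustration, not details from the paper.
import requests

ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbr:  <http://dbpedia.org/resource/>
SELECT ?person ?name WHERE {
  ?person a dbo:Person ;
          dbo:birthPlace dbr:Berlin ;
          rdfs:label ?name .
  FILTER (lang(?name) = "en")
}
LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "application/sparql-results+json"},
    timeout=30,
)
response.raise_for_status()

# Print each person's English label and DBpedia resource URI.
for row in response.json()["results"]["bindings"]:
    print(row["name"]["value"], "->", row["person"]["value"])
```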

Jury comments

DBpedia: A Nucleus for a Web of Open Data is the paper with the most citations on the Wikimedia Award shortlist, and it opens a discussion on the future of Wikidata and semantic data, a major issue for the year to come. Since 2007, it has stood the test of time, with many developments such as Wikidata, Sémanticpédia, and a wealth of related literature.

It is less about analysing Wikipedia than about tapping into its resources in order to build a collaborative knowledge base (at the very least). It is widely used, so we must decide what is being judged here: the result of the project (which is central to the Linked Data Cloud), its influence, or the paper itself?

Though they fulfill different tasks, DBpedia and the Wikidata project have a lot in common. There is also a similar French DBpedia project: Sémanticpédia - http://lab.wikimedia.fr

It is a description of the steps that led to the creation of a dataset generated from Wikipedia. Once this dataset has been properly produced, nothing prevents anyone from using it to gain a better understanding of Wikipedia (this is what happened with the French version, though the organizational process was very different from what is described here, WMF being a partner from the start).

Petermr (talk) 16:49, 18 February 2013 (UTC)

Vote for this paper

Vote

  1. Taha Yasseri (talk) 08:44, 18 February 2013 (UTC)
  2. Of course, since contributors on the French-language Wikipedia have been thinking about this question for a long time. GLec (talk) 16:12, 18 February 2013 (UTC)
  3. Karima Rafes (talk) 08:36, 22 February 2013 (UTC)
  4. --Rudloff (talk) 11:47, 26 February 2013 (UTC)
  5. Johnbreslin (talk) 15:36, 6 March 2013 (UTC)
  6. Asaf Bartov (WMF Grants) talk 00:26, 8 March 2013 (UTC) superb contribution that keeps on giving. Validated even further by Wikidata.
  7. DBpedia is a valuable resource and especially interesting in relation to the recent development of Wikidata. Finn Årup Nielsen (fnielsen) (talk) 14:01, 11 March 2013 (UTC)


A Content-Driven Reputation System for the Wikipedia

A Content-Driven Reputation System for the Wikipedia, by Thomas Adler and Luca de Alfaro, was published in the Proceedings of the 16th International World Wide Web Conference in 2007.

See the full text here.

Summary

The paper presents a reputation system for Wikipedia authors.

Most reputation systems, such as the ones used in social networks or e-commerce, rely on user-to-user comments or other ratings. In the proposed reputation system for the Wikipedia, reputation is content-driven: authors gain reputation when the edits they perform to Wikipedia articles are preserved by subsequent authors. Conversely, authors lose reputation when their edits are rolled back or quickly undone. Thus, author reputation is computed on the basis of content evolution only: in particular, no “badmouthing” or commissioned praise is possible.

Content change is computed either as Text life (whether the text is deleted or not) or Edit life (how much of the article structure is kept after the next edit).
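As a rough illustration of the content-driven idea, the sketch below updates an author's reputation from how well one edit survived later revisions. The weights and update rule are placeholders for illustration only, not the authors' actual text life and edit life formulas.

```python
# Illustrative sketch of a content-driven reputation update, loosely inspired
# by the paper's idea: authors gain reputation when their edits survive later
# revisions and lose reputation when they are undone. The weights and update
# rule below are placeholders, not the authors' actual formulas.
from collections import defaultdict

reputation = defaultdict(float)  # author -> reputation score


def update_reputation(author, text_survival, edit_survival,
                      text_weight=0.5, edit_weight=0.5, learning_rate=0.1):
    """Update an author's reputation from how well one edit survived.

    text_survival and edit_survival are fractions in [0, 1]: the share of the
    author's inserted text (resp. structural change) preserved by later edits.
    Values above 0.5 increase reputation; values below 0.5 decrease it.
    """
    quality = text_weight * text_survival + edit_weight * edit_survival
    reputation[author] += learning_rate * (quality - 0.5)
    return reputation[author]


# Example: an edit whose text mostly survived, and one that was largely undone.
print(update_reputation("alice", text_survival=0.9, edit_survival=0.8))
print(update_reputation("bob", text_survival=0.1, edit_survival=0.0))
```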

The author reputation can be used to flag new contributions from low-reputation authors, or to allow only authors with high reputation to contribute to controversial or critical pages. A reputation system for Wikipedia could also provide an incentive for high-quality contributions.

The authors also implement the proposed system and use it to analyze the entire Italian and French Wikipedias during their first years (totalling 691,551 pages and 5,587,523 revisions), with results showing that the proposed notion of reputation has good predictive value. The machine-calculated results are put to the test by a group of seven volunteers who rated revisions performed on the Italian Wikipedia.

  • Anonymous authors are shown to be the largest source of short-lived contributions.
  • Changes performed by low-reputation authors have a significantly larger probability of having poor quality and of being later undone.
  • A comparison with edit-count reputation (the more edits you have, the better your reputation) shows that content-driven reputation performs slightly better.
  • Author reputation is also a useful factor in predicting the survival probability of fresh text.

Jury comments

A major contribution to the lasting debate on article quality, which is still a hot topic. Thomas Adler and Luca de Alfaro have continued working on this topic since 2007 and have refined their method.

Vote for this paper

Vote

  1. Even though the resulting Wikitrust software did not become as widely used as once anticipated, it has been an innovative and influential approach. This appears to have been the first academic paper making use of a content persistence metric as a quantitative indicator of content quality. Tbayer (WMF) (talk) 22:47, 10 March 2013 (UTC)
  2. Warfair (talk) 01:32, 19 March 2013 (UTC)


Creating, destroying, and restoring value in Wikipedia

Creating, destroying, and restoring value in Wikipedia is a groundbreaking, classic paper published at the ACM Conference on Supporting Group Work (GROUP) in 2007.

See the full text here.

Summary

This paper heralds a quantitative approach to measuring the impact of an edit. The six authors (Reid Priedhorsky, Jilin Chen, Shyong (Tony) K. Lam, Katherine Panciera, Loren Terveen, and John Riedl) worked at the GroupLens research lab at the University of Minnesota.

They suggest quantifying the impact of a given edit by the number of times the edited version is viewed. The concept of persistent word view (PWV) builds on the notion of an article view: each time an article is viewed, each of its words is also viewed. When a word written by editor X is viewed, he or she is credited with one PWV.
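A minimal sketch of the PWV bookkeeping is given below, assuming a per-revision mapping from each word to the editor who introduced it; the helper names are hypothetical and not taken from the paper.

```python
# Illustrative sketch of persistent word view (PWV) accounting: when an
# article is viewed, every word currently in it credits one PWV to the editor
# who introduced that word. The word_authors mapping (word -> editor for the
# revision being viewed) is a hypothetical helper, not the paper's pipeline.
from collections import Counter

pwv = Counter()  # editor -> accumulated persistent word views


def credit_view(word_authors):
    """Credit one PWV per word to that word's author for a single page view."""
    for author in word_authors.values():
        pwv[author] += 1


# Example: a revision whose words were written by two editors, viewed 3 times.
revision = {"wikipedia": "alice", "is": "alice", "an": "bob", "encyclopedia": "bob"}
for _ in range(3):
    credit_view(revision)

print(pwv)  # Counter({'alice': 6, 'bob': 6})
```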

They use a series of datasets spanning four years, analyzing 4.2 million editors and 58 million edits, with results highlighting the importance of frequent editors, who dominate what people see when they visit Wikipedia, and showing that this domination is increasing. For example, the top 10% of editors by number of edits contributed 86% of the PWVs, and the top 0.1% contributed 44%.

They implement a vandalism-detecting metric using only the comments associated with reverts. The metric is applied through a sophisticated automated/human research protocol, with results showing the rapidity of damage repair (42% of damage incidents are repaired immediately, while 0.75% of incidents persist beyond 1,000 views). Interestingly, they take into account not only how long articles remained in a damaged state, but also how many times they were viewed while in this state. While the overall impact of damage in Wikipedia is low, they show it is rising; the appearance of vandalism-repair bots in early 2006 seems to have halted the exponential growth.
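For illustration, a comment-based revert detector in this spirit could look like the sketch below. The regular expression is a simplified stand-in, not the paper's actual D_LOOSE or D_STRICT patterns.

```python
# Illustrative sketch of detecting damage-repairing edits from edit comments,
# in the spirit of the paper's comment-based revert metric. The regular
# expression below is a simplified stand-in, not the paper's actual D_LOOSE
# or D_STRICT patterns.
import re

REVERT_COMMENT = re.compile(
    r"\b(revert(ed|ing)?|rv[vt]?|undid|undo|vandal(ism)?)\b",
    re.IGNORECASE,
)


def looks_like_revert(comment: str) -> bool:
    """Return True if an edit comment suggests the edit undoes damage."""
    return bool(REVERT_COMMENT.search(comment or ""))


comments = [
    "Reverted edits by 192.0.2.1 to last version by Alice",
    "rv vandalism",
    "copyedit and add references",
]
for c in comments:
    print(looks_like_revert(c), "-", c)
```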

Building on previous papers, such as the 2004 "Studying cooperation and conflict between authors with history flow visualizations" (also a nominee for this award), they reuse its categories of damage to articles (nonsensical, offensive, false content…), but also correct them and add some of their own (misinformation, partial delete, spam). Using human judgment, they show that most of the damage belongs to the "nonsense" category.

Jury comments

Seminal ideas, a quantitative approach, a lot of content.

Vote for this paper

Vote

  1. Very useful measure of edit impact. Avenue (talk) 00:12, 25 February 2013 (UTC)
  2. Interesting work --PierreSelim (talk) 13:29, 26 February 2013 (UTC)
  3. Ypnypn (talk) 13:55, 6 March 2013 (UTC)
  4. Most definitely; the D_LOOSE and D_STRICT regular expressions alone would win this a prize. Ironholds (talk) 21:22, 9 March 2013 (UTC)
  5. Introduces several broadly relevant new metrics for assessing edit impact & article quality; shows the importance of Wikipedia's core editor base in creating and maintaining the value of the encyclopedia, refuting the commonsense notion that Wikipedia is somehow a product of a nameless, faceless "crowd". And exposes both the ways in which Wikipedia and open wikis in general are vulnerable to quality degradation, and the robust mechanisms that exist to combat it. Best of the five! Disclosure: I have co-authored a research paper with the last author. Jtmorgan (talk) 21:27, 10 March 2013 (UTC)
  6. Tbayer (WMF) (talk) 22:31, 10 March 2013 (UTC)
  7. "Persistent word view" is a interesting idea. Finn Årup Nielsen (fnielsen) (talk) 13:57, 11 March 2013 (UTC)


Can history be open source? Wikipedia and the future of the past

Can history be open source? Wikipedia and the future of the past, by Roy Rosenzweig, was published in the Journal of American History in 2006.

See the full text here or the HTML version with documents in full color.

Summary

Roy Rosenzweig, a history professor at George Mason University, examined Wikipedia from the perspective of a historian in "Can History be Open Source? Wikipedia and the Future of the Past". A historian's analysis complements the discussion carried out through the important but different lenses of journalists and scientists.

Rosenzweig focuses not just on factual accuracy, but also on the quality of prose and the historical context of entry subjects. He begins with an in-depth overview of how Wikipedia was created by Jimmy Wales and Larry Sanger and describes their previous attempts to create a free online encyclopedia. Wales and Sanger's first attempt at a vetted resource, called Nupedia, shows that vetting and the reliability of authorship were at the forefront of the creators' concerns from the very beginning of the project.

Rosenzweig adds to a growing body of research trying to determine the accuracy of Wikipedia through a comparative analysis with other online history references, along similar lines to the Nature study. He compares entries in Wikipedia with Microsoft's online resource Encarta and with American National Biography Online (ANBO). Where Encarta is aimed at a mass audience, American National Biography Online is a more specialized history resource. Rosenzweig takes a sample of 52 entries from the 18,000 found in ANBO and compares them with the corresponding entries in Encarta and Wikipedia. In terms of coverage, Wikipedia contained more of the sample than Encarta. Although the articles did not reach the length of those in ANBO, Wikipedia's articles were longer than the entries in Encarta. Further, in terms of accuracy, Wikipedia and Encarta seem basically on par with each other, which confirms a similar conclusion that the Nature study reached in its comparison of Wikipedia and the Encyclopedia Britannica.

Rosenzweig then discusses the effects of collaborative writing in more qualitative ways. He notes that collaborative writing often leads to less compelling prose: multiple styles of writing, competing interests and motivations, and varying levels of writing ability are all factors in the quality of a written text. Wikipedia entries may be for the most part factually correct, but they are often not that well written or historically relevant in terms of what receives emphasis. Due to piecemeal authorship, the articles often fail to connect coherently to the larger historical conversation. ANBO, by contrast, has well-crafted entries, but these are typically authored by well-known historians.

However, the quality of writing needs to be balanced against accessibility. ANBO is subscription-based, whereas Wikipedia is free, which shows how access to a resource plays a role in its purpose. Since Wikipedia is largely a product of the amateur historian, Rosenzweig comments on the tension created when professional historians engage with it. He notes that it tends to be full of interesting trivia, but that the seasoned historian will question the trivia's historical significance. In addition, the professional historian places great importance on citations and source references, which are not as rigorously enforced in Wikipedia.

Because of Wikipedia's widespread and growing use, it challenges the authority of the professional historian and therefore cannot be ignored. This tension raises questions about the professional historian's obligation to Wikipedia. To this point, Rosenzweig notes that there is an obligation and a need to provide the public with quality information, whether in Wikipedia or in some other venue.

Rosenzweig concludes by looking forward, describing what the professional historian can learn from open, collaborative production models. He also notes interesting possibilities, such as the collaborative open-source textbook, as well as challenges, such as how to properly cite collaborative efforts.

Most of this summary originates in the one posted by ray cha.

Jury comments

A thought piece/essay that contrasts with classic scientific articles, but a very stimulating read.

Rosenzweig was a pioneer in digital history, combining new digital media and technology with history to explore new possibilities for reaching a larger and more diverse public audience.

Comments by Glyn Moody.

The essay is long, but it is well worth reading all the way through for its detailed comparison of Wikipedia and conventional reference works. One of its shrewdest observations is the following:

Overall, writing is the Achilles’ heel of Wikipedia. Committees rarely write well, and Wikipedia entries often have a choppy quality that results from the stringing together of sentences or paragraphs written by different people. Some Wikipedians contribute their services as editors and polish the prose of different articles. But they seem less numerous than other types of volunteers. Few truly gifted writers volunteer for Wikipedia.

Vote for this paper

--Alouache (talk) 09:18, 23 March 2013 (UTC)

Vote

  1. This is an insightful, well-written analysis of Wikipedia epistemology. Definitely a classic of Wiki Studies. Alexander Doria (talk) 17:33, 18 February 2013 (UTC)
  2. Per Alexander Doria. Gentil Hibou (talk) 09:15, 19 February 2013 (UTC)
  3. Classic. Shame Roy is no longer with us. --Piotrus (talk) 18:29, 19 February 2013 (UTC)
  4. za (talk) 09:45, 21 February 2013 (UTC)
  5. Bokken (talk) 10:51, 25 February 2013 (UTC)
  6. Useful, anyway. --Ambre Troizat (talk) 15:17, 25 February 2013 (UTC)
  7. A brilliant paper. Peter Damian (talk) 18:49, 25 February 2013 (UTC)
  8. Rudloff (talk) 11:48, 26 February 2013 (UTC)
  9. Per Alexander Doria --PierreSelim (talk) 13:35, 26 February 2013 (UTC)
  10. Per Alexander Doria - I wish many more readers would have a look at it. --Ttzavaras (talk) 21:02, 26 February 2013 (UTC)
  11. Love this one. A Wikipediaology must-read! --Dimi z (talk) 17:59, 4 March 2013 (UTC)
  12. --Charles Andrès (WMCH) 11:11, 6 March 2013 (UTC)
  13. Sfauqueur (talk) 14:08, 6 March 2013 (UTC)
  14. Ypnypn (talk) 13:57, 6 March 2013 (UTC)
  15. This paper is fundamental to Wikipedia history - Rosenzweig was the first historian to understand the significance of what we do, and this paper gave legitimacy to the idea of researching WP in the humanities. My own thesis about the historiography of Wikipedia [1] was only approved as a topic because I was able to point to Rosenzweig's paper. Wittylama (talk) 00:53, 8 March 2013 (UTC)
  16. --Sgatoux (talk) 09:21, 19 March 2013 (UTC)