
Research:Contribution Domains and Data Importance

From Meta, a Wikimedia project coordination wiki
19:01, 14 May 2018 (UTC)
Duration:  June-2018 – January-2019

This page is an incomplete draft of a research project.
Information is incomplete and is likely to change substantially before the project starts.

We seek to understand where contributors to Wikidata, a peer production community, put their efforts and how that relates to their understanding of content usage. Additionally, we seek to understand whether contributors have a sense of where demand for content lies.


We will conduct approximately 20 semi-structured interviews with Wikidata contributors. We will recruit participants through community mailing lists and community discussion pages. We would like some of our participants to be prominent or prolific contributors who have produced editing tools or other automated means of contributing; in those cases, we may email participants directly. Ideally, we would like our participants to be evenly split among three categories. First, we would like to talk to contributors who create bots, which perform automated data imports and edits in Wikidata. Second, we would like to talk to contributors who create semi-automated tools, which are less automated than bots but still help automate contributions. Third, we would like to talk to contributors who create neither bots nor tools.

The interviews will be semi-structured and consist of two parts. The first part seeks to better understand the domains/topics in which contributors put their efforts, why they contribute in those domains/topics, and whether they care about providing the content that receives the most use. To do so, we will ask a series of questions. The second part seeks to understand whether contributors can recognize the popularity/importance of content. To do so, we will present each contributor with 10 concepts that represent Wikidata content. For each concept, we will ask the contributor to rate its popularity/importance to others on a scale of 1 to 5 and to explain why they gave that rating.

Once the interviews are completed, we will compute descriptive statistics on our part 2 results to understand the relationship between actual content importance and our participants' understanding of importance. We will also qualitatively analyze the remainder of our results, including participants' responses for part 1 and their part 2 explanations of why they gave a particular importance rating. In both parts 1 and 2, we may also compute basic descriptive statistics beyond those mentioned above.
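As a rough illustration of the part 2 descriptive statistics, the sketch below summarizes ratings per concept and compares the mean rating with an actual importance score. Everything in it is an assumption made for illustration: the concept names, the ratings, the `actual` scores, the `summarize` helper, and the idea that actual importance has already been mapped onto the same 1 to 5 scale. It is not the study's analysis plan and contains no study data.

```python
from statistics import mean, median, stdev

# Hypothetical data: concept names, ratings, and "actual" scores are invented
# for illustration only and are not results from the study.
# ratings[concept] holds the 1-5 importance ratings given by participants.
ratings = {
    "concept_a": [5, 4, 5, 3],
    "concept_b": [2, 1, 2, 3],
    "concept_c": [3, 3, 4, 2],
}

# Assumption: actual importance is already expressed on the same 1-5 scale.
actual = {"concept_a": 4, "concept_b": 1, "concept_c": 5}


def summarize(ratings, actual):
    """Return per-concept descriptive statistics and the gap to actual importance."""
    summary = {}
    for concept, values in ratings.items():
        avg = mean(values)
        summary[concept] = {
            "mean": avg,
            "median": median(values),
            "stdev": stdev(values),  # sample standard deviation
            "gap": avg - actual[concept],  # > 0 means participants overestimate
        }
    return summary


summary = summarize(ratings, actual)
for concept, stats in summary.items():
    print(concept, stats)
```

A positive gap would suggest participants rate a concept as more important than its actual score indicates, and a negative gap the reverse.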


June 2018 - October 2018: Recruit interview participants

September 2018 - November 2018: Perform interviews

October 2018 - January 2019: Analyze data and publish results

Policy, Ethics and Human Subjects Research

We are in the process of submitting our study proposal to the IRB at the University of Minnesota. We will provide the reference and approval data at a later time.


Once your study completes, describe the results and their implications here. Don't forget to set status=complete above when you are done.