Research:Wikipedia and Science

Kris Gulati
Duration:  2022-June – ??


The Lead: Introduce and describe your project at a high level in one or two paragraphs. Will the output of this project provide tangible benefits for our community (in the form of data, software, Web services)? If the output of this project mainly consists of scholarly publications, what aspects of Wikimedia projects will they help to understand?


Describe in this section the methods you'll be using to conduct your research. If the project involves recruiting Wikimedia/Wikipedia editors for a survey or interview, please describe the suggested recruitment method and the size of the sample. Please include links to consent forms, survey/interview questions and user-interface mock-ups.


Please provide in this section a short timeline with the main milestones and deliverables (if any) for this project.

First three months:

We investigated research question 1 from the proposal. Before committing to an expensive RCT, we ran some initial exploratory analyses to make sure we would not be wasting resources. Specifically, we examined whether existing Wikipedia profiles are associated with differences in citations relative to a control group constructed on observables. These initial results have not been fully convincing, so at the moment we are exploring the other research questions in the project or a potential pivot. In addition, preliminary feedback from colleagues suggested that the paper was intellectually slightly too close to the Thompson and Hanley working paper.

The other main research question we looked into was whether Wikipedia could shape patents in foreign languages and so accelerate the diffusion of knowledge. However, this also seemed problematic for a couple of reasons:

Firstly, the RCT would probably have to be very large to detect an effect. Secondly, there is a lot of noise in the process, so the proposed study did not seem likely to be effective. More specifically, like Thompson and Hanley, we would have to use NLP-based text similarity to obtain a good measure of knowledge diffusion, which would be computationally expensive.
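To make the text-similarity point concrete, here is a minimal sketch of one simple way such a measure could be computed: cosine similarity between bag-of-words count vectors. This is only an illustration and is not the method used by Thompson and Hanley or in this project. The function names (`tokenize`, `cosine_similarity`, `pairwise_scores`) and the inputs are hypothetical, and a real study would likely need a more careful representation (for example, handling of multiple languages).

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the raw term-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only terms shared by both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no defined direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def pairwise_scores(articles: list[str], patents: list[str]) -> dict[tuple[int, int], float]:
    """Score every (article, patent) pair, keyed by their list indices."""
    return {
        (i, j): cosine_similarity(article, patent)
        for i, article in enumerate(articles)
        for j, patent in enumerate(patents)
    }
```

In this sketch the number of comparisons is the number of articles multiplied by the number of patents, which gives a sense of why scoring a large corpus against a large patent set becomes costly.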

Overall, I think there are better ways of approaching this second research question.

Given these constraints, I think it is better to use the next Wikimedia funding round to pursue an alternative research question that emerged from this exploratory process.

Policy, Ethics and Human Subjects Research

It's very important that researchers do not disrupt Wikipedians' work. Please add to this section any consideration relevant to ethical implications of your project or references to Wikimedia policies, if applicable. If your study has been approved by an ethical committee or an institutional review board (IRB), please quote the corresponding reference and date of approval.


Describe the results and their implications here. We encourage you to share preliminary data. Don't forget to make status=complete above when you are done.


Provide links to presentations, blog posts, or other ways in which you disseminate your work.