Wikicite/grant/Meta-analysis of Wikipedia's coronavirus references and citations




Project summary

Project Name
A meta-analysis of Wikipedia's coronavirus sources during the COVID-19 pandemic
Start/End dates
October 2020 - March 2021
Amount requested (and the currency you wish to receive it in)
3,500 USD
Amount requested (in US$ equivalent)
$3500

The people

Contact person name/Wikimedia username
Omer Benjakob, https://en.wikipedia.org/wiki/User:%D7%A2%D7%95%D7%9E%D7%A8_%D7%91%D7%9F_%D7%99%D7%A2%D7%A7%D7%91
Contact person e-mail address
omerbj@gmail.com; omer.benjakob@haaretz.co.il

[Alternatively, confirm that you have "Allow other users to email me" enabled in your account preferences]

Organisation (optional)

If this grant is for an organisation (for example a Wikimedia Affiliate), name it here

Project participants
Who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.
  • Omer Benjakob, Haaretz journalist and independent Wikipedia researcher
  • Dr. Rona Aviram, Weizmann Institute of Science, Israel
  • Dr. Jonathan Sobel, Technion, Israel

The project

Description

Describe the project or event.

Our project is an academic research study of Wikipedia, the coronavirus, scientific research and the citations that link them together. It focuses on the scientific backbone of the COVID-19-related content on English Wikipedia and asks: what role did science play in supporting the coronavirus articles, and what can the citations used tell us?

Using citations as our readout, we created a corpus of all the references used in the more than 3,000 coronavirus articles to ask which sources informed Wikipedia’s coronavirus content and how scientific research on COVID-19 was represented and used on Wikipedia. Using the citations, we characterized the most trusted sources among both scientific and popular media to gauge what role science played. In addition, we analysed the quality of the sources using citation metrics within Wikipedia and in the scientific literature, as well as their impact on social media. We also investigated the role played by preprints and open-access peer-reviewed studies, a key issue that has yet to be researched with regard to the virus, at a time when a surge of coronavirus preprints was being uploaded online. Furthermore, we created a metric that assesses the scientificness of a Wikipedia article based on its academic references.

In order to gain a historical perspective, we built a timeline of Wikipedia articles related to COVID-19. Working along a temporal axis revealed that, alongside massive growth in the number of articles on coronavirus, there were shifts in quality and scientificness over time. In addition, we looked at the latency of scientific citations on Wikipedia (latency = the date a scientific article was inserted into Wikipedia versus its publication date in an academic journal).
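The latency described above can be illustrated with a minimal sketch. It assumes latency is measured in whole days between the journal publication date and the date the citation first appears on Wikipedia. The function name and the example dates are hypothetical and are not taken from the study.

```python
from datetime import date


def citation_latency_days(published: date, first_cited: date) -> int:
    """Days between journal publication and first citation on Wikipedia.

    Assumption: latency is counted in whole days. A negative value means
    the source was cited before its recorded publication date, which can
    happen when a preprint is cited ahead of the journal version.
    """
    return (first_cited - published).days


# Hypothetical dates, for illustration only.
example_latency = citation_latency_days(
    published=date(2020, 2, 1),
    first_cited=date(2020, 2, 15),
)
print(example_latency)  # 14
```

Keeping the sign of the result, rather than taking an absolute value, preserves the distinction between sources cited after publication and those cited before it.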

Finally, to investigate the role of Wikipedia's scientific sources (peer-reviewed papers and preprints), we built a network of Wikipedia articles linked together based on their DOI sources. Our network analysis allowed us to map how scientific knowledge related to coronavirus not only played a role in specific articles created during or prior to the pandemic, but also formed a web of knowledge that proved to be an integral part of Wikipedia's scientific infrastructure.
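One way to read the network described above is that two Wikipedia articles are linked whenever they cite the same DOI. The sketch below illustrates that reading only; the function name, the article titles and the DOIs are hypothetical placeholders, and the study's actual construction may differ.

```python
from collections import defaultdict
from itertools import combinations


def shared_doi_edges(article_dois):
    """Link pairs of articles that cite at least one common DOI.

    article_dois maps an article title to the set of DOIs it cites.
    Returns a dict mapping (article_a, article_b) to the shared DOIs.
    """
    # Invert the mapping: for each DOI, which articles cite it?
    doi_to_articles = defaultdict(set)
    for article, dois in article_dois.items():
        for doi in dois:
            # DOIs are case-insensitive, so normalise before comparing.
            doi_to_articles[doi.lower()].add(article)

    # Every pair of articles citing the same DOI becomes an edge.
    edges = defaultdict(set)
    for doi, articles in doi_to_articles.items():
        for a, b in combinations(sorted(articles), 2):
            edges[(a, b)].add(doi)
    return dict(edges)


# Hypothetical placeholder data, not real citations.
example_corpus = {
    "Article A": {"10.1000/x1", "10.1000/x2"},
    "Article B": {"10.1000/X1"},
    "Article C": {"10.1000/x3"},
}
example_edges = shared_doi_edges(example_corpus)
```

In this toy corpus only "Article A" and "Article B" share a source, so they form the single edge, while "Article C" stays unconnected.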

In sum, our work provides a detailed, quantitative overview of the pandemic-related knowledge on Wikipedia. Among our initial findings are: 1) that Wikipedia managed to maintain high-quality academic sources throughout the pandemic; 2) that opening access to peer-reviewed articles usually kept behind a paywall paid off, as these were disproportionately cited with regard to coronavirus; and 3) that despite a deluge of preprints related to the virus, and despite the wide coverage these enjoyed in the popular media, non-peer-reviewed academic sources did not play a key role on Wikipedia, allowing the coronavirus articles to maintain the high standards laid out by WikiProject Medicine.


Motivation

Why is this project needed? What will it solve or improve?

Understanding how scientific and medical information was integrated into Wikipedia over time, and which sources informed the COVID-19 content, is key to understanding the digital knowledge echosphere during the pandemic. Considerable funds and energy are being invested by scientific institutions in addressing the issues raised by the coronavirus. This includes speeding up peer-review processes, but also opening access to existing research and providing up-to-date knowledge to fight the plague of disinformation which, per the UN and the WHO, has turned the pandemic into an infodemic as well. However, despite these efforts, it remains unclear how much of this science is reaching the public.

Our study is expected to shed light on exactly this by characterizing the role of scientific media in Wikipedia during the pandemic. Understanding how science reaches the public, with Wikipedia serving as a key node in this process, could prove key for scientists, public health officials and even policy makers. Moreover, due to the large role of non-scientific media in the coronavirus articles, we also seek to put forth a method for gauging the quality and scientificness of articles. This method, based solely on the different citations, will allow more research into Wikipedia’s ties with science and could help bridge the gap between the scientific community and the general public. Our temporal analysis of the growth of citations, which includes a focus on citations that were deleted, could also go a long way towards clarifying the cumulative growth of knowledge over time, in this case during the coronavirus pandemic.

From a broad perspective, Wikipedia’s readership, especially in English, can benefit from understanding the forces behind the information they read online. There is growing public interest in Wikipedia, specifically regarding its success in supplying accurate and up-to-date information on coronavirus, and our study can help to further this interest. Wikipedia is the leading source of online information for millions around the globe. As people's trust in the information they read on Wikipedia grows, it is important to understand to what extent the facts offered on the website are based on academic and scientific research.


Activities

Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project?

At the end of the project, first and foremost, we will have a full study offering a meta-analysis of all the coronavirus-related Wikipedia articles in English and their respective corpus of citations. In addition to the study, which we aim to publish in a peer-reviewed journal with a high impact factor, we will also put forward a number of tools for use by researchers and the general public.

From an academic perspective, we created a metric for a Wikipedia article’s scientificness. Moreover, with the help of an R package and specially created tools, we hope to facilitate additional research of this type.

We developed an R package to retrieve the history of any Wikipedia article and its content, including timestamps, revision IDs, users, size, citation counts, and article text. The DOIs, PMIDs, ISBNs, websites and URLs cited in the corpus of Wikipedia articles are also extracted. The package thus allows all information related to an article and its citations to be retrieved in structured tables.
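As an illustration of the kind of identifier extraction described above, the following Python sketch pulls DOIs and PMIDs out of a snippet of article wikitext with regular expressions. It is not the R package's implementation, which is not shown in this proposal. The patterns, the function name and the sample wikitext are all assumptions, and the identifiers in the sample are placeholders.

```python
import re

# Assumed patterns: a DOI starts with "10.", a 4-9 digit registrant code and
# a slash; a PMID is read from a "pmid=" citation template parameter.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s|}\]<>"]+')
PMID_PATTERN = re.compile(r'\bpmid\s*=\s*(\d+)', re.IGNORECASE)


def extract_identifiers(wikitext):
    """Return the unique DOIs and PMIDs found in a wikitext string."""
    dois = sorted({match.lower() for match in DOI_PATTERN.findall(wikitext)})
    pmids = sorted(set(PMID_PATTERN.findall(wikitext)))
    return {"dois": dois, "pmids": pmids}


# Placeholder wikitext with made-up identifiers, for illustration only.
sample_wikitext = (
    "{{cite journal |title=Example |doi=10.1000/xyz123 |pmid=12345678}} "
    "{{cite web |url=https://example.org/page}}"
)
identifiers = extract_identifiers(sample_wikitext)
print(identifiers)
```

Regular expressions keep the sketch short, but a production tool would more likely parse the citation templates properly, since identifiers can also appear in URLs or free text.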

Moreover, the package provides several visualization tools for this data. Notably, two navigable visualisation tools are available for any set of Wikipedia articles. The first is a timeline of an entire category of articles, organized by article creation date, which allows users to navigate between the timeline and the respective Wikipedia articles. The second is a network tool that links Wikipedia articles and scientific publications and allows one to map the network of scientific sources used on Wikipedia. The package also includes the proposed metric for assessing the scientific quality of a Wikipedia article. This metric, which we call “Sci Score”, is defined as the number of scientific journal citations (DOI count) divided by the total number of citations (reference count) of any given article.
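The Sci Score definition above translates directly into a short calculation. The sketch below is in Python rather than the package's R. The function name is hypothetical, and returning 0.0 for an article with no references is an assumption, since the proposal does not say how that case is handled.

```python
def sci_score(doi_count, reference_count):
    """Ratio of DOI-bearing citations to all citations in an article."""
    if doi_count < 0 or reference_count < 0:
        raise ValueError("counts must be non-negative")
    if reference_count == 0:
        # Assumption: an article without references scores 0.0.
        return 0.0
    return doi_count / reference_count


# Hypothetical article with 120 references, 30 of which carry a DOI.
example_score = sci_score(doi_count=30, reference_count=120)
print(example_score)  # 0.25
```

A score close to 1 would indicate an article sourced almost entirely from DOI-bearing publications, while a score close to 0 would indicate mostly non-DOI sources such as news media and websites.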

For an example of our timeline tool >> https://jsobel1.shinyapps.io/Wiki_covid-19_timeline/

For an example of our network tool >> https://jsobel1.shinyapps.io/interactive_paper_art_network_covid/


Measures of success

What criteria will you use to define success for your project, and how do you intend to measure them? What are your targets for these measurements?

Perhaps the most important measure of success would be having our tools used in additional studies on how other scientific fields are represented on Wikipedia, based on their respective corpus of citations. We have already lined up our next project and hope, at minimum, to make use of these tools ourselves, but we are also in touch with other researchers interested in using them.

Moreover, a number of other measures of success can be put forward. Firstly, publishing our study in a peer-reviewed journal with a high impact factor. Secondly, having our findings presented at a conference (or conferences), as well as receiving media coverage for them (one of us is a journalist who writes about Wikipedia and works with others writing about similar topics). Dr. Aviram and Mr. Benjakob have previously authored a study on Wikipedia and science (PMID: 29665713), which enjoyed moderate success and inspired no small amount of debate both on Wikipedia and online among academics.

Thirdly, recreating our study once a full year has passed since the pandemic’s outbreak. This will be important both conceptually (in terms of testing our methods) and analytically, as it could offer hindsight on what happened on Wikipedia before, during and hopefully also after the pandemic.


Community

Who is your target audience for this project? How will you engage the community you’re aiming to serve at various points during your project?

Our main target audience for this project is Wikipedians focused on health and coronavirus articles and, more broadly, those focused on scientific topics on Wikipedia. Our second target audience is academics in the fields of scientometrics, the sociology of science and knowledge, and science communication, as well as those focused on public health. Our third target audience is policy makers in both academia and the public sector who are interested in understanding today's online knowledge ecosystem and focused on issues of open science and public health, including those interested, for example, in models for fighting disinformation.

We will reach these audiences both on-Wiki and off-Wiki. On Wikipedia, at least in English, we are already in touch with leading editors, and more generally we are in touch with other like-minded researchers. For example, we have recently formed a relationship with a team supported by IBM that is working to use Wikipedia’s edit history and data to try to understand the different policy responses used in different countries. These ties, as well as our general ties to the media and the growing interest we’ve seen in our research (in the form of lectures, for example), make us confident not only in our ability to publish this study, but also in our ability to make sure it reaches those who could make the best use of it.

The Budget

How will you use the funds you are requesting? List bullet points for each expense. (You can create a table or link to a separate (public) document if needed.)

  • Submitting a paper for publication and peer review in eLife - $2,000
  • Securing open access status for the final paper - $1,000
  • General maintenance costs for online tool - $500

COVID risk assessment (for in-person events)

If the project is for an in-person event, you must complete the risk assessment tool and checklist, and provide a link to copies of these documents here. Events must not include any international travel, and must follow all applicable local health guidelines.


Feedback

Community notification

You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc.
Please provide links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions.

  • ...
  • ...

Endorsements

Optional: Community members are encouraged to endorse your proposal and leave a rationale here.

  • ...
  • ...

Questions

Any questions about this proposal and feedback from reviewers should be placed on the associated discussion page.

Report



Status
closed