Computational Linguistic Methods Supporting Systematic Literature Reviews for the building up of scientific-based Wikipedia articles and Systematic Reviews.
start and end dates: To start in July 2022 and complete all results by June 2023.
budget (USD): 20,000-29,999 USD
Applicant's Wikimedia username. If one is not provided, then the applicant's name will be provided for community review.
- Computational Linguistic Methods Supporting Systematic Literature Reviews for the building up of scientific-based Wikipedia articles and Systematic Reviews.
Entity Receiving Funds
Provide the name of the individual or organization that would receive the funds.
- Fernando Pinheiro Andutta, postdoctoral researcher at FFLCH, University of São Paulo, Brazil.
Description of the proposed project, including aims and approach. Be sure to clearly state the problem, why it is important, why previous approaches (if any) have been insufficient, and your methods to address it.
- A vast literature on scientific research topics is now available online. However, identifying the literature relevant to a chosen research topic can be difficult: over 2.5 million scientific publications are produced every year across all knowledge domains. In this scenario, computer-based Natural Language Processing techniques seem promising for building and continuously updating Wikipedia articles that are largely based on science (https://en.wikipedia.org/wiki/Science_information_on_Wikipedia).
- Writing a scientific-based Wikipedia article requires a literature review, which can be very similar to writing a Systematic Review (SR) for a journal. An SR is a set of techniques that allows for highly reproducible outcomes when gathering and summarising scientific papers, so that two people working independently on similar topics should arrive at very similar sets of selected articles for their final reviews. This process aligns well with the writing of Wikipedia articles focused on scientific content. The article curation step needs significant improvement over currently available methods, because the articles finally selected in an SR are still mostly curated by humans. In this project, we aim to further develop Natural Language Processing (NLP) tools for supporting reviews of scientific papers and related data, and to provide NLP software that supports Wiki-editors who write scientific-based Wikipedia articles. Furthermore, this work is meant to be available to the scientific community, which we hope will motivate researchers to write and update Wikipedia articles focusing on science. To achieve this, we will investigate various models for estimating semantic similarity and relatedness among hundreds or thousands of documents and other written information. This investigation will run alongside a scoring process that searches a large cluster of publications for written evidence, in order to trim it down to a much smaller cluster from which similarities can emerge. More details concerning the structure and timeline of this project are available at: https://www.overleaf.com/read/sdzdbpjxcqsm
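The two stages described above (an evidence score that trims a large set of publications, followed by a similarity estimate over the survivors) could be sketched roughly as follows. This is only an illustration under stated assumptions: the proposal does not name a specific model, so TF-IDF with cosine similarity stands in here for whichever semantic similarity models are eventually investigated, and the function names, the evidence terms, the `min_score` threshold, and the sample abstracts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def evidence_score(text, evidence_terms):
    """Count occurrences of the evidence terms in a text (hypothetical rule)."""
    counts = Counter(tokenize(text))
    return sum(counts[term] for term in evidence_terms)


def tfidf_vectors(texts):
    """Build one sparse TF-IDF vector (a dict) per text, with smoothed IDF."""
    token_lists = [tokenize(t) for t in texts]
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def shortlist(abstracts, evidence_terms, query, min_score=1, top_k=2):
    """Trim by evidence score, then rank the survivors by similarity to the query."""
    kept = [a for a in abstracts if evidence_score(a, evidence_terms) >= min_score]
    if not kept:
        return []
    vectors = tfidf_vectors(kept + [query])
    query_vec = vectors[-1]
    ranked = sorted(
        zip(kept, vectors[:-1]),
        key=lambda pair: cosine(pair[1], query_vec),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Made-up abstracts, used only to exercise the sketch.
abstracts = [
    "Tidal mixing and salt transport in a tropical estuary.",
    "Estuary salinity response to sea level rise under climate change.",
    "A survey of medieval trade routes in northern Europe.",
]
result = shortlist(abstracts, {"estuary", "salinity"}, "salinity changes of estuary waters")
for text in result:
    print(text)
```

In this toy run the trade-routes abstract is dropped at the scoring stage because it contains none of the evidence terms, and the remaining two are ranked by similarity to the query. The sketch does not remove stop words or handle synonyms, which is one reason richer semantic models would be worth comparing against such a baseline.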
Approximate amount requested in USD.
Briefly describe what you expect to spend money on (specific budgets and details are not necessary at this time).
- Stipend: 24,000 USD
- 12 months x $2,000 = $24,000 USD
- Equipment: 5,000 USD
- Dell workstation and peripherals = $3,000 USD
- 2 x 2 TB solid-state drives (SSD) = $1,000 USD
- Graphics Processing Unit (GPU) = $1,000 USD
- Additional: 0 USD
- Publishing costs = $0 USD (WikiJournal of Science)
- Conference costs = $0 USD (Wikiconferences)
- TOTAL: 29,000 USD
Address the impact and relevance to the Wikimedia projects, including the degree to which the research will address the 2030 Wikimedia Strategic Direction and/or support the work of Wikimedia user groups, affiliates, and developer communities. If your work relates to knowledge gaps, please directly relate it to the knowledge gaps taxonomy.
- This project will provide tools for the curation process of publications used in scientific-based Wikipedia articles.
- This project will mostly operate across four important components of the WMF:
- Wikipedia – improving the quality of scientific-based articles,
- Wikicite – providing new methods and tools to support Wiki-editors, and bringing more efficiency and less bias to the uploading of scientific information into Wikipedia.
- Wikidata – storage of scientific information benefiting from the Resource Description Framework (RDF).
- WikiJournal – supporting advancements towards Living Systematic Reviews that can benefit the WikiJournal, which currently requires increased momentum.
This project largely aligns with the improvement of user experience (item 2 of the Wikimedia 2030 Movement Strategy recommendations).
Plans for dissemination.
- Presentations through wiki-conferences, for example, the upcoming WikiCon2022 in São Paulo (https://br.wikimedia.org/wiki/WikiCon_Brasil_2022).
- Submission of all publications within the wiki-ecosystem through the WikiJournal of Science, a journal that is free to publish in (#free2publish), free to access (#free2access), and provides a transparent peer-review system.
- Bringing together researchers who have already produced Systematic Reviews and wish to re-assess their reviews using these methods and codes. These researchers will add strength to the project and potentially expand its outreach.
- A few researchers collaborating with us have recently offered to HGAPify the produced method-code, so that their feedback will allow us to further improve the UX.
- Additional, smaller complementary methods using a number of media platforms.
Prior contributions to related academic and/or research projects and/or the Wikimedia and free culture communities. If you do not have prior experience, please explain your planned contributions.
- We are mostly researchers, and Andutta has contributed primarily to science in the areas of climate change and oceanography. Recently, Andutta got in contact with people from the wiki-community and was exposed to the challenges of improving scientific-based Wikipedia articles. We started to collaborate on this project to support wiki-editors and researchers who work with large quantities of scientific information. Consequently, Wiki-editors can use this work to produce and update scientific-based Wikipedia articles.
I agree to license the information I entered in this form excluding the pronouns, countries of residence, and email addresses under the terms of Creative Commons Attribution-ShareAlike 4.0. I understand that the decision to fund this Research Fund application, the application itself along with all the information entered by me in this form excluding the pronouns, countries of residence, and email addresses of the personnel will be published on Wikimedia Foundation Funds pages on Meta-Wiki and will be made available to the public in perpetuity. To make the results of your research actionable and reusable by the Wikimedia volunteer communities, affiliates and Foundation, I agree that any output of my research will comply with the WMF Open Access Policy. I also confirm that I have read the privacy statement and agree to abide by the WMF Friendly Space Policy and Universal Code of Conduct.
Please add any feedback or endorsements to the grant discussion page only. Any feedback added here may be removed.