Grants:Programs/Wikimedia Research Fund/Automatic generation of short biographies using curriculum information registered on the Lattes Platform/Wikibase

Status: not funded
Start and end dates: July 2023 - July 2024
Budget (USD): 48,100 USD
Fiscal year: 2022-23
Applicant(s): Washington Segundo, Jesús Mena-Chalco and Veronica Santos



Washington Segundo, Jesús Mena-Chalco and Veronica Santos

Affiliation or grant type



Washington Segundo, Jesús Mena-Chalco and Veronica Santos

Wikimedia username(s)

wtonribeiro; jmenac; VeronicaSantosIBICT

Project title

Automatic generation of short biographies using curriculum information registered on the Lattes Platform/Wikibase

Research proposal


Description of the proposed project, including aims and approach. Be sure to clearly state the problem, why it is important, why previous approaches (if any) have been insufficient, and your methods to address it.

The main objective of this project is to automatically create short biographies (brief texts describing academic activities) from academic information extracted from the Lattes Platform, the Brazilian national curriculum system maintained by CNPq. To this end, we will create a public repository of Brazilian researchers and research groups that makes it easier to discover experts through their research assets and a concise, well-formulated short bio.

The deliverables of this project will be: i) a public Wikibase repository containing the research assets of Brazilian researchers and research groups, viewed from a bibliometric perspective; ii) a set of rules for building an effective research short bio; iii) a service that generates a concise and well-formulated short bio for researchers and research groups; and iv) an application that enables researchers and research groups to update their data in the Wikibase repository based on their current Lattes Platform data.
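The repository and the update application both rest on representing each curriculum as a set of knowledge-graph statements and keeping the Wikibase copy in step with the current Lattes data. The Python sketch below illustrates that idea under loudly stated assumptions: the input is a simplified, hypothetical record (a dictionary with `lattes_id`, `name`, `institution`, `publications` and `research_groups` keys), not the actual Lattes export format, and the property labels are placeholders rather than real Wikibase property IDs. It does not call any Wikibase API; it only computes which statements would have to be added or removed.

```python
# Minimal sketch only. The record layout and the property labels
# ("name", "affiliation", "author of", "member of") are hypothetical
# placeholders: they are neither the Lattes export format nor real
# Wikibase property IDs.
from __future__ import annotations

from typing import NamedTuple


class Statement(NamedTuple):
    subject: str  # researcher identifier (a placeholder Lattes ID here)
    prop: str     # hypothetical property label
    value: str


def lattes_to_statements(record: dict) -> set[Statement]:
    """Flatten a simplified Lattes-like record into knowledge-graph statements."""
    rid = record["lattes_id"]
    stmts = {Statement(rid, "name", record["name"])}
    if "institution" in record:
        stmts.add(Statement(rid, "affiliation", record["institution"]))
    for title in record.get("publications", []):
        stmts.add(Statement(rid, "author of", title))
    for group in record.get("research_groups", []):
        stmts.add(Statement(rid, "member of", group))
    return stmts


def plan_update(
    lattes: set[Statement], wikibase: set[Statement]
) -> tuple[set[Statement], set[Statement]]:
    """Return (to_add, to_remove) so the Wikibase copy mirrors the Lattes data."""
    return lattes - wikibase, wikibase - lattes


# Usage with made-up data: one publication and the affiliation are not yet
# stored, and one stale publication no longer appears in the CV.
record = {
    "lattes_id": "0000000000000000",
    "name": "Jane Doe",
    "institution": "Example University",
    "publications": ["Paper A", "Paper B"],
}
stored = {
    Statement("0000000000000000", "name", "Jane Doe"),
    Statement("0000000000000000", "author of", "Paper A"),
    Statement("0000000000000000", "author of", "Old Paper"),
}
current = lattes_to_statements(record)
to_add, to_remove = plan_update(current, stored)
```

Treating statements as hashable tuples turns the update into two set differences. A real loader would still have to map these labels onto the repository's own properties and write the changes through Wikibase.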

The project will be divided into three phases:

1) Lattes Platform and Wikibase integration.

1a) Deploy a public Wikibase repository.

1b) Model Lattes Platform data as a Scholarly Knowledge Graph.

1c) Load the records of researchers and research groups extracted from the Lattes Platform into Wikibase.

1d) Develop an application to enable researchers and research groups to update their data based on their current Lattes Platform data.

2) Short Bio rules: drawing on the specialists' knowledge, we will model academic and scientific information as rules for constructing sentences that present each researcher or research group in a simplified way.

2a) Interview CV analysts to identify the essential information necessary to build a concise and well-formulated short bio.

2b) Design rules that capture this information from research assets.

3) Short Bio generation service.

3a) Build a service that automatically generates short bios to help researchers and research groups disseminate their work and be easily discovered.
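The rule-based approach of phases 2 and 3 could look roughly like the Python sketch below. It is an illustration under assumptions, not the project's rule set: the profile fields (`name`, `institution`, `highest_degree`, `areas`, `publications`) and the four rules are hypothetical placeholders, whereas the real rules are what phase 2a is meant to elicit from CV analysts. Each rule reads a few fields and returns one sentence, or `None` when the data it needs is missing.

```python
# Hypothetical sketch of rule-based short-bio generation. Field names and
# rules are placeholders, not rules elicited from CV analysts.
from __future__ import annotations

from typing import Callable, Optional

Profile = dict
Rule = Callable[[Profile], Optional[str]]


def rule_position(p: Profile) -> Optional[str]:
    # Opening sentence: who the researcher is and where they work.
    if "name" in p and "institution" in p:
        return f"{p['name']} is a researcher at {p['institution']}."
    return None


def rule_degree(p: Profile) -> Optional[str]:
    # Highest degree, if one is recorded.
    degree = p.get("highest_degree")
    if degree:
        return f"They hold a {degree['title']} from {degree['institution']} ({degree['year']})."
    return None


def rule_areas(p: Profile) -> Optional[str]:
    # Research areas, joined as a natural-language list ("a, b and c").
    areas = p.get("areas", [])
    if not areas:
        return None
    if len(areas) == 1:
        joined = areas[0]
    else:
        joined = ", ".join(areas[:-1]) + " and " + areas[-1]
    return f"Their research focuses on {joined}."


def rule_output(p: Profile) -> Optional[str]:
    # Summarize the publication count instead of listing titles.
    n = len(p.get("publications", []))
    if n == 0:
        return None
    noun = "publication" if n == 1 else "publications"
    return f"They have {n} {noun} registered in their CV."


RULES: list[Rule] = [rule_position, rule_degree, rule_areas, rule_output]


def short_bio(profile: Profile, rules: list[Rule] = RULES) -> str:
    """Apply the rules in order and join the sentences that fired."""
    sentences = [s for rule in rules if (s := rule(profile)) is not None]
    return " ".join(sentences)


# Usage with a made-up profile.
profile = {
    "name": "Jane Doe",
    "institution": "Example University",
    "areas": ["scientometrics", "knowledge graphs"],
    "publications": ["Paper A", "Paper B", "Paper C"],
}
bio = short_bio(profile)
print(bio)
```

For this sample profile the degree rule does not fire, so the output is "Jane Doe is a researcher at Example University. Their research focuses on scientometrics and knowledge graphs. They have 3 publications registered in their CV." Keeping rules independent and ordered lets the bio stay concise when a CV is incomplete, and new rules can be appended without touching existing ones.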


  • 3 engineers - To be defined
  • 2 specialists - To be defined


Approximate amount requested in USD.

48,100 USD

Budget Description

Briefly describe what you expect to spend money on (specific budgets and details are not necessary at this time).

Salary or stipend - 3 engineers working for 12 months ($900 per month each) + 3 specialists working for 2 months ($700 per month each) = $36,600

Equipment - 3 high-performance laptops = $4,500

Open access publishing costs - article processing charge (APC) for publishing 1 journal article = $1,000

Cloud services - $500.00 per month for data transfer, storage, and processing = $6,000
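The line items above add up to the requested amount, assuming cloud services are billed for each of the 12 project months:

```latex
\begin{aligned}
\text{Salaries} &= 3 \times 12 \times \$900 + 3 \times 2 \times \$700 = \$32{,}400 + \$4{,}200 = \$36{,}600\\
\text{Cloud services} &= 12 \times \$500 = \$6{,}000\\
\text{Total} &= \$36{,}600 + \$4{,}500 + \$1{,}000 + \$6{,}000 = \$48{,}100
\end{aligned}
```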




Address the impact and relevance to the Wikimedia projects, including the degree to which the research will address the 2030 Wikimedia Strategic Direction and/or support the work of Wikimedia user groups, affiliates, and developer communities. If your work relates to knowledge gaps, please directly relate it to the knowledge gaps taxonomy.


  • 1 A new way to disseminate Brazilian researchers' work through a Knowledge Graph
  • 2 Integration with Lattes Platform
  • 5 IBICT will collect and bulk-load research assets, while researchers and research groups will maintain their personal data
  • 6 IBICT will train researchers and research groups on how to maintain their own data
  • 8 Increase the findability of Brazilian researchers and research groups, and identify research gaps within the Brazilian scientific community
  • 9 Promote Open Science practices


  • 4 Enhance inclusive and quality education for postgraduate students
  • 9 Foster innovation in Brazilian research
  • 10 Reduce inequality among Brazilian researchers
  • 17 Increase the participation of Brazilian researchers


Plans for dissemination.

Present results at Wikimedia research-related venues and at scholarly conferences such as the International Society for Scientometrics and Informetrics (ISSI) conference. Publish the source code openly on GitHub. Participate in Wikimedia conferences to disseminate the outcomes of our work to the Wikimedia community. The Wikibase repository will be used to disseminate Brazilian research assets, such as researchers and their publications, patents, projects, groups, and institutions, as a Knowledge Graph.

Past Contributions

Prior contributions to related academic and/or research projects and/or the Wikimedia and free culture communities. If you do not have prior experience, please explain your planned contributions.

IBICT maintains the Brazilian Portal of Publications and Scientific Data (Oasisbr) and the Brazilian Digital Library of Theses and Dissertations (BDTD). IBICT is also responsible for BrCris (Brazilian Research Ecosystem), which integrates Oasisbr and BDTD with information about research assets collected from national data sources, such as the Lattes Platform and the Sucupira Platform, as well as international ones, such as OpenAIRE, Wikidata, and Crossref. Through its Lattes Platform extractor, IBICT can integrate Lattes Platform data with other information systems in order to generate scientific and technological production indicators and to carry out data mining and machine learning studies that support aggregated services.

I agree to license the information I entered in this form, excluding the pronouns, countries of residence, and email addresses, under the terms of Creative Commons Attribution-ShareAlike 4.0. I understand that the decision to fund this Research Fund application, the application itself, and all the information entered by me in this form, excluding the pronouns, countries of residence, and email addresses of the personnel, will be published on Wikimedia Foundation Funds pages on Meta-Wiki and will be made available to the public in perpetuity. To make the results of my research actionable and reusable by the Wikimedia volunteer communities, affiliates and Foundation, I agree that any output of my research will comply with the WMF Open Access Policy. I also confirm that I have read the privacy statement and agree to abide by the WMF Friendly Space Policy and Universal Code of Conduct.