
Wikicite/grant/Adding support of DBLP and OpenCitations to Wikidata

From Meta, a Wikimedia project coordination wiki

Project summary

Project Name
Adding support of DBLP and OpenCitations to Wikidata
Start/End dates
1 December 2020 - 30 April 2021
Amount requested (and the currency you wish to receive it in)
11130.79 TND
Amount requested (in US$ equivalent)
4000 USD

The people

Contact person name/Wikimedia username
Mohamed Ali Hadj Taieb (User:Mohamedalihaj)
Contact person e-mail address
Organisation (optional)
University of Sfax, Tunisia
Project participants
Who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.
  • Houcemeddine Turki, Research Assistant, University of Sfax, Tunisia
    • Research scientist in Library and Information Science with publications in Scientometrics and other venues.
    • A long-term Wikimedian familiar with the Wikidata API and interface (User:Csisc).
  • Mohamed Ali Hadj Taieb, Assistant Professor, University of Sfax, Tunisia
    • Research scientist in Semantic Technologies and Natural Language Processing with publications in Engineering Applications of Artificial Intelligence and other venues.
    • Experience in conducting a research project related to the use of wikis for the construction of semantic resources.
  • Mohamed Ben Aouicha, Associate Professor, University of Sfax, Tunisia
    • Research scientist in Semantic Technologies and Natural Language Processing with publications in Engineering Applications of Artificial Intelligence and other venues.
    • Experience in conducting a research project related to the use of wikis for the construction of semantic resources.

The project


Describe the project or event.

The project aims to create two bots that mass-import into Wikidata the bibliographic information that DBLP and OpenCitations release under the CC0 license:

  • DBLP: a computer science bibliography website launched in 1993 at the University of Trier, Germany. It is currently the most complete bibliographic database for computer science research. Its author disambiguation methods are robust, as shown at https://link.springer.com/article/10.1007/s11192-018-2824-5, and can be relied on to give Wikidata full coverage of computer scientists.
  • OpenCitations: an open science project that aims to publish free bibliographic citation information in RDF. It is run by Infrastructure Services for Open Access (IS4OA), a non-profit charitable company founded in the United Kingdom in 2012 by open access advocates Caroline Sutton and Alma Swan.

The project will use deep learning algorithms to derive new knowledge (Research Topics, Affiliations) from the extracted data and so further enrich Wikidata with bibliographic information. Once the project is finished, the bots will keep running for years to regularly curate and update scholarly information in Wikidata.


Why is this project needed? What will it solve or improve?

Currently, Wikidata lacks full coverage of scholarly citations and computer science publications, which gives a distorted picture of worldwide research productivity and quality. This project will enrich the Wikidata citation graph and significantly improve the coverage of computer science researchers, conferences and journals in Wikidata.


Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project?

  1. Development of the OpenCitations bot to enrich Wikidata with bibliographic information and citations of publications from OpenCitations (one month)
  2. Development of the DBLP bot to enrich Wikidata with bibliographic information about scientists, venues and journals (two months)
  3. Applying for bot flags (one month)
  4. Running the bots on a server (one month)
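
As a rough illustration of step 1, the sketch below shows one way a bot could turn OpenCitations citation records into Wikidata "cites work" (P2860) statements written as tab-separated item/property/value lines. It is not the project's actual bot code. The record layout (dictionaries with "citing" and "cited" DOI fields), the function names, the DOI-to-item lookup table and the item IDs in the usage example are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

CITES_WORK = "P2860"  # Wikidata property "cites work"


def normalize_doi(doi: str) -> str:
    """Trim whitespace and upper-case a DOI so lookups are case-insensitive."""
    return doi.strip().upper()


def citations_to_statements(
    records: Iterable[Dict[str, str]],
    doi_to_qid: Dict[str, str],
) -> List[str]:
    """Build tab-separated 'item, property, value' lines for citation links.

    Assumptions (hypothetical, not taken from the proposal):
    - each record has 'citing' and 'cited' keys holding DOIs;
    - doi_to_qid maps upper-cased DOIs to existing Wikidata item IDs.
    Records whose DOIs are not both mapped are skipped, not created.
    """
    statements: List[str] = []
    seen: Set[Tuple[str, str]] = set()
    for record in records:
        citing = doi_to_qid.get(normalize_doi(record["citing"]))
        cited = doi_to_qid.get(normalize_doi(record["cited"]))
        if citing is None or cited is None or citing == cited:
            continue  # unknown work or self-link: nothing to add
        if (citing, cited) in seen:
            continue  # avoid emitting the same statement twice
        seen.add((citing, cited))
        statements.append(f"{citing}\t{CITES_WORK}\t{cited}")
    return statements


if __name__ == "__main__":
    # Made-up DOIs and item IDs, for demonstration only.
    sample_records = [
        {"citing": "10.1000/a", "cited": "10.1000/b"},
        {"citing": "10.1000/a", "cited": "10.1000/unknown"},
    ]
    lookup = {"10.1000/A": "Q1", "10.1000/B": "Q2"}
    for line in citations_to_statements(sample_records, lookup):
        print(line)
```

Skipping unmapped DOIs keeps this sketch from creating new items. A real bot would also need to decide how to create missing items and how to reference OpenCitations as the source of each statement.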

Measures of success

What criteria will you use to define success for your project, and how do you intend to measure them? What are your targets for these measurements?


Who is your target audience for this project? How will you engage the community you’re aiming to serve at various points during your project?

The target audience is the WikiCite Community and the Wikidata Community. I will engage the community by:

  • Inviting them, through the mailing lists and Telegram channels of WikiCite, LD4, the Wikimedia and Libraries User Group, and Wikidata, to review the source code of the two bots, which are implemented in Python.
  • Inviting them to provide comments on the two bot flag requests in Wikidata.

The Budget

How will you use the funds you are requesting? List bullet points for each expense. (You can create a table or link to a separate (public) document if needed.)

The items listed here will be used for years to regularly import DBLP and OpenCitations data into Wikidata, not only during the five months of the project:

  • High-Performance Computer with GPU and CPU: 3500 USD
  • Internet Connection: 500 USD

The High-Performance Computer will be hosted at the Faculty of Sciences of Sfax, a public scholarly institution in Tunisia. It will be used by a large team of scientists to develop Wikimedia-related applications, including the two bots. It will not be the personal property of any team member.

COVID risk assessment (for in-person events)

If the project is for an in-person event, you must complete the risk assessment tool and checklist, and provide a link to copies of these documents here. Events must not include any international travel, and must follow all applicable local health guidelines.

The project is a bot development initiative for the WikiCite project. All activities are remote, and no in-person event will be organized for the work.


Community notification

You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc.
Please provide links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions.


Optional: Community members are encouraged to endorse your proposal and leave a rationale here.

  • Sounds like a good plan to me, would be nice to get a more complete database of authors and works and to improve identifiers for existing items. Iwan Aucamp (talk) 17:20, 25 September 2020 (UTC)
  • Support DBLP is the best open source on the web of CS publications. This looks like a valuable contribution! Jodi.a.schneider (talk) 20:08, 25 September 2020 (UTC)
  • Support Absolutely endorse this. Those two databases are essential to the value of the citation graph in Wikidata and will allow accurate tracking of the flow of citations between publications. Even though initially just computer science works, I hope that this lays further groundwork and protocols for other fields to be better represented.
  • Oppose The mission is good but I need more detail before I would endorse such a project. See my questions on the talk page. Support conditioned on the concerns regarding wikidata handling the data volume being resolved. BrokenSegue 15:07, 26 September 2020 (UTC)
User:BrokenSegue: See discussion for answers. --Csisc (talk) 15:54, 26 September 2020 (UTC)[reply]


Any questions about this proposal and feedback from reviewers should be placed on the associated discussion page.