Grants:Programs/Wikimedia Research Fund/The Spaghetti Junction. Linking literature and cinema in Wikidata

status: not funded
The Spaghetti Junction. Linking literature and cinema in Wikidata
start and end dates: July 2023 - July 2024
budget (USD): 40000 USD
fiscal year: 2022-23
applicant(s): Marilena Daquino and Daniele Metilli

Overview

Applicant(s)

Marilena Daquino and Daniele Metilli

Affiliation or grant type

University of Bologna; University College London

Author(s)

Marilena Daquino and Daniele Metilli

Wikimedia username(s)

Marilena Daquino: User:Emmedaquino;

Daniele Metilli: User:Mushroom;

Project title

The Spaghetti Junction. Linking literature and cinema in Wikidata

Research proposal

Description

Description of the proposed project, including aims and approach. Be sure to clearly state the problem, why it is important, why previous approaches (if any) have been insufficient, and your methods to address it.

Literature enthusiasts and movie geeks put significant effort into linking Wikipedia pages through categories, dedicated pages, and article sections (e.g. Adaptations). Some WikiProjects and task forces have defined priorities and helped enrich Wikipedia with links between works. Unfortunately, only ~33% of such links between movies and literary works exist in Wikidata (WD) (see our preliminary analysis).

Yet links between QIDs and/or emerging entities (i.e. those not yet in WD) would significantly benefit quantitative transmedia and multimodal research. An open graph connecting a wide variety of media types would open up new avenues of research for scholars in media studies.
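The Wikidata side of the coverage figure cited above can be inspected with a SPARQL query. The sketch below is only an illustration and not the method used in the preliminary analysis, which is not described here. It assumes that adaptation links are recorded with the Wikidata property "based on" (P144) on items that are instances of film (Q11424); the function names are hypothetical. It builds the query and the request URL for the public Wikidata Query Service without sending anything over the network.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_adaptation_query(limit: int = 100) -> str:
    """Return a SPARQL query listing films that have a 'based on' statement."""
    return f"""
SELECT ?film ?work WHERE {{
  ?film wdt:P31 wd:Q11424 ;   # instance of: film
        wdt:P144 ?work .      # based on: the source work
}}
LIMIT {int(limit)}
""".strip()


def build_request_url(query: str) -> str:
    """Encode the query as a GET request URL that asks for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # Print the URL; fetching it is left to the reader.
    print(build_request_url(build_adaptation_query(10)))
```

Comparing the pairs returned by such a query with the adaptation links found on Wikipedia pages would give one possible estimate of how many links are missing from WD.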

Entity linking is a challenging task, encompassing knowledge extraction from multilingual texts, entity disambiguation, counter-fact checking, and candidate ranking. WD users have requested support for related tasks. To date, several datasets leveraging the WD knowledge graph and Wikipedia have been created to address similar tasks (Möller et al. 2021). Popular approaches make use of language models, graph embeddings, and other statistical methods, which can be fine-tuned and re-assessed.

However, such approaches may fail if the target graph is incomplete, in which case data integration is required, and reliable results depend on authoritative sources for counter-fact checking. To date, links between movies and literary works are available in proprietary databases (with reliability limitations and restrictions on reuse) such as the Internet Movie Database, where novelists are credited as participants but the work that inspired the movie (if any) is not mentioned.

In this project, we propose a holistic approach (based on the aforementioned literature) to identify, disambiguate, and rank candidate links between literary works and derivative movies extracted from Wikipedia pages, categories, and online sources (e.g. VIAF, OpenLibrary) and confirmed by experts (e.g. Metacritic). We enrich Wikidata with links between existing items (extracted with high confidence) and we generate a Linked Open Dataset with links to emerging items. WD editors can review links through an interface, where they can explore the Spaghetti graph and assess links to be used when creating a new entity. To evaluate our work, we build a prototype on Italian literature and international movies and TV series.
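The ranking step described above can be illustrated with a deliberately simple sketch. It is not the proposed method, which relies on language models and graph embeddings: here string similarity from the Python standard library stands in for those models, and the `Work` class, the `rank_candidates` function, the `author_bonus` weight, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Work:
    """A candidate literary work (illustrative structure only)."""
    title: str
    author: str


def title_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two titles, in the range 0..1."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def rank_candidates(film_title, credited_writers, works, author_bonus=0.5):
    """Score each work against a film and return (score, work) pairs, best first.

    The score is the title similarity, plus a fixed bonus when the work's
    author appears among the film's credited writers.
    """
    writers = {w.casefold() for w in credited_writers}
    scored = []
    for work in works:
        score = title_similarity(film_title, work.title)
        if work.author.casefold() in writers:
            score += author_bonus
        scored.append((score, work))
    # Sort on the score only, so Work objects never need to be compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    candidates = [
        Work("Il nome della rosa", "Umberto Eco"),
        Work("Il Gattopardo", "Giuseppe Tomasi di Lampedusa"),
        Work("Il giardino dei Finzi-Contini", "Giorgio Bassani"),
    ]
    ranked = rank_candidates(
        "Il Gattopardo", ["Giuseppe Tomasi di Lampedusa"], candidates
    )
    for score, work in ranked:
        print(f"{score:.2f}  {work.title} ({work.author})")
```

In a pipeline of this kind, only candidates above a high-confidence threshold would be written to WD directly, while the rest would be left for review by editors.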

[Version with links: https://github.com/tommasobattisti/TheSpaghettiJunction/blob/main/project.md]

Personnel

  • Short-term research software engineer (to be hired).
  • Short-term research data engineer (to be hired).

Budget

Approximate amount requested in USD.

40000 USD

Budget Description

Briefly describe what you expect to spend money on (specific budgets and details are not necessary at this time).

(1 EUR → 1.06 USD) The University of Bologna would be the sole beneficiary of the funding.

- Salary or stipend: 19,200 EUR for a 1-year fellowship; up to 9,600 EUR for a 6-month fellowship

- Open access publishing costs: 0 EUR, as we can benefit from agreements with publishers and gold OA journals

- Conference and travel expenses: 2,000 EUR

- Institutional overhead: 4,000 EUR

- Equipment: 1 laptop (3,000 EUR)
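As a quick check, the line items above can be summed and converted at the stated rate. The sketch assumes that both fellowships are funded and that the 6-month fellowship is counted at its upper bound; under those assumptions the total is 37,800 EUR, or about 40,068 USD, in line with the 40000 USD requested.

```python
from decimal import Decimal

# Conversion rate stated in the budget description.
EUR_TO_USD = Decimal("1.06")

# Line items in EUR; the 6-month fellowship is taken at its upper bound
# (an assumption made for this check).
budget_eur = {
    "1-year fellowship": Decimal("19200"),
    "6-month fellowship (upper bound)": Decimal("9600"),
    "Open access publishing": Decimal("0"),
    "Conference and travel": Decimal("2000"),
    "Institutional overhead": Decimal("4000"),
    "Equipment (1 laptop)": Decimal("3000"),
}

total_eur = sum(budget_eur.values(), Decimal("0"))
total_usd = total_eur * EUR_TO_USD

if __name__ == "__main__":
    print(f"Total: {total_eur} EUR = {total_usd} USD")
```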

Impact

Address the impact and relevance to the Wikimedia projects, including the degree to which the research will address the 2030 Wikimedia Strategic Direction and/or support the work of Wikimedia user groups, affiliates, and developer communities. If your work relates to knowledge gaps, please directly relate it to the knowledge gaps taxonomy.

We contribute to filling gaps on topics of potential impact (2030 Wikimedia Strategic Direction), namely:

  • Align Wikipedia and Wikidata. We publish trusted links between QIDs to improve WD users' experience.
  • Recommend new links. We offer developers, researchers, and WD editors a LOD source (under a CC0 license) that includes links between works not yet in WD (identified with appropriate IDs). A web interface lets them look up a work and retrieve and assess candidate links, and a pop-up widget suggests links when editing WD items.
  • Create scalable, reusable methods. We show Wiki communities how to reuse and scale up our methods to create links between works of any kind and any nationality.

Dissemination

Plans for dissemination.

We plan to publish up to 3 articles in open access journals and conference proceedings. Among the conferences and workshops of interest, we identified the Wiki Workshop (Web Conference 2024), Wikidata Workshop (ISWC 2024), WikidataCon 2024, and AIUCD 2024. Journals of interest include Digital Scholarship in the Humanities, Umanistica Digitale, Journal on Computing and Cultural Heritage, and journals on media studies yet to be identified. Results will also be presented at seminars and hack days.

Past Contributions

Prior contributions to related academic and/or research projects and/or the Wikimedia and free culture communities. If you do not have prior experience, please explain your planned contributions.

Marilena Daquino is an assistant professor at UNIBO and a previous recipient of the grant Wikicite: Wikipedia Citations in Wikidata, with several articles leveraging Wikidata (https://scholar.google.com/citations?hl=en&user=HomzePYAAAAJ, e.g. CLEF, ARTchives).

Daniele Metilli is a research fellow at UCL, a previous recipient of the WiGeDi grant, and a former Wikipedia/Wikidata administrator and Wikipedian in Residence, with research papers based on Wikidata (https://scholar.google.com/citations?user=SFvyNLIAAAAJ).


I agree to license the information I entered in this form excluding the pronouns, countries of residence, and email addresses under the terms of Creative Commons Attribution-ShareAlike 4.0. I understand that the decision to fund this Research Fund application, the application itself along with all the information entered by me in this form excluding the pronouns, countries of residence, and email addresses of the personnel will be published on Wikimedia Foundation Funds pages on Meta-Wiki and will be made available to the public in perpetuity. To make the results of your research actionable and reusable by the Wikimedia volunteer communities, affiliates and Foundation, I agree that any output of my research will comply with the WMF Open Access Policy. I also confirm that I have read the privacy statement and agree to abide by the WMF Friendly Space Policy and Universal Code of Conduct.

Yes