Grants:Project/Fjjulien/Modelling and Populating Performing Arts Data in Wikidata/Midpoint



Report accepted
This midpoint report for a Project Grant approved in FY 2019-20 has been reviewed and accepted by the Wikimedia Foundation.



Welcome to this project's midpoint report! This report shares progress and learning from the first half of the grant period.

Summary

After six months, all activities are well under way and the project seems poised to achieve its intended goals. We are delivering more training activities than anticipated, and we have produced introductory videos. We have completed the upload of person and organization items. Modelling activities have encountered a few challenges but are nonetheless progressing well. Furthermore, we are adding new consultative activities.

Methods and activities

Modelling activities

  • We assembled a committee of Wikidata and domain experts. The committee met three times between June and October. Specific tasks were delegated to committee members in between meetings.
  • We modelled performing arts persons, with a focus on occupations, positions in organizations, and roles in works.
  • We modelled performing arts organizations.
    • We created a property to denote the holder of the artistic director position within organizations.
  • We documented recommended properties, along with usage notes and examples from performing arts items in the WikiProject Performing arts.

Data population activities

  • The upload of Conseil québécois du théâtre's dataset was performed in October (see the outcome: https://w.wiki/i$Q). According to the report provided by the consultant who did the upload on our behalf, the upload included:
    • Roughly 500 person items, 73 of which already existed in Wikidata.
    • Roughly 200 organization items, 50 of which already existed in Wikidata.
    • 4,730 statements added to Wikidata.
  • We are in the process of cleaning the data after the upload. We will provide definitive counts of items and statements at the end of the project.

Training activities

  • We designed and delivered four introductory workshops, in English and French.
    • Presentation slides and workshop recordings were made available on the project website.
    • Workshop recordings were edited for dissemination on CAPACOA's YouTube channel. A short recap of each workshop was also produced.
  • After the fourth introductory workshop, we switched to a more hands-on workshop format in which participants are guided as they edit items. So far, the response to this more convivial and participatory approach has been excellent.

New activities

  • Since September, CAPACOA has been co-chairing a Wikipedia/Wikidata Working Group as part of the Linked Open Data Ecosystem for the Performing Arts initiative. This international working group provides a forum for discussing use cases for Wikidata. It is also an opportunity for us to promote the project at the international level.

Midpoint outcomes

  • The number of page views for the WikiProject Performing arts increased steadily between September and November, reaching 221. This is more than three times the monthly average in 2019 (61 page views per month). See these statistics.
  • Even though we do not have a final count of uploaded items, we already know that we have uploaded three times as many person and organization items as we initially targeted.
  • After five workshops, we have already reached our target of 160 total participants:
    • English workshops: 128
    • French workshops: 115
    • Totals: 10 workshops and 243 participants.
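The arithmetic behind these outcomes can be checked with a few lines of Python. This is only a sketch using the figures quoted above; the variable names are illustrative and are not part of the project's reporting tools.

```python
# Figures quoted in the midpoint outcomes; variable names are illustrative.
pageviews_peak = 221            # monthly page views reached by November 2020
pageviews_2019_monthly_avg = 61  # monthly average in 2019
participants = {"English": 128, "French": 115}
participant_target = 160

# Ratio of the latest monthly page views to the 2019 monthly average.
pageview_ratio = pageviews_peak / pageviews_2019_monthly_avg

# Total workshop participants and attainment relative to the target.
total_participants = sum(participants.values())
target_attainment = total_participants / participant_target

print(f"Page views vs 2019 monthly average: {pageview_ratio:.1f}x")
print(f"Total participants: {total_participants}")
print(f"Share of participant target reached: {target_attainment:.0%}")
```

The ratio comes out above three, consistent with the "more than three times" claim, and the participant total exceeds the target of 160.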

Finances

Expenses as at November 30, 2020

| Expense | Initial budget | Revised budget | Actuals (November 30) | WMF Grant (CAD) |
| --- | --- | --- | --- | --- |
| Research and working group coordination | $33,300.00 | $33,300.00 | $22,074.00 | |
| Fees for working group members | $16,410.00 | $22,630.00 | $15,545.06 | $11,500.00 |
| Fees for modelling and implementation in Wikidata | $10,400.00 | $10,400.00 | $4,000.00 | |
| Synchronization and ingest of data | $5,000.00 | $5,000.00 | $5,000.00 | $5,000.00 |
| Knowledge transfer: documentation and development of training materials | $14,720.00 | $14,720.00 | $9,000.00 | $14,000.00 |
| Knowledge transfer: workshops over web conference | $17,290.00 | $17,290.00 | $12,000.00 | $3,000.00 |
| Fees for implementation in Conceptual model for linked data in the performing arts | $6,000.00 | $6,000.00 | $6,000.00 | |
| Translation costs | $3,000.00 | $3,000.00 | $1,569.00 | |
| Travel expenses - domestic | $500.00 | $0.00 | | |
| Travel expenses - international | $3,280.00 | $0.00 | | |
| Salaries - CQT | $5,000.00 | $5,000.00 | | |
| Administration - CQT | $500.00 | $500.00 | | |
| Salaries - CAPACOA | $8,200.00 | $8,200.00 | | |
| Administration - CAPACOA | $1,000.00 | $1,000.00 | | |
| Total Expenses | $124,600.00 | $127,040.00 | $75,188.06 | $33,500.00 |
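The column totals in this table can be reconciled against the line items. The sketch below assumes that, in rows with fewer than four amounts, the blank cells are the rightmost columns (so a row with three amounts has no WMF Grant value, and a row with two has neither actuals nor a grant value). Under that reading, every column sums to its stated total. The data structure and function name are illustrative.

```python
from decimal import Decimal

COLUMNS = ("Initial budget", "Revised budget", "Actuals (November 30)", "WMF Grant (CAD)")

# One tuple per expense line, in COLUMNS order; None marks a blank cell.
# Assumption: blank cells are always the rightmost columns of a row.
ROWS = {
    "Research and working group coordination": ("33300.00", "33300.00", "22074.00", None),
    "Fees for working group members": ("16410.00", "22630.00", "15545.06", "11500.00"),
    "Fees for modelling and implementation in Wikidata": ("10400.00", "10400.00", "4000.00", None),
    "Synchronization and ingest of data": ("5000.00", "5000.00", "5000.00", "5000.00"),
    "Knowledge transfer: documentation and training materials": ("14720.00", "14720.00", "9000.00", "14000.00"),
    "Knowledge transfer: workshops over web conference": ("17290.00", "17290.00", "12000.00", "3000.00"),
    "Fees for implementation in conceptual model": ("6000.00", "6000.00", "6000.00", None),
    "Translation costs": ("3000.00", "3000.00", "1569.00", None),
    "Travel expenses - domestic": ("500.00", "0.00", None, None),
    "Travel expenses - international": ("3280.00", "0.00", None, None),
    "Salaries - CQT": ("5000.00", "5000.00", None, None),
    "Administration - CQT": ("500.00", "500.00", None, None),
    "Salaries - CAPACOA": ("8200.00", "8200.00", None, None),
    "Administration - CAPACOA": ("1000.00", "1000.00", None, None),
}

# Totals as stated in the "Total Expenses" row.
STATED_TOTALS = ("124600.00", "127040.00", "75188.06", "33500.00")


def column_totals(rows):
    """Sum each column exactly, skipping blank cells."""
    totals = [Decimal("0")] * len(COLUMNS)
    for cells in rows.values():
        for index, cell in enumerate(cells):
            if cell is not None:
                totals[index] += Decimal(cell)
    return totals


computed = column_totals(ROWS)
for name, got, stated in zip(COLUMNS, computed, STATED_TOTALS):
    status = "matches" if got == Decimal(stated) else "differs from"
    print(f"{name}: computed {got} {status} stated {stated}")
```

`Decimal` is used instead of floats so that cent amounts such as $15,545.06 are added exactly.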

Learning

What are the challenges

  • We have found it difficult to engage domain experts when addressing complex modelling issues. Even when detailed documentation of modelling issues is provided in advance of each committee meeting, domain experts find it intimidating to participate, and discussions are dominated by modelling experts. We will experiment with different methods for engaging domain experts during the second half of the project.
  • Wikidata class items linked to many Wikipedia articles are very difficult to make sense of. We found many conceptual divergences among Wikipedia articles that relate to the same item (see, for example, theatrical troupe (Q742421)). This makes it quite challenging to harmonize descriptions across languages and to define the right class hierarchy. We still haven't found the right solution for the class hierarchy of performing arts organizations.
  • Wikidata named entity items linked to Wikipedia articles also present challenges. Many Wikipedia articles describe both a building and the organization that manages it. This results in Wikidata items that describe two entirely distinct named entities and include external identifiers for both. Separating these Wikidata items into two distinct items is difficult and time-consuming. While certain statements can easily be attributed to the right entity, others require verification. In particular, checking external identifiers and determining which entity each one relates to can be very difficult. Some base registers are simply hard for humans to make sense of.
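The building/organization split described in the last point can be pictured as a triage step. This is a minimal sketch, not the project's actual workflow. It assumes that a few properties can be assigned by rule and that everything else, including external identifiers, is queued for manual verification. The property groupings, the sample statements and their placeholder values, and the function name are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical rule sets: properties assumed to describe only one kind of entity.
BUILDING_PROPERTIES = {"P625", "P84"}        # coordinate location, architect
ORGANIZATION_PROPERTIES = {"P112", "P1037"}  # founded by, director / manager


@dataclass
class SplitResult:
    building: list = field(default_factory=list)
    organization: list = field(default_factory=list)
    needs_review: list = field(default_factory=list)


def split_statements(statements):
    """Assign each (property, value) pair to the building, the organization,
    or a manual-review queue when no rule applies."""
    result = SplitResult()
    for prop, value in statements:
        if prop in BUILDING_PROPERTIES:
            result.building.append((prop, value))
        elif prop in ORGANIZATION_PROPERTIES:
            result.organization.append((prop, value))
        else:
            # Ambiguous statements and external identifiers need a human check.
            result.needs_review.append((prop, value))
    return result


# Placeholder statements for an item that conflates a venue and its company.
conflated_item = [
    ("P625", "placeholder coordinates"),
    ("P84", "placeholder architect"),
    ("P112", "placeholder founder"),
    ("P571", "placeholder inception date"),  # could apply to either entity
    ("P214", "placeholder identifier"),      # external ID: verify at the source
]

result = split_statements(conflated_item)
print("Building:", result.building)
print("Organization:", result.organization)
print("Needs review:", result.needs_review)
```

In this sketch the review queue is the default, which reflects the observation above that only certain statements can be attributed with confidence.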

What is working well

Next steps and opportunities

  • As explained above, we have wrapped up the more didactic workshops and are now offering more hands-on workshops. The response has been very positive so far.
  • We are exploring what other datasets we could ingest into Wikidata. The directory of Union des artistes is a large dataset of Canadian artists and we are reaching out to this union to seek their collaboration in the ingest. We already created an external ID property for Union des artistes.
  • There are opportunities to use more Wikidata-powered infoboxes in Wikipedia. We have started exploring this opportunity as part of LODEPA (Linked Open Data Ecosystem for the Performing Arts) working group meetings. Work on infobox templates is being considered for a future project.
  • We issued a call for participation to Indigenous arts and culture practitioners. We felt it necessary to initiate a dialogue with Indigenous artists about their participation in Wikidata and their right to have their nation affiliation denoted in a manner that is accurate and respectful. So far, initial feedback suggests that Indigenous practitioners are quite uncomfortable with the use of the property ethnic group (P172) to denote nation affiliation.

Grantee reflection

  • Class hierarchies, notably for organizations, can be quite complex and, to speak frankly, messy. We have yet to define a proper class structure for performing arts organizations, and we will likely not see this task through over the course of the current project. That said, we will have significantly improved the situation.
  • The workshops have attracted a core group of highly engaged users. However, most participants attend only occasionally or dropped out after the first workshop. We are currently examining other means of further engaging occasional participants and reaching new audiences. We think a combination of batch uploads with crowdsourced data cleaning could be an effective means of broadening engagement.
  • Some workshop participants have become very active contributors, and it's truly gratifying to watch them grow as Wikimedians!