Grants:Project/Fjjulien/Modelling and Populating Performing Arts Data in Wikidata/Timeline
Timeline for Fjjulien
|Milestone 1 : Kick off meeting||08 May 2020|
|Milestone 2 : Define the order in which the entities will be processed in Wikidata (Performing artists, Performing organizations, Performing arts venues/buildings, Performing arts works/productions)||3 June 2020|
|Milestone 3 : The first international committee meeting||3 June 2020|
|Milestone 4 : Define the pedagogical objectives for the Wikidata workshop series||19 June 2020|
|Milestone 5 : Planning Wikidata workshops over 12 months||19 June 2020|
|Milestone 6 : Situation analysis (benchmark before starting to ingest new performing arts data into Wikidata)||30 June 2020|
|Milestone 7 : Preparation and ingestion of the first CQT dataset (Conseil québécois du théâtre)||7 July 2020|
|Milestone 8 : Deliver the first Wikidata workshop in French and in English||8 & 9 July 2020|
|Milestone 9 : Publication of the didactic capsules from the workshop for sharing with the community and also for the use of future participants in the workshop series.||22 July 2020|
|Milestone 10 : Synchronization of CQT dataset with the Wikidata data pool (block of the first 100 members of the list)||11 August 2020|
|Milestone 11 : Deliver the second Wikidata workshop in French and in English||12 & 13 August 2020|
|Milestone 12 : Hold the second advisory committee meeting (person-centric)||9 September 2020|
|Milestone 13 : Hold the third advisory committee meeting (person-centric)||15 October 2020|
|Milestone 14 : Ingestion of the CQT's dataset in Wikidata (members, organizations and events)||27 October 2020|
|Milestone 15 : Hold the fourth advisory committee meeting (organization-centric)||19 November 2020|
|Milestone 16 : Hold the fifth advisory committee meeting (organization-centric)||16 December 2020|
|Milestone 17 : Deliver the 7th, 8th and 9th workshops. The 9th is the last chapter of the workshop series. Translation of the WikiProject into French. Quality control and consolidation of the CQT dataset ingestion.||January - March 2021|
|Milestone 18 : Hold the last advisory committee meeting of the project. Closing the project.||25 March 2021|
The complete schedule of workshops, in English and French, is available on the Linked Digital Future website.
Please prepare a brief project update each month, in a format of your choice, to share progress and learnings with the community along the way. Submit the link below as you complete each update.
The month of the project launch.
A) The 1st advisory committee met on 3 June. Here are the minutes of the meeting. The committee gave an overview of the objectives and issues of this project. It was decided to work in ad hoc sub-committees to allow more active and targeted participation by members during the project.
Beat Estermann shared with us a very inspiring page of statistics for our current situation analysis stage. Easily visualizing the mass of relevant information already existing in Wikidata is very important to establish comparison criteria between the beginning and the end of the project.
B) Processing of the first dataset shared by the CQT in Wikidata has begun. We have chosen to publish the list of CQT members (individuals and organizations). The work will be completed at the beginning of July. The CQT communicated with its members to explain the importance of this project for their digital discoverability. The CQT offered its members the option to withdraw from the list before it is pushed to Wikidata. It is very important for us to act with respect for data protection and members' consent.
C) The project was the subject of a practical workshop during the 6th Swiss Open Cultural Data Hackathon - Online Edition. Frédéric Julien and Birk Weiberg were the moderators. A few members of our committee and of our team participated in the workshop (Bart Magnus, Beat Estermann, Antoine Beaubien, Gregory Saumier-Finch, Andrée Harvey and Véronique Marino).
The agenda of the workshop :
- Round of table: Who has ingested / is planning to ingest what kind of data on Wikidata?
- Round of table: Who has been using / is intending to use what kind of data from Wikidata?
- What are the greatest challenges participants experience in the context of Wikidata? (brainstorming / clustering)
- How are we planning to work together over the coming months?
Find the report of the workshop: HERE.
This activity was very important for understanding the issues of the project from a global and international perspective.
D) On June 20th, we opened registrations for our first free online Wikidata workshop.
95 people registered for the French workshop and 19 for the English version in 4 days!
We created a visual signature that highlights the partnership with CQT and tried to make the subject less geeky :)
UPDATE: As of June 30th, we had 49 registrations for the English workshop.
E) First blog post on the ANL website: The wikidata project for the performing arts is on! By Joana Neto Costa. Also available in French.
What we learned in June
- Members of the cultural community are interested in learning how to use Wikidata, which is good news for our project.
- It is fundamental to establish criteria for quantitative and qualitative evaluation of existing data in Wikidata related to the performing arts.
- It is important to refine the SPARQL queries by building them with experts in the subject matter.
The month of testing to define Phase II
RECAP : In July we completed Phase I of the project, which was aimed at identifying the right things to do in the right way for the coming months.
We had very different and complementary deliverables to complete in record time. We did it. We're ready for Phase II.
1) The first workshop : Introduction to Wikidata in French and English was a great success. We captured an audience of 180 people, 110 of whom signed up with us for this first workshop. In French, on July 8, out of 95 registered participants, 50 showed up and stayed for the 90 minutes of the workshop. In English, on July 9, out of 90 registered participants, 60 showed up and stayed until the end of the 90 minutes.
A satisfaction survey at the end of the webinar shows the relevance of the topic and the willingness to come back for the rest of the series.
2) The audience was primarily from the performing arts community. Theatre and dance were in the majority because our communications were primarily directed to these communities. We believe that this is a good signal about the receptivity of the communications issued by the CQT and CAPACOA in the context of our project. Not only artists but also many agents and managers working for creation or distribution organizations registered and attended.
3) The learner profile was very homogeneous in both languages: complete beginners, but technophile enough not to be put off by the subject and with a deep desire to try.
4) The two workshops were fully recorded during the Zoom session and we decided to go further than initially planned: we edited the two videos to extract functional, precise and didactic capsules. These capsules will feed the IANL website in English and French as well as the CAPACOA YouTube channel.
We have built playlists by language. Each of the future workshops will be treated in the same way to build a collection of 9 unique workshops. These videos present a clear visual signature and will become promotional tools for the project, CQT, CAPACOA and Wikidata to neophytes from all horizons but also serve as catch-up material for all those who will come to the workshops along the way.
5) Workshop number 2: Contributing to Wikidata will focus on the creation and editing of items according to the principle of statements, qualifiers and references. Registrations for Workshop 2 (in French and English) are open. The online workshop is scheduled for August 12 & 13 and is also announced on the IANL website. A first newsletter was sent to the CAPACOA community with the registration link. We will make a new push at the beginning of August. We have 9 subscribers for each of the workshops in French and English. At this time we have an excellent return rate of participants. More than half of the participants have participated in Workshop 1.
MODELING, INGESTION, ISSUES
1) CQT dataset: recovery and extraction of member data is complete. We have modelled the whole to keep only the public elements. All sensitive information has been removed from the pool. No members have requested to be removed from the publication. To ensure that we have control over the time it takes to tune the data between our spreadsheet and Wikidata, we are working on a group made up of the first 100 members on the list. It is important not to generate duplicates to maintain the integrity of Wikidata. This consolidation activity is very time consuming and can hardly be fully automated. We will use OpenRefine for mass publishing and for reconciliation processing work. Details will then be corrected by hand.
This step can be practiced in Workshop 2 or 3 of the training series.
2) Modelling issues: this subject will be the core of our future activity. We have set up the infrastructure, a first test pool of real data, meetings, appointments with the community, the order in which entities are processed. We will be ready to begin the core of the mandate in September with a lot of material already in hand. The analysis of the situation will be completed. Also, a connection with the LODEPA project through members of the advisory committee, such as Beat Estermann, could accelerate the project in its international dimension. This subject will be explored in September.
What we learned in July
- The performing arts community is ready to learn about linked and structured data.
- It is important to explain the difference between Wikidata and Wikipedia.
- The benefits of Wikidata are very convincing with learners.
- Workshops must be practical and allow learners to manipulate Wikidata live on topics that affect them.
- In light of everything we learned during Phase I, we were able to build an action plan for Phase II that will focus 60% of our efforts on modeling issues. The remaining 40% is divided between the 7 remaining training workshops starting in September and communication around the project until March 2021, on the WikiProject Performing arts and the WikiProject Cultural venues in particular.
The month of August was quiet.
We held the second workshops on August 14 and August 15. 44 participants attended.
CQT data preparation activities progressed. We expect to proceed with the batch upload in October.
Research on the initial benchmark continued.
Activity picked up again at the beginning of September. Workshop 3, on the Wikidata Query Service, took place on September 9 and 10, and research activities resumed.
After three workshops, we are on track to meet and exceed our performance indicator (160 participants attending workshops):
- 61 unique participants in French workshops
- 69 unique participants in English workshops
- 3 in 10 attended more than one workshop.
The majority of registrants are members of the performing arts community. We also have participants from the Wikidata and Wikipedia communities, as well as some developers of Wikidata-related publishing tools. Their motivation is to learn, but also to get to know Wikidata users better and thus improve the uses and functionalities made available to this emerging community. The trend is towards a growing number of recurring participants. We have increased communication around the upcoming workshops (about 30 people have registered for the next workshops) to continue to attract and engage new participants. For the October workshops, 24 participants have already registered for the October 8 workshop in English and 12 have registered for the French one. CAPACOA's YouTube channel is growing steadily in terms of views, retention and subscriptions, even though 95% of views come from non-subscribers. This indicates fairly organic visibility.
The advisory committee met on September 14th to discuss recommended properties for performing arts persons.
Further to the meeting, enhancements were made to the WikiProject Performing Arts:
- A list of properties for persons was added to the “Data structure” tab;
- Recommendations on how to use these properties were drafted (see this discussion document);
- New queries were added to the “Statistics” tab.
- Typology: we will add a custom list of common performing arts occupations based on the initial query : https://w.wiki/dGf
The next committee meeting will also focus on persons. It will take place on October 15, 2020.
The international LODEPA Wikidata/Wikipedia Working Group met on September 22, 2020. There were 11 participants. We discussed use cases as well as modelling challenges (see the meeting minutes).
D) CQT dataset
In light of the discussions, it was decided to extract more information from the database in order to get as close as possible to the properties recommended for ingestion in Wikidata. This exercise will reveal the existing gaps and provide a case study that will be published under the appropriate tab here. The data ingestion will be done this month.
What we learned in August and September
- Discussions about modeling issues are important and they can't be rushed. It takes time for members of the committee to arrive at a common understanding of what concepts are actually being discussed, even when documentation is provided beforehand.
- The devil is in the details. Queries about occupations can return an unexpected mix of occupations and roles. Examination of the unexpected results then revealed that certain items are subclasses of two different classes. Is this the expression of a diversity of views within the Wikidata community or a conceptual error needing fixing? Achieving a robust superclass/subclass hierarchy for occupations is proving to be a demanding task.
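The kind of check described above, spotting items that sit under two different classes, can be sketched in a few lines of Python. This is only an illustration: the function name `find_multi_parent_classes` and the class labels in the sample edges are hypothetical and are not taken from Wikidata or from the project's own queries.

```python
from collections import defaultdict


def find_multi_parent_classes(subclass_edges):
    """Return {child: sorted parents} for classes with more than one direct superclass."""
    parents = defaultdict(set)
    for child, parent in subclass_edges:
        parents[child].add(parent)
    return {child: sorted(p) for child, p in parents.items() if len(p) > 1}


# Hypothetical (child, parent) pairs; the labels are invented for illustration only.
edges = [
    ("stage actor", "actor"),
    ("actor", "performing artist"),
    ("choreographer", "occupation in dance"),
    ("choreographer", "role in a production"),
]

flagged = find_multi_parent_classes(edges)
print(flagged)
```

Each flagged class is only a candidate for review. Whether a double parent reflects a legitimate diversity of views or a modelling error still has to be decided by people who know the domain.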
October is when we changed the format of the workshops. We have now presented the main features and concepts of Wikidata, and we are entering the participative coaching phase. We have invited two new coaches specialized in Wikidata to take the lead of the future workshops. From now on, our coaches will give a short introduction of half an hour, after which the participants will be able to perform actions in Wikidata themselves for one hour. We separate them into small groups, always in Zoom. The October workshop showed us the appetite of the participants, and their commitment was very high.
The advisory committee met on October 15th to further discuss the modelling of performing arts persons (see the agenda and minutes).
- This meeting enabled the committee to further clarify and define work-related concepts:
- Occupations denote the kind of work usually performed by a person and they are defined according to skills and education required for that work;
- Positions denote a specific employment/mandate relationship between a person and an organization;
- Roles denote contributions of persons to performing arts endeavours (works/productions or events).
- With regards to work relationships between persons and organizations, the committee retained the following modelling approach: Work relationships between persons and organizations should be stated in the person item with the employer (P108) property (see the discussion summary and rationale).
- Further to the meeting, a working group took on the task of establishing an Overview Table of Occupations Typical of the Performing Arts. This table identifies equivalent position and role properties (where applicable).
C) CQT dataset
The CQT dataset has been ingested into Wikidata. 4806 statements were generated for person items. Around 8000 statements were produced for the organizations. We are starting a quality control phase. Further quality control and enrichment activities will also be taking place during our workshop series. Here is the short link for the query: https://w.wiki/i$Q
To enhance the value of the Wikiproject and to get workshop participants to use this space as a reference space for questions about how best to document an item related to the performing arts, we have created additional subpages as support for the workshops in both languages. This allows sharing information, enriching the wikiproject and turning it into a reference for non-initiates or non-practitioners who are regular Wikidata users.
Members of the advisory committee initiated a consultative process with Indigenous arts and culture practitioners.
The LODEPA WG6 on Wikipedia and Wikidata met on November 19. There were 9 participants. We discussed the current usage of Wikidata-powered infoboxes and agreed on the need to do further work in this area. Adoption of Wikidata-powered infoboxes seems to be greater in frWiki and this could be a starting point for a project. See the workshop minutes.
D) Modelling activities
Disambiguating legal entities
Performing arts organizations often have a usual name that differs from their legal name. Further, the name of their main venue or their main festival event is often confused with the name of the organization. This makes performing arts organizations rather hard to disambiguate, and this in turn makes the ingestion of datasets on performing arts organizations challenging.
In the absence of a widely adopted global unique persistent identifier for performing arts organizations, national business numbers can be used as an identifier.
The Canadian Business Number (BN) is attributed to all legal entities incorporated or registered in Canada as soon as they interact with the federal government for tax, payroll, charitable or other purposes. Although the Canadian BN is currently not easy to retrieve, its high prevalence makes the BN a useful identifier for disambiguating legal entities, especially in domains where global unique persistent identifiers do not exist or do not have broad adoption in Canada. For this reason, we made a proposal to introduce a property for the Canadian Business Number (as exists for at least 30 other countries). The proposal was approved.
What we learned in November
Although occupations seem to be the primary focus in artist items, they are not as meaningful as positions and roles. Roles in relation to performing arts endeavours are arguably the most potent work-related concept. Besides, use cases along the primary value chain are sometimes better served by information on skills than by information on occupations.
- The workshops have reached a certain cruising speed with a strong recurrence of participants. The workshops have an average of about 20 participants in French and English. To allow us to properly evaluate the impact of the workshops, we have been following the activities of the participants, workshop by workshop, since the beginning, in real time.
- The Outreach Dashboard allows us to see the actions taken by each participant and to measure the actual contribution of these workshops to the enrichment of performing arts data in Wikidata. The result is spectacular and the participants particularly appreciate the coaching formula.
- Further to our modelling activities on persons and on organizations, we proposed a new property for "artistic director". The artistic director is a key executive in performing arts organizations and it needed to be represented with its own property. Here is the page of the discussion around the property P8938. The proposal was approved.
- Members of the committee reached out and consulted with potential partners for the consultation with Indigenous arts and culture practitioners, including authors of the CARE Principles for Indigenous Data Governance and the First Nations, Metis and Inuit – Indigenous Ontologies, as well as with the organizer of an initiative to Indigenize Wikipedia.
- The Committee met on December 16th. Discussion on the typology for performing arts organizations continued. The "theatrical troupe" (Q742421) class item is one of the most commonly used, but it appears to mean different things to different users and is therefore not a good contender for a superclass to all performing arts organizations. Alignment with Schema.org and with RDF ontologies is desirable. See the documentation.
C) Modelling activities
As noted in the Google Doc, our research allows us to describe fairly clearly the situation of performing arts organizations in Wikidata. The main findings are:
- There is a very large granularity with almost 700 classes and subclasses existing to date to classify an organization acting in the performing arts whether it is theater, opera, music and other live performances. Looking at the available classifications, it quickly becomes apparent that there is a great deal of confusion between the concept of class, subclass and element. This confusion needs to be addressed to limit its negative effects.
- Almost 120,000 items are present under this abundant classification.
- The entire value chain of the domain is represented: ideation, creation, production, interpretation, promotion, dissemination and related activities.
- No clear and unifying superclass that could then build on the multi-disciplinary subclasses stands out, but there are a number of "champion" classes that could be improved upon.
- The relationship between external classification models, such as the NAICS classification and the Schema.org model, is possible and allows for consistency and externalization of the proposed unified language.
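Counts like the ones above come from queries against the Wikidata Query Service. As a rough sketch, and not the project's actual benchmark query, the Python helper below builds a SPARQL query that counts items that are instances of a given class or of any of its subclasses. The helper name `build_class_count_query` is an assumption made for illustration; the example uses the "theatrical troupe" (Q742421) class mentioned earlier, and relies on the `wd:` and `wdt:` prefixes that the query service predefines.

```python
import re


def build_class_count_query(class_qid):
    """Build a SPARQL query counting instances of a class or of any of its subclasses."""
    # Validate the ID so that arbitrary text cannot be injected into the query.
    if not re.fullmatch(r"Q[1-9]\d*", class_qid):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        # P31 = instance of, P279 = subclass of; P279* walks the subclass tree.
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} .\n"
        "}"
    )


query = build_class_count_query("Q742421")
print(query)
```

The printed query can be pasted into the Wikidata Query Service. Running it at the start and at the end of a project is one simple way to obtain comparable before and after figures.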
Work on modelling of performing arts venues began. We added 16 recommended properties to the list of properties in the WikiProject Cultural venues, and we provided examples from well modelled performing arts building items.
These modelling findings and activities inform our workshop findings and vice versa. In the workshops, we enjoy testing the recommended properties with members of the performing arts community to ensure that their understanding and vision of their domain is well reflected in our model.
A) Workshops
Workshop #7: Edit a building and a place in Wikidata, held in both French and English. Retention rate of registrants: 65% for the English workshop and 50% for the French one. Total participants: 21. As you can see on the Outreach Dashboard, 160 items were created by 9 different editors. Here are the details of the activity around Workshop #7. The workshops are really appreciated by the participants.
B) Modelling activities
- Based on committee feedback, we further defined/refined the "artistic director (P8938)" property and its relationship to other role properties:
- "artistic director (P8938)" is now a subproperty of "director / manager (P1037)".
- We are recommending the use of "director / manager (P1037)" to denote holders of administrative direction positions. In such cases, a "object has role (P3831)" qualifier could be used to provide more details as to the nature of the position.
- We chose not to create a subproperty relationship between "musical conductor" (P3300) and "artistic director (P8938)". Rather, we added "see also" statements.
- All three properties are now listed in data structure and typology tabs of the WikiProject Performing arts (along with other important organization properties that were still missing from the list).
- Modelling of cultural venues continued.
- The typology of performing arts venues and buildings is fairly clean: see subclasses of "performing arts buildings" and subclasses of "event venue". Because "performing arts building" is a subclass of "event venue", all its subclasses are also subclasses of "event venue". That said, conceptual clarity on the meaning and use of these class items should be sought.
- Open issues were identified:
- Some items incorrectly use location (P276) to provide geographic information.
- As there is no property for start of construction, the property inception (P571) is sometimes used to describe the beginning of the construction of the venue. The earliest date (P1319) and latest date (P1326) qualifiers can be used in conjunction with inception (P571). Is this a correct use?
- The data structure of the WikiProject Cultural venues was further expanded. It now includes 32 recommended properties.
- We illustrated properties with examples from well modelled performing arts building items.
- We wrote an initial set of notes on properties.
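The recommendation above to use "director / manager (P1037)" with an "object has role (P3831)" qualifier can be illustrated with a small Python sketch that formats one statement as a tab-separated QuickStatements (version 1) line. The helper name `quickstatements_line` is hypothetical, and `Q_ORG`, `Q_PERSON` and `Q_ROLE` are placeholders that must be replaced with real item IDs before the line is submitted anywhere.

```python
def quickstatements_line(subject, prop, value, qualifiers=()):
    """Format one statement, plus optional qualifiers, as a tab-separated line."""
    parts = [subject, prop, value]
    for qualifier_prop, qualifier_value in qualifiers:
        # Qualifiers follow the main statement as extra property/value pairs.
        parts.extend([qualifier_prop, qualifier_value])
    return "\t".join(parts)


# Placeholders only: Q_ORG is the organization, Q_PERSON the position holder,
# Q_ROLE the item describing the nature of the position.
line = quickstatements_line(
    "Q_ORG", "P1037", "Q_PERSON", qualifiers=[("P3831", "Q_ROLE")]
)
print(line)
```

Keeping the role in a qualifier leaves the main statement generic, so a query on P1037 still returns all administrative directors regardless of their exact title.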
C) CQT dataset
Quality control activities, including the creation of queries to enable batch and manual error assessment and correction. Final validation of the information against the original CQT data.
A) Workshops
We validated the data structure in the WikiProject Cultural venues during the workshops. Notes on properties were amended further to the workshop. We also chose to hold a workshop on works, because it is fundamental to make progress on this subject even though it is very complex territory. It seemed important to us to do so in order to complete the 360-degree vision that we wanted to bring to the participants. We built an agenda to begin the description of works. At the end, participants were asked to contribute to the content of the next workshop by returning a survey before March.
10 Indigenous arts and culture practitioners answered the call for participation issued in November. Six of them participated in a first gathering on February 26th:
- The notion of digital discoverability did not have a lot of appeal among gathering participants. Words such as “visibility” and “reciprocity” were preferred.
- Participants felt ‘careful’ or ‘protective’ of how they are represented online.
- None of the participants use settler-built directories, databases or open knowledge bases such as Wikipedia or Wikidata to disseminate information about their career as an Indigenous arts and culture practitioner.
- Participants agreed to continue the dialogue in April.
- See the minutes
The LODEPA WG6 on Wikipedia and Wikidata met on February 11th. There were 6 participants. We discussed best practices for engaging the community in discussion pages, as well as ways of disentangling Wikidata items that are linked to conceptually distinct Wikipedia articles and authority notices. One-to-many interwiki links are needed. See the meeting minutes.
D) CQT dataset
Final count of the ingestion:
- Organisations: 129
- Persons: 1129
- Theater plays: 379 (note: these works were uploaded before the modelling of works could be finalized. Moreover, there were too many missing person items to state performers and contributor roles. We therefore only stated the title, the country and the class. The objective was to enrich the data and to provide a sufficient initial pool of items to start the whole process of describing a theatrical work directly in Wikidata with the support of the community)
- Map of the buildings and venues
- Map of the theatrical troupes.
- All statistics: Here.
- Images related to the CQT on Commons.
- The WikiProject is now available in French.
- The home page of the Wikidata:WikiProject Cultural venues is available in French too. The other tabs are still being translated.
The last workshop was designed by the participants, the majority of whom wanted to:
- Learn how to link information between Wikidata, Wikimedia Commons and Wikipedia
- Learn how to interact on discussion pages
- Explore different search tools (SPARQL, PetScan)
- Explore batch data integration tools (Mix'n'match, QuickStatements)
|Workshops||Registrations||Participants||% of retention|
|Total 9 French workshops||341||164||48%|
|Total 9 English workshops||297||160||54%|
|Average per French workshop||38||18||48%|
|Average per English workshop||33||18||54%|
|Unique French participants||91|
|Unique English participants||80|
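The retention percentages in the table can be recomputed from the registration and participant totals. The short Python sketch below assumes that retention is simply participants divided by registrations, rounded to the nearest percent; the function name `retention_rate` is ours, not part of any reporting tool.

```python
def retention_rate(registrations, participants):
    """Return the share of registrants who attended, as a whole-number percentage."""
    return round(100 * participants / registrations)


# Totals for the nine workshops in each language, taken from the table.
totals = {"French": (341, 164), "English": (297, 160)}

rates = {
    language: retention_rate(registrations, participants)
    for language, (registrations, participants) in totals.items()
}
print(rates)
```

Under this assumption the totals give 48% for the French workshops and 54% for the English workshops, in line with the per-workshop averages.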
We tracked workshop participants' edits during each workshop and during the 7-day period following each workshop, using the Outreach Dashboard application (which you can see HERE). The results were beyond our wildest expectations:
- 170 total editors (84 unique editors)
- 1450 items created
- 12,700 edits
- Note: These statistics only include workshop participants who accepted to disclose their user ID (this information was not mandatory).
We added two "Wikidata Office Hours" events at the tail end of the Workshop series.
The advisory committee met on March 25, 2021 (see the documentation and minutes):
- The committee reached consensus on the performing arts group superclass;
- The committee discussed and validated the data structure for cultural venues;
- The committee discussed the conceptual distinction between event venues and performing arts buildings;
- The committee discussed outcomes of the current project and next steps to continue modelling and broadening outreach.
C) Modelling activities
In addition to the modelling activities dealt with by the committee, we:
- Cleaned up the class item for performance (Q35140): moved references to distinct concepts, mapped it to external ontologies, and referenced the subclass statements.
- The performing arts group superclass: subclass statements were added to 11 performing arts class items. See the resulting graph