Wiki Education Foundation/Wikidata Program Evaluation
In 2019, Wiki Education expanded its existing Wiki Scholars & Scientists program, in which we ran paid courses for subject matter experts to learn how to edit, into Wikidata. In the previous year, we developed a model for Wikipedia (original evaluation here and update here); in 2019, we added a second set of courses where we trained participants in Wikidata. What follows is an in-depth evaluation report of what we did, what worked well, what didn't work as well, our key learnings, and our plans for the future of our Wikidata Program. We welcome questions on the talk page.
In the first round of this program we worked with almost 40 GLAM professionals from libraries, museums, Wikimedia, and other organizations. We experimented with two modes of delivery: six-week courses which met for one hour each week and a separate one-day workshop we held in New York City. We taught GLAM professionals about linked data and Wikidata, and made the case for using Wikidata as part of their workflow moving forward. Our goal was to have all participants engage with the Wikidata community, understand how Wikidata works, and improve or contribute content to Wikidata.
- 1 Theory of change
- 2 Preparation
- 3 Recruitment and Marketing
- 4 Training Modules
- 5 Online Courses
- 6 Wiki Education staff roles
- 7 Outcomes
- 8 Conclusion
Theory of change
Wiki Education has proven it's possible to teach subject specialists to contribute to Wikipedia, improving its quality in topic areas that receive a high number of views. The growth and influence of Wikipedia have posed some challenging questions for the future of the Wikimedia movement. These challenges concern language independence, machine readability, and freeing information from the more restrictive format of long-form Wikipedia articles. Wikidata helps address these challenges.
Information/GLAM professionals have dedicated their lives to the creation and organization of information, and we believe they have the training, experience, and understanding to contribute high quality data to Wikidata. They are the ideal population to create data models, evaluate ontologies, and improve the quality and coverage of data on Wikidata, identifying and addressing data disparity, bias, and sourcing problems that less experienced editors may miss. We believe that GLAM professionals who edit Wikidata will not only improve what already exists on Wikidata, but will also connect their collection data to Wikidata. If successful, this Wikidata Program will empower information professionals to add high-quality data to Wikidata about their topics of study.
Given their training, ideology, and access to data, we wanted to know whether information professionals and subject-matter experts trained to edit Wikidata could fix these problems. We designed a course experience which, we hoped to demonstrate, would give participants the necessary tools, confidence, and hands-on experience to contribute meaningfully to Wikidata.
We sought to explore and answer these key questions:
- To what extent will information professionals address incomplete data on Wikidata?
- How will information professionals improve incomplete and possibly inaccurate data models and ontologies?
- Will they be able to contribute data or entire datasets that are not yet represented on Wikidata?
- Will participants find Wikidata a compelling enough platform to use, in some capacity, at their institution (either while taking the course or after the course ends)?
- Can we retain subject matter experts; will they remain active?
- Is this program worth continuing?
We built our Wikidata courses by drawing upon our experience teaching newcomers to contribute to Wikipedia and the wealth of training materials that the Wikidata community has developed over the past five years. In building our Wikidata courses, we sought to fill a need: to create a comprehensive, structured program that trains newcomers to contribute to Wikidata at scale. We also wanted this program to build upon the strengths of our existing Wikipedia programs. Subject area specialists and librarians have access to so much data, some of which is already structured. The idea that they could enrich Wikidata with new collection data, and that these collections could in turn benefit from existing data on Wikidata, querying capabilities, and crowdsourced edits, seemed like a tremendous opportunity.
The Wikidata community is eager for more integration of its data in all language Wikipedias and they are always seeking a set of data more representative of the world we live in. After speaking with several Wikidata experts, we believed reaching out to libraries would be the best place to start in order to have the largest impact on Wikidata.
Having identified a plan, we sought to build a program to help subject matter experts to contribute to Wikidata, and connect library collections to Wikidata. We worked with members of the Wikidata community to help develop our understanding of the project and the training needs of new contributors. We solicited their feedback as we developed our curriculum and support materials.
Several excellent Wikidata training materials exist already, created by some very skilled Wikidata practitioners. We wanted to learn from their expertise and build off of the existing corpus of training materials. We set up meetings with some Wikidata experts. We sought feedback about our vision for this program, asked about their experiences with Wikidata, and asked what kind of course they wish had existed when they were learning Wikidata. We could not have created this program without their support. In particular, we would like to thank:
- Stacy Allison-Cassin, Associate Librarian, York University
- Dominic Byrd-McDevitt, Digital Content Specialist, National Archives and Records Administration
- Mark Custer, Archivist / Metadata Coordinator, Beinecke Rare Book & Manuscript Library, Yale University
- Jason Evans, Wikimedian-In-Residence, National Library of Wales
- Rob Fernandez, Assistant Professor, Resources Development/eLearning Librarian, Prince George Community College
- Barbara Fischer, Manager for New Cooperations, German National Library (DNB)
- Karen Hwang, Digital Projects and Metadata Librarian, Metropolitan New York Library Council
- Mairelys Lemus-Rojas, Digital Initiatives Metadata Librarian, Indiana University Purdue University Indianapolis
- Andrew Lih, Wikimedia Strategist, The Metropolitan Museum of Art
- Jens Ohlig, Software Communication Strategist, Wikimedia Deutschland
- Martin Poulter, Wikimedian-In-Residence, Oxford University
- Merrilee Proffitt, Senior Manager, OCLC Research
- Lane Rasberry, Wikimedian-In-Residence, University of Virginia
- Judy Ruttenberg, Program Director, Association of Research Libraries
- Keren Shatzman, Senior Coordinator, Academia & Projects, Wikimedia Israel
- Shani Evenstein Sigalov, EdTech Innovation Strategist, NY/American Medical Program, Sackler School of Medicine, Tel Aviv University
- Alex Stinson, Senior Program Strategist, Wikimedia Foundation
- Megan Wacha, President, Wikimedia NYC / Scholarly Communications Librarian, City University of New York
Wiki Education's many years of expertise in teaching newcomers how to edit Wikipedia guided our work. Our existing Wiki Scholars & Scientists Program follows a multi-week model that guides participants through several trainings, assignments, and resources, meeting once a week online; we thought a similar model would be a good place to begin.
Wiki Education uses its Dashboard as a course management platform and training center, and to track course participation. In anticipation of this program we tweaked some features to better capture contributions to Wikidata. These changes include being able to track changes by domain (wikipedia.org vs wikidata.org), housing the Wikidata training modules, and showing labels for Q-items in the Dashboard. As with our Wikipedia programs, we wanted to ensure that we could track participant contributions and also allow community members to share resources through the Dashboard, as we have with Wikipedia.
Several articles have identified the current moment as a tipping point for Wikidata. Interest has swelled, different institutions are taking note, and the number of editors on Wikidata is growing steadily. Wikimedia Deutschland, the Wikimedia chapter that developed and maintains Wikidata, has identified this growth as a limit for their staff and for Wikidata admins who manage and maintain the millions of new items on Wikidata. To that end we want this program to achieve multiple goals: support the community, bring in new data, clean up existing data, create new properties, new items, and start to integrate Wikidata into libraries.
We would not have been able to start a program of this quality without the help of some mission-aligned organizations. In November of 2018, the Association of Research Libraries (ARL) shared a draft of a white paper they were working on. This draft showcased different case studies of libraries and librarians using Wikidata and Wikibase. Several Wikidatans contributed feedback to this white paper as it evolved. This document, which was published in April 2019 and is accessible here, provided a roadmap for our Wikidata program, identifying individuals, institutions, and publications that helped us develop this program.
Another partner we identified was the group of university librarians implementing the Linked Data for Production (LD4P or LD4) grant from the Andrew W. Mellon Foundation. This grant pulls together university libraries from the United States to explore different ways to create, use, and study linked data in the library environment. Part of the grant is running an annual conference to bring together linked data practitioners as well as Wikidata editors. Wikidata Program Manager Will Kent sat on the planning committee and contributed to creating the program's application, reviewing applicants, and supporting conference presentations.
As our program came together, both the LD4 network and ARL were especially open to sharing links, circulating emails, and allowing us to present at their meetings. This exposure helped us to identify program participants, which we will address in more detail in our marketing and outreach section. After the conclusion of our program, many participants expressed interest in continuing to have conversations with their colleagues about Wikidata. The LD4 group holds weekly Wikidata Affinity Group calls, which we have recommended participants attend upon completing our course.
Recruitment and Marketing
We split our offerings into "Beginner" and "Intermediate" courses because we identified two groups of people who were interested in our program: people who had no prior experience, and those who had some experience with Wikidata or linked data but didn't think they knew enough to make the contributions they wanted to make. To experiment with different delivery modes, we offered six-week online courses and one-day, in-person workshops.
- Course Size
We set a goal of working with 40 participants for this first round. We wanted a pool of participants that would be large enough to have a sizable impact, but small enough for us to monitor content, effectively implement the curriculum, and interest all participants.
In implementing a new curriculum we knew there would be newcomers to linked data in addition to a potential population of linked data specialists. In an effort to meet everyone at their skill level, we split our curriculum into two. We offered a beginner and intermediate version of this course, hoping for 10-20 participants to enroll in each course. Both curricula assumed no prior experience on Wikidata or with linked data. The differences between the two courses include a slower pace for the beginner course and a project-oriented emphasis for the intermediate course (a pre-selected dataset or a project from their work that could involve Wikidata).
Our outreach efforts generated a total of 36 applicants, which was below our goal of 45 applicants. Of those 36 applicants, one withdrew and 35 continued on to take a course or join a workshop. We also offered the NYC workshop host institution, the Metropolitan New York Library Council (METRO), the opportunity to bring two staff members to the event for free. They are not included in these numbers.
We received no applications for the DC workshop.
The intermediate version of the online course filled up almost immediately, while the beginner course was slower to fill. This could have been for a variety of reasons, such as scheduling, interest, or self-perceived ability. We will continue to offer both versions of the course to better understand the cause.
We set ourselves a revenue goal of $30,000 and a participant goal of 40 individuals from Wikidata courses and workshops.
|Goals||Participants||Revenue|
|Join the Open Data Movement (Beginner course)||8||$6,400|
|Elevate your Collections (Intermediate course)||8||$6,400|
|NYC Wikidata Workshop||12||$9,600|
|Washington DC Wikidata Workshop||12||$9,600|
|Actual results||Participants||Revenue|
|Join the Open Data Movement (Beginner course)||10||$4,800|
|Elevate your Collections (Intermediate course)||13||$7,000|
|NYC Wikidata Workshop||15||$6,000|
|Washington DC Wikidata Workshop||0||$0|
- Discounts and Financial Aid
We set a price of $800 for our online course and in-person workshop. We also offered two discount options, which we hoped would draw additional applications: an early bird deadline for $200 off, or a group discount (two or more participants from the same institution) for $300 off. When funds were available we offered as much financial aid as we could based on individual need. As with other professional development opportunities, it was our hope that employers or institutions would assist in defraying the cost for participants. The price we charged for the courses was based on what it cost us in staff time and technology support to run them.
|Pricing Options||Regular Price||With Discount|
^This amount reflects the cost of two seats
We did not achieve our revenue goal. We have identified reasons for this:
- we had fewer applicants than targeted
- our applicants wisely took advantage of our discount model (29 of our 35 applicants registered with a colleague, meaning that 29 people saved $300 each, for a "loss" of $8,700 in potential revenue)
- we cancelled one of our workshops due to low enrollment, which effectively eliminated a quarter of our projected revenue goal
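The figures behind these reasons can be checked with a little arithmetic. The sketch below is only an illustration, not part of our original analysis. It assumes that the first four rows of the table above are the per-offering targets and the last four are the actual results; the variable and function names are invented for the example.

```python
# Per-offering (participants, revenue in USD), read from the table above.
# Assumption: the first four rows are targets, the last four are results.
goals = {
    "Beginner course": (8, 6_400),
    "Intermediate course": (8, 6_400),
    "NYC workshop": (12, 9_600),
    "DC workshop": (12, 9_600),
}
actuals = {
    "Beginner course": (10, 4_800),
    "Intermediate course": (13, 7_000),
    "NYC workshop": (15, 6_000),
    "DC workshop": (0, 0),
}


def totals(rows):
    """Sum participants and revenue across all offerings."""
    participants = sum(p for p, _ in rows.values())
    revenue = sum(r for _, r in rows.values())
    return participants, revenue


goal_participants, goal_revenue = totals(goals)
actual_participants, actual_revenue = totals(actuals)

# Revenue forgone through the group discount: 29 people at $300 each.
group_discount_total = 29 * 300

# Share of the summed revenue targets represented by the cancelled DC workshop.
dc_share = goals["DC workshop"][1] / goal_revenue

print(f"Targets: {goal_participants} participants, ${goal_revenue:,}")
print(f"Actual:  {actual_participants} participants, ${actual_revenue:,}")
print(f"Group discount total: ${group_discount_total:,}")
print(f"DC workshop share of targets: {dc_share:.0%}")
```

Under these assumptions the per-offering targets sum to $32,000, slightly above the stated $30,000 goal, and the actual rows sum to $17,800.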
We are looking into other reasons why we did not meet our goal, which are more difficult to confirm without more data. These include:
- not enough applicants (which could be due to scheduling (time of day/day of week/time of year), marketing, or interest)
- our outreach could be directed at a more interested population
- our curriculum/training is not what potential participants are interested in
- the cost of our program may be prohibitive
We used a registration form to collect applicant details in order to determine their fit for the course or workshop. We asked questions about applicants' titles, experience contributing to Wikipedia/Wikidata, what they would want to work on in this course, and more. We provided a description of the specific courses on our website and wanted to be sure we were meeting everyone's expectations with this new curriculum. We hoped to ensure that we understood what skills and background participants would bring to the courses. We included a consent and expectations form to describe the public nature of editing Wikidata so participants could fully understand what they were choosing to participate in.
We included links to the registration form on our website and in our communications to contacts, leads, and listservs.
Once potential participants had registered, we sent a confirmation of registration, post-registration details (when the first meeting would be, what participants would need to prepare before the first session), and a request for online payment. Unlike with our Wikipedia course offerings, there was little attrition between filling out the registration form and payment.
This was the first time we have offered a Wikidata course or workshop. It was the second time we have tried a fee-for-service model of course work. We did not turn any applicants away. We did not evaluate applications beyond a general interest in Wikidata or linked data. This approach ended up working well for this round of courses, but if we were to broaden the outreach, we may want to consider evaluating applications on experience, interest, and familiarity with Wikidata or other Wikimedia Projects.
For this first round of courses we had a little over three weeks to recruit. Although we were able to meet our recruitment goals, we would incorporate more lead time for recruiting participants in the future. Additional lead time for registration would help ensure the largest possible number of potential participants could register. If applications were to require in-depth evaluation, this time would be essential. Similarly, a larger window would benefit marketing for this course and likely yield a larger applicant pool. We also learned from this experience that people love deadlines: 77% of applicants registered within days of a deadline (we set an early bird discount deadline, which applicants responded to well, as well as a final registration date). We recommend advertising deadlines and adhering to them.
We have identified a few barriers to recruitment for this program that are worth sharing. Even though Wikidata itself isn't new, many still haven't heard of it or don't know exactly what it could do for them. This required our marketing to be as much a recruitment plan as it was a general education campaign about Wikidata. Another complexity is that Wikidata can be used in different projects in distinctive ways. We wanted to make sure that this course would be valuable for the largest possible group of people, both in terms of recruitment and in terms of the curriculum.
A final note on selection: We normally offer online courses and piloted an in-person workshop for this Wikidata course. It was a successful experiment, but posed its own unique set of challenges. Workshops are complicated: logistics, venue coordination, geographic limits, and a daylong time commitment are a lot to ask of participants in terms of scheduling and curriculum.
Plans for future selection
We limited our initial round of recruitment for this course to university libraries. In spite of this narrow focus, we had applicants from art museums, Wikimedia-Switzerland (WM-CH), and a private company. For future rounds, having a more diverse set of participants would help us better answer questions about the practicality of Wikidata in various organizations. It would also help us engage content gaps in other areas and bring in fresh perspectives to our courses. Wikidata has distinct applications depending on specific projects and we want to ensure that we have a course, curriculum, and set of trainings that speak to all of these needs.
- Participant Profile
Based on the ARL Wikidata white paper, mentioned above, we started our outreach with individuals working in libraries. We felt that the buyer journey could be the most direct for librarians, even those who did not yet know about Wikidata. Within our existing network, we had around 1,000 leads and contacts. These contacts came from the 2019 ACRL conference, librarians we have worked with in the past, Wikidata experts who volunteered to reach out to contacts for us, past Wikipedia program participants, and LD4 Wikidata Affinity Group participants.
Based on conversion rates and projections from other programs and outreach, we believed we needed to generate at least 45 applications in order to fill our goal of 40 paying customers. We hoped those 45 applications would come from 100 people that we call "warm leads", individuals who had already indicated an interest in Wikidata, and 2,000 cold leads, individuals whose interest level was unknown to us but whom we contacted to inform them of the Wikidata courses and ask them to enroll. Owing to other Wiki Education programs we already had 1,000 leads within our system, so we did additional outreach to 1,000 more people.
The other 1,000 individuals we contacted came from email listservs, including the Directors list from ARL; librarians working in the New York or DC area interested in metadata, cataloging, and archives; and staff at the California Digital Library. We also used paid marketing via Google AdWords and Facebook ads, along with tweets and blog posts. We reached out to all of these groups multiple times via email as well.
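The conversion assumptions in this outreach plan can be made concrete with a short calculation. This is a rough sketch rather than our actual projection model: it uses only the figures reported in this evaluation (45 target applications for 40 seats, 36 applications received, 35 continuing participants), and it treats the roughly 2,000 people contacted as an approximate denominator, which is an assumption because some outreach ran through ads and listservs. The variable names are hypothetical.

```python
# Planned funnel: applications needed to fill the participant goal.
target_applications = 45
target_participants = 40

# Observed funnel, as reported earlier in this evaluation.
actual_applications = 36
actual_participants = 35

# Approximate number of people contacted (existing leads + new outreach).
# Assumption: ad and listserv reach is not counted precisely here.
leads_contacted = 1_000 + 1_000

planned_conversion = target_participants / target_applications
actual_conversion = actual_participants / actual_applications
application_rate = actual_applications / leads_contacted

print(f"Planned application-to-participant conversion: {planned_conversion:.1%}")
print(f"Actual application-to-participant conversion:  {actual_conversion:.1%}")
print(f"Applications per person contacted:             {application_rate:.1%}")
```

Read this way, applicants converted to participants at a higher rate than planned, so the shortfall came from the number of applications rather than from attrition after applying.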
- Advertising plan
We used advertisements through Google and Facebook to drive course registrations and increase the visibility of our offerings. For both campaigns, we expected that advertisements would generate interest in our product, ultimately leading to a conversion (i.e. a customer buying a seat in a course or workshop).
Facebook advertising was ultimately useful for driving traffic to our website. We reached 16,219 people with three advertisements that each ran for either a week or a month. We currently do not have the capability to measure how many registrations Facebook advertisements yielded, other than an anecdotal peak in registrations during the time ad groups were running.
Although our staff is well-versed in teaching Wikipedia, Wikidata is a newer set of skills for us. We set up a plan to master Wikidata fundamentals, interview experts who have taught with Wikidata before, and attend a series of conferences which would help us understand what potential participants would be most interested in learning about.
Taking advantage of our existing support infrastructure seemed like an efficient approach that would allow us to devote time to curriculum creation. Building off of the resources of our Student Program, we adapted our Dashboard to better track Wikidata edits. We crafted new training modules, and built a curriculum to connect the trainings through discussion, assignments, and relevant resources. With a small window of time to establish a proof-of-concept, our priority was to identify interested participants and send them through a program that they would find compelling, useful, and engaging.
Again drawing from years of experience teaching with Wikipedia, we had a solid frame for what works when teaching online courses to both professors and students. What we wanted to learn was how experts who work with Wikidata in a number of ways teach it to others. In our interviews with Wikidata teaching experts, we looked at existing Wikidata resources and sought to identify pedagogies around Wikidata and linked data concepts, so that we could establish existing best practices before we created our curriculum.
We also knew from our initial rounds of interviews that participants who may have experience with linked data may not be familiar with the Wikidata linked data experience. It is distinct from a lot of other linked data repositories. Taking this into consideration, we devoted time to introducing the Wikidata community, building confidence, and emphasizing participation (i.e. editing, contributing) on Wikidata.
Follow this link to view an example of our Wikidata course Dashboard, complete with a timeline that walks participants through assignments, concepts, and resources week by week. For each week we created a meeting agenda (in-class content), assignments, milestones, and relevant resources based on what we were covering that week. We required some sort of assignment or training every week to ensure engagement, provide prompts for asking questions, and to break up complicated concepts into bite-sized lessons. We strived for a pacing that would allow for conversations about topics we covered in our training during our video sessions to address lingering questions, deep dive into concepts, or explore related topics.
We created seven trainings: Intro to Wikidata, Databases and Linked Data, the Wikidata Community, Evaluating Data on Wikidata, Adding to Wikidata, Querying Wikidata, and Wikidata: WikiProjects. These trainings were intended to introduce participants to linked data concepts if they were unfamiliar, provide an overview of Wikidata policies, and give them the skills they needed to evaluate data in existing statements and create new items and statements.
With a topic as broad as Wikidata, we were forced to make decisions about what not to cover. We chose not to provide a training module on mass uploads because we saw this as something people should learn after they had a solid understanding of how Wikidata works. Instead of providing a step-by-step walkthrough of mass uploads that anyone could follow without understanding the impact of their uploads, we chose to introduce the tools to our program participants and give them the means to seek out the relevant documentation themselves. Once they had done that, we were happy to provide additional advice on how to carry out the uploads. While many of our participants are just the kind of people who need to be uploading datasets to Wikidata, we wanted to minimize the burden we might impose on the community by facilitating bad uploads.
We continue to seek the ideal balance between benefits (to our participants and Wikidata) and potential burdens that this might impose on the community.
Similar to our other programs, meetings were supplemented by Slack and, to a lesser extent, on-wiki communication. We found the Slack channel to be a useful space to help highly motivated participants master advanced concepts that were outside the scope of what we could cover in class meetings.
We met for one hour a week for six weeks using Zoom, a video conferencing software. Meetings were scheduled before the registration window opened.
Scheduling and meeting size
Having set the course times beforehand, we were pleased that enough participants could attend at the pre-determined times to run the two online courses. We were also pleased that participants in Europe were able to participate despite the large time difference. We aimed for the courses to consist of 10–15 participants. This size allows everyone to participate during an hour-long session if they want to. It is a good size for sustained discussions.
Participants were satisfied with the planning, timing, and pacing of these courses. Specifically, the expectation of an hour-long meeting each week, supplemented by a few hours of work during the week, was an appropriate amount.
Wiki Education staff hosted weekly video meetings with Wikidata Program participants. Participants seemed generally pleased with Zoom, the video conference software we used to run meetings. We recorded sessions and shared them on the Dashboard and over email. This worked well in the event anyone missed a session or if they wanted to refer back to a previous lesson.
Similar to our other course offerings, these meetings were useful in building community among the program participants as well as the Wiki Education staff. Sharing experiences, challenges, and feelings about the process of editing Wikidata is consistently rated one of the most important aspects of the courses we offer. We welcomed seeing this trend continue in these Wikidata courses.
With the six-week timeline we wanted to make sure that we moved quickly, but not in a way that would overwhelm participants. Training modules and a set of tasks were assigned every week. Generally we devoted the beginning of sessions to addressing the trainings, fielding questions, and recommending resources that would assist with the principles in the lessons. Usually these questions led to conversations about best practices, policies, or differences between library systems and Wikidata. As we shared resources and examples with participants, the conversation shifted from general questions about Wikidata to specific questions about modeling items, expressing detail, and querying all of these things.
We had an engaged pool of participants in our sessions, regularly asking questions, looking for examples, and sharing their impressions of editing Wikidata. Still, we observed the 80/20 rule in our courses and in-person workshop. This is the rule where roughly 20% of participants contribute 80% of the discussion or questions asked. We tried to ask a question which required a response from every participant about their personal experience with an assignment to ensure everyone had an opportunity to contribute and be heard. In other courses we find this kind of participatory approach yields more questions about the project and inspires work. In all six sessions there was always enough conversation to not only pack the hour, but to also result in conversations spilling over into Slack, email, and questions between participants. We were extremely pleased with the amount of intra-participant interaction there was. Whether it was on Slack or in the courses, participants were willing to share their expertise and ideas with each other. Although this kind of close collaboration happens regularly on Wikipedia and Wikidata, it is unique to have these interactions happen face-to-face, and specifically for new editors. The process is often anonymous or semi-anonymous, which adds a layer of distance. It seems this approach brings the editors closer together.
Our participants were inquisitive, raising questions beyond our planned curriculum. Fielding such a broad range of questions required us to pull in resources from across the internet. We relied on previously created Wikidata training materials, case studies, and the Wikidata Weekly Summaries. This served as a reminder that for those outside of courses like this, learning how to meaningfully edit Wikidata requires a lot of research, time, and discovery. We tried to spend meeting time gathering enough information to address these needs and ensure that any question the participants had could be answered in short order.
These meetings were productive and we are especially thankful to the participants for being active and understanding of our new curriculum.
Cohort size and managing sub-cohorts
These first two Wikidata courses had nine and eleven participants, respectively. We found this size worked well for discussions. It ensured that a critical mass was always able to attend (even with a few absences), and it was large enough to have a substantial impact on Wikidata. We were also very pleased that with two courses of this size, there was a good mix of experience and goals within the course.
The size allowed everyone to participate who wanted to participate. We were able to answer all questions in a timely manner, and there was never a shortage of things to discuss in the meetings. We are going to continue to pursue courses of this size or slightly larger.
Meeting different needs for different participants
One of the most exciting qualities of Wikidata is just how much you can do with it. As a result our participants had different needs and a range of skills they were hoping to get out of this course. Balancing these needs was a fun and persistent challenge.
We started this curriculum with libraries and library tasks in mind. Our participants represented a more diverse set of organizations, coming from museum libraries, companies, and Wikimedia Switzerland. Each participant had a unique set of interests and needs for the program. It was an interesting test of our curriculum and our online meetings to see if we could provide useful information for all participants. The different needs of each participant ended up generating engaging discussions that allowed us to demonstrate more parts of Wikidata than we would have had the participants all been from one type of institution. We were pleased with the support they provided for each other around shared confusions, answering each others questions, or relating to each other about their linked data experiences.
As with any specialized need, we wanted to do what we could to avoid overuse of jargon. The majority of our course participants were from libraries so not only did we have to be aware of the Wikidata jargon we were using, but we also had to watch the amount of library jargon that we used as well. We did not want the other participants to feel like this course was not developed for them.
Beyond jargon, we were also aware that a lot of linked data development is centralized in initiatives and groups like the Bibliographic Framework Initiative (BIBFRAME) and the Program for Cooperative Cataloging (PCC), and within that the Name Authority Cooperative Program (NACO). Shifting a metadata creation process from a centralized group, with lots of procedural documentation, into the hands of individuals required some conversations about quality, trust, and oversight. We covered some of these themes in our training modules, but we also explored these topics in depth during our weekly meetings.
Another consideration regarding examples was our reliance on comparisons to Wikipedia to illustrate points about Wikidata in this course. A few of our participants had experience editing Wikipedia in the past, but it would be beneficial to make comparisons to other linked data projects — like the ones listed above — to provide an additional set of examples for participants to relate to.
In our online sessions, we did not know whether participants would come to the course with a linked data project in mind or not. To allow room for exploration and inspiration, we encouraged participants to think of ways they could use Wikidata at their respective institutions (or for personal research use). There was no requirement for this course to result in a project per participant, but we saw this course as an opportunity to encourage structured engagement with Wikidata. Some participants were drawn to working on structured data in Wikimedia Commons while others were interested in learning how to apply query data to their research. Some wanted to learn about data modeling while others were purely interested in what it would take to move their collection data from their catalog into Wikidata.
Moving a collection into Wikidata is a multi-step process that takes longer than the six weeks of this course would allow. In spite of that, we did have a participant create an identifier property for a collection they work with that is now in use with almost 100 items (as of October 2019).
Survey results indicated that participants were generally pleased to learn about different uses for Wikidata that may not pertain directly to their interests. One participant did point out that some of the session conversations were too specific for all to benefit from. As efficient as it would be to have every conversation benefit every participant, we think it is great feedback, but difficult to implement in a course with many interests. As for the delivery of these various topics, many participants pointed to just how useful live examples and screen sharing were in demonstrating specific Wikidata functions.
We were very pleased that every participant in this course was able to edit some statements, create items, and add labels. In setting a goal to have everyone edit, we realized that not every participant would necessarily become a regular Wikidata editor. In our meetings we spoke about the different roles Wikidata editors can have. This helped to balance the emphasis on editing we built into the curriculum.
One of the biggest surprises of this first round of courses was just how much some participants were able to achieve in terms of edits. Some were involved with the LD4 initiative and already had some Wikidata experience. These participants excelled at item creation, item merges, and knowing which properties to add to items. Others without any prior experience excelled in creating properties, documenting gaps in tools, and posting discussion topics on Project Chat, Wikidata's general question and answer forum. In reflecting equally on participant feedback as well as these successes, we're eager to meet more needs in future sessions, but also recognize that there is space for most participants to pursue what they are interested in already during this first round.
Wiki Education staff roles
Outreach & Communications Associate Cassidy Villeneuve created marketing materials, wrote blog posts, and implemented the Wikidata training modules in the code of the Dashboard. She helped shape the way we talk about the program and relay concepts about Wikidata to our program participants.
Customer Success Manager Samantha Weald led recruitment efforts, identifying organizations, schools, and professional listservs we could reach out to. She handled the registration process, fielding questions, payment, and reminders.
Wikidata Program Manager Will Kent helped develop a process for creating this program. He identified learning outcomes and scaffolded a curriculum around them. With other members of the team, he created seven Wikidata training modules and crafted course Dashboards. He facilitated the six-week sessions, fielding questions in person, on Slack, over email, and on Wikidata.
Wikidata Expert Ian Ramjohn participated in online sessions and the workshop, contributing his expertise on Wikimedia communities and his Wikidata experience. He addressed questions on Slack and helped to write seven Wikidata training modules with Will. He framed conversations around the significance of the Wikidata community, Wikidata policies, and communicating with other editors. Ian provided an additional resource for participants to rely on as they took lessons week after week.
Program Sponsor Frank Schulenburg created and supported the Wikidata task force which built and implemented the revenue generation side of the program; LiAnna Davis oversaw the programs side.
Quantitative impact on Wikidata
The 23 participants in the two online courses created 228 items and edited more than 2,500 items, making more than 9,200 edits. During these Wikidata courses we had one property proposal approved and (as of October 2019) there is another one pending. In one course there was one editor who was working on merging items and merged roughly 400 items. Because of how the Dashboard tracks merges, this put our "References Added" in the negatives, but these two courses added nearly 800 references to Wikidata statements. Participants created SPARQL queries that can be shared with the community, and some participants have continued to edit after the end of the course.
Having only tracked Wikidata contributions for smaller workshops, we did not have an idea of how large an impact a six-week course would have on Wikidata. As of October 2019 we are still unsure if these results are representative or if we had an especially active group. For now it is safe to say that the contributions these participants made far exceeded our expectations. These were both highly productive groups in terms of adding statements, references, and creating items. We taught these participants how to add items manually; these additions reflect no batch uploads.
- Item Content
We encouraged participants to edit items of their choosing. Some had items from their institution's collection that they wanted to add to Wikidata. Others did not have a set direction. We discussed biases on Wikidata and Wikipedia. Focusing courses in the future around these content gaps could have a more significant impact on the project. As it stands, more than 200 items now exist on Wikidata that did not exist before. Not only is this more representative of our world, but adding more descriptive properties to items helped make more data discoverable.
- Property Creation
We did not anticipate courses reaching the advanced level of creating properties, and yet, we had participants in both courses propose properties. It is difficult to quantify the impact of creating a property, but one — the property of "exoneration" — will no doubt have a large impact on criminal justice, law, and modeling other post-conviction data from different countries. Note: as of October 2019 this property is still open for discussion, owing to the fact that there are not that many criminal justice models documented on Wikidata yet.
The other property, Archives Directory for the History of Collecting in America ID, is an identifier in the art world. This new identifier will help better connect collections from art museums with items on Wikidata. Additionally, identifiers serve as an additional way to reference items across databases. This helps achieve a broader level of connectivity among Wikidata and other databases.
Data quality measurement
Wikidata is still in the process of developing data quality measurement tools. We selected a few variables to track to help us better understand the impact our program has on Wikidata.
- Number of items edited – Improves the amount of information about an item
- Number of items created – Improves representation of items that previously were not on Wikidata
- Number of references added – Improves accuracy and provenance of values in statements
- Total Edits – Captures total individual impact/interactions on Wikidata
|Course||Items Edited||Items Created||References Added||Total Edits|
^ Note: in this course we had one participant who was working on merging items. On our Dashboard these merges showed up as negative numbers since statements were emptied from one item and merged into another. This course in fact had more than 500 references added. We will work on the Dashboard for future courses to better capture and distinguish data on merges and references added.
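The arithmetic behind a negative "References Added" figure can be sketched as follows. This is an illustration only, not the Dashboard's actual logic: the function name, the `ReferenceTally` type, and the per-edit deltas are all hypothetical. It assumes each edit can be reduced to a change in reference count, so that a merge which empties an item appears as a large negative delta.

```python
from typing import Iterable, NamedTuple


class ReferenceTally(NamedTuple):
    added: int    # references gained across all edits
    removed: int  # references dropped, e.g. when a merge empties an item
    net: int      # added - removed; this is the figure that can go negative


def tally_references(deltas: Iterable[int]) -> ReferenceTally:
    """Summarize hypothetical per-edit changes in reference count.

    Each delta is positive when an edit adds references and negative
    when it removes them.
    """
    added = 0
    removed = 0
    for delta in deltas:
        if delta >= 0:
            added += delta
        else:
            removed += -delta
    return ReferenceTally(added=added, removed=removed, net=added - removed)


# Made-up numbers: three ordinary edits, then one merge that empties an item.
example = tally_references([2, 1, 3, -10])
print(example)
```

Reporting `added` and `removed` separately, rather than only `net`, is one way a tracker could keep merges from hiding the references that participants actually contributed.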
Most of our course participants in both the beginner and intermediate courses did not have prior experience editing Wikidata. The above data demonstrate that new editors can indeed edit Wikidata meaningfully as a result of taking this course.
We were astonished by the number of edits the Intermediate course was able to achieve. Most of these edits came from one individual with Wikidata experience. In spite of that, this was still a high-performing course. What the Beginner course was able to accomplish is more in line with what we think a realistic expectation for a course should be. The smaller numbers from the Workshop also make sense given that it lasted only one day. We tracked data from that course for an additional two weeks, but there was not a substantial number of edits made in that time.
Whether an in-person workshop or an online course, adding references is an activity that can have an impact. Although it may not have an impact on queries or Wikidata ontology, it does contribute to the quality of data on Wikidata. It makes sense that the six-week courses had more edits and items created. Although we tracked total number of edits, that number does not necessarily correspond to number of statements added (a mistake edit would show up as an edit for instance). We would like to be able to have the Dashboard track that number in the future so we can determine the number of statements added per participant.
Ideas for the future
If course participants are able to sustain this level of productivity on Wikidata, there is no reason why we couldn't propose more structured or quantified goals for these courses. Consider a course focused on:
- Selecting a dataset for participants to upload and measuring completeness against that dataset
- Identifying a content gap (ex. Women Scientists, Hospitals in Africa, underused properties) — measure quality/completeness beforehand with queries and show the change after the course ends
- Exploring a specific domain — like criminal justice — and have program participants build out a data model for that domain. This would include property creation, modeling references, creating lists of references, and having a set of queries to evaluate the quantitative impact of the course. It wouldn't have to be bound to a WikiProject, although it would be an appropriate space to capture information like this
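The before-and-after measurement suggested in the list above could be sketched along these lines. This is a minimal sketch under stated assumptions, not an established workflow: the function names are hypothetical, the query is meant to be pasted into the Wikidata Query Service rather than run by this code, and the identifiers in the example (P106 "occupation", Q901 "scientist", P21 "sex or gender", Q6581072 "female", P569 "date of birth") should be verified on Wikidata before use. On very large sets of items such a query may time out.

```python
def coverage_query(item_pattern: str, prop: str) -> str:
    """Build a SPARQL query that counts the items matching a pattern
    and how many of them carry at least one value for `prop`."""
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?total)\n"
        "       (COUNT(DISTINCT ?covered) AS ?withProperty)\n"
        "WHERE {\n"
        f"  {item_pattern}\n"
        f"  OPTIONAL {{ ?item wdt:{prop} ?value . BIND(?item AS ?covered) }}\n"
        "}"
    )


def coverage(total: int, with_property: int) -> float:
    """Share of items that have the property (0.0 when there are no items)."""
    return with_property / total if total else 0.0


def coverage_change(before: float, after: float) -> float:
    """Difference in coverage between two measurements, in percentage points."""
    return (after - before) * 100


# Example content gap: women scientists, checked for a date of birth.
query = coverage_query("?item wdt:P106 wd:Q901 ; wdt:P21 wd:Q6581072 .", "P569")
print(query)
```

Running the same query once before a course starts and once after it ends, then passing the two results through `coverage` and `coverage_change`, would give a single number for how much the course narrowed the gap for that property.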
We are extremely pleased with the output of these two courses. The enthusiasm, curiosity, and initiative that these participants demonstrated emphasize the impact that a structured course can have on Wikidata. We hope that these kinds of results are not unique to these courses and that this model can be replicated and scaled to increase the number of editors on Wikidata and to ensure a high standard for data quality across the project.
Qualitative impact on Wikidata
General quality assessment
We encouraged participants to edit in several different ways – using tools, using their own data to inform how they edit, and analyzing items on their own to determine how to best describe that item. These different editing styles revealed some interesting observations about quality and quality tools that could be developed in the future.
We did not restrict or guide participants to edit one kind of item versus another. We also wanted to encourage participants to bring their own data to Wikidata, which would mean creating new items, representing things that have not yet existed on Wikidata. To ensure quality we encouraged participants to familiarize themselves with relevant properties and to search for data models to provide a template for editing.
One measure of quality was that several participants indicated comfort using qualifiers to express relationships with greater nuance within a statement. Specifically, we had some conversations about the best way to model reference statements. This conversation settled on using a Stated In property versus Ref URL to point to a reference. These kinds of discussions indicated a depth of knowledge about modeling that new editors may not necessarily possess. Similarly, editors enjoyed using tools like Recoin (Relative Completeness Indicator), which recommends commonly used properties. They were also quick to point out that it is a limiting tool, and that an understanding of data models and common properties ultimately yields better statements. Adding Stated In references often entailed creating the item for that particular reference, which is time consuming and distracted some participants from the edits they wanted to make, but they understood the value in this approach.
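The difference between the two reference styles can be illustrated with a small sketch. It assumes the Wikibase JSON layout for references as we understand it (a `snaks` mapping keyed by property ID), with P248 for "stated in" and P854 for "reference URL". The helper names, the item ID `Q123`, and the example URL are placeholders, and the code only builds dictionaries; it does not write to Wikidata.

```python
import json

STATED_IN = "P248"      # "stated in": points to an item describing the source
REFERENCE_URL = "P854"  # "reference URL": points to a web address


def stated_in_reference(source_item_id: str) -> dict:
    """Reference that cites a source which has its own Wikidata item."""
    snak = {
        "snaktype": "value",
        "property": STATED_IN,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {
                "entity-type": "item",
                "numeric-id": int(source_item_id.lstrip("Q")),
                "id": source_item_id,
            },
        },
    }
    return {"snaks": {STATED_IN: [snak]}, "snaks-order": [STATED_IN]}


def reference_url_reference(url: str) -> dict:
    """Reference that cites a bare URL, with no item for the source."""
    snak = {
        "snaktype": "value",
        "property": REFERENCE_URL,
        "datavalue": {"type": "string", "value": url},
    }
    return {"snaks": {REFERENCE_URL: [snak]}, "snaks-order": [REFERENCE_URL]}


# Placeholder inputs for illustration only.
by_item = stated_in_reference("Q123")
by_url = reference_url_reference("https://example.org/catalog/record")
print(json.dumps(by_item, indent=2))
```

The sketch shows why the Stated In approach costs more up front: it needs an item ID for the source, so that item must exist (or be created) first, whereas a reference URL only needs a string.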
We introduced several different tools to participants. One tool, TABernacle, displays Query Service results as an editable table. This tool, more than others, can reveal inconsistencies such as multiple values for properties that only take one value, uneven property usage, missing language labels, and missing values. This tool allowed participants to focus on a small set of properties and add or improve values across several items.
It was exciting to see participants from many different professions find common ground on data quality. They understood the importance of references, consulting with the community, using qualifiers, and knowing where resources exist to discern correct property usage. Beyond these observations and common sense conjecture (i.e. more items means better representation), it would be helpful to develop more tools to measure data quality both for items and for participants.
Impact on program participants
At the end of the six-week session, there were many conversations about how to continue using Wikidata. A few institutions (University of Toronto and Wittenburg University) showed interest in exploring the creation of their own Wikibase at their libraries. Others cited an interest in continuing to weave regular Wikidata editing into their everyday responsibilities.
The courses pulled heavily from cataloging and metadata departments at libraries. This created the opportunity for potential future partnerships across campuses using Wikidata. We encouraged institutions to send multiple members to ensure there would be enough local knowledge to continue editing after the end of the course. Of all survey respondents, only two expressed a lack of interest in continuing to edit Wikidata.
A few participants expressed interest in incorporating Wikidata into their graduate courses. They believed that Wikidata is an ideal teaching environment for concepts and theory around metadata production, linked data, and presenting information in a way that's new to libraries.
Retention is always on our minds with programs like these. It has been heartening to see that a few participants have edited since the end of the course, and we hope to see this trend sustained as time goes by. We will be checking in with participants six months from the end of the course to see if they are continuing to edit Wikidata.
We asked all program participants to take a comprehensive survey (totaling 48 questions) to help us better understand what worked with this course and what we could change for the future. Thirteen participants submitted survey results to us. This survey contained questions about the usefulness of our training modules, relevance of our curriculum, length of the course, technology used, cost, and meeting overall expectations with this course. In addition to finding our course informative, all respondents found Wikidata to be a useful, compelling platform for data literacy, and to a lesser extent, their collections. All but one indicated an interest in continuing to edit.
We received positive feedback about the construction of the course — using the Dashboard, taking trainings, and the length of the course as well as the size of the sessions. Participants appreciated being able to participate in discussions and have their questions answered in person. There was similar feedback about the number of assignments and expectations for contributions. This resulted in positive feedback about the pacing of the course. There were a few participants who wished the course was longer, but the majority found it to be a manageable length.
Some participants left constructive feedback about additional topics we could cover (batch uploads, additional tools, sharing project-oriented workflows), all of which are excellent recommendations for future training modules or curriculum points. Participants also provided feedback about our training modules, pointing out some typos and formatting issues and sharing pedagogical recommendations.
Survey results revealed the value that participants found in insights and opinions from their peers. Alignment around positions and responsibilities in these courses was significant, which allowed for specific questions requiring specific knowledge to actually be answered.
Having this project make so much sense to librarians was some of the most exciting feedback. We received similar feedback from participants in other industries which is something the entire Wikidata community should know about and take great pride in.
One aspect of our theory of change is to monitor participant engagement and retention on Wikidata. As of the publication of this report, only a month and a half has passed since the courses and workshop ended. Of the 38 participants, we have had ten edit Wikidata after the end of the course and workshop.
- Four from the one day workshop, a few edits each
- Two from the beginners course, hundreds of edits
- Four from the intermediate course, hundreds of edits
Of this group, only three have edited consistently. It will be interesting to see if other participants start editing or edit with more frequency. We will follow up with small questionnaires about Wikidata usage six months after the course end-date for those who have opted in to future communications in our end-of-course survey.
Returning to key questions
- To what extent will information professionals address incomplete data on Wikidata?
A: We found that information specialists with no prior Wikidata training were able to learn and start to contribute to Wikidata with a strong level of accuracy. Evaluating this through adding references demonstrates that courses like this can provide hundreds of references to support statements on Wikidata. Although the level of engagement varied, all participants made multiple edits to Wikidata - creating statements, adding qualifiers, and, in one case, creating properties. Monitoring these metrics with an established baseline and a set of goals and milestones will help us better measure impact in the future. The focus on libraries and librarians has been especially effective regarding access to high quality data and how to describe it.
- How will information professionals improve incomplete and possibly inaccurate data models and ontologies?
A: This first round of Wikidata training began to scratch the surface of this issue. We had property proposals, project chat conversations, and spirited discussions in our meetings about data models. Although the impact of these additions is not immediately quantifiable, engaging directly with Wikidata's ontology via property creation shows how quickly subject experts can share their knowledge with the Wikidata community.
- Will participants be able to contribute data or entire datasets that aren't yet present on Wikidata?
A: These courses worked well to frame the data donation process to Wikidata. We did not have any participant add a whole data set. A data donation course will require either more time or participants who have identified a dataset in advance of the course. The contributions indicated that participants felt comfortable editing in their areas of expertise with a high level of detail (i.e. well-formed references and qualifiers). Following up with participants will help answer this key question more definitively.
- Will participants find Wikidata a compelling enough platform to use, in some capacity, at their institution (both while taking the course and after the course ends)?
A: Yes, on an individual level, but less so at the institutional level. Participants remained enthusiastic about using Wikidata based on survey results. Our greatest institutional success in this course came when participants representing the Frick created a property for items in their gallery. This property did not exist prior to this course and now nearly 100 items use this property to point to an external collection. Other participants expressed institutional interest in the future, starting a Wikibase at two institutions for instance.
- Can we retain subject matter experts; will they remain active?
A: Yes, for now. As of this publication we have had 10 participants continue to edit in some capacity after the end of the course. Three of these 10 have edited consistently since the courses and workshop finished.
- Is this program worth continuing?
A: These Wikidata courses, as we have designed them, are a staff-intensive endeavor. We believe that the personal attention this course and workshop provide is necessary to orient newcomers to Wikidata in a short amount of time. As such, the program requires revenue to continue. Although we did not meet our revenue goal, we did meet our participant goal. Revenue aside, the number of edits these 38 participants made, along with the quality of these edits, makes a strong case to continue to pursue this program. There is a lot we still need to test and figure out regarding our Wikidata courses, but we believe there is interest from information specialists and other subject experts to invest in linked data. We believe that Wikidata can have a positive impact on so many disciplines and industries. Although this program is resource-intensive, we believe that it is a program worth continuing. We look forward to building off of what we have started and improving it until it becomes a more indispensable resource.
Adapting the program
This is a new course offering within the Scholars & Scientists Program. There is a lot we are looking forward to tweaking, improving, and changing about it moving forward. Here is a preliminary list of ideas for future courses:
- Building out our curriculum to cover more topics and meet a more diverse set of needs in tailored courses. We think linked data's underlying principles will benefit other disciplines and/or industries. Most important, to improve representation on Wikidata we need other subject matter experts to embrace linked data and share their data on Wikidata, and tailored courses could help address this need.
- Structuring a course around a specific dataset: This would help us achieve two things: 1) establish a well-documented workflow for uploading a large amount of data to Wikidata and 2) better understand data quality and completeness. It would be a different curricular approach than the one we have currently, but it would be example-based and could be especially appealing to an institution or specific department to work on together.
- This program could fit into a classroom environment. Tweaking the assignments and modules to cater to students within a course would be worth exploring.
- Reviewing the trainings and curriculum to include more tools and software.
- Test expanding the length of this program from six to eight weeks. We have received feedback from some participants that a longer course would be helpful in covering more concepts and allow for more practice.
- Construct more specific assignments: modeling whole information domains, property creation project, adding statements for the most important yet lowest used properties, creating specific lists, using the Wikidata Bridge to connect Wikipedia to Wikidata
- Consider Wikibase stewardship: create a course to support Wikibase instances. The skillset needed to run a Wikibase overlaps with our Wikidata curriculum, but would require additional skills.
- Evaluate a way to standardize and differentiate the learning outcomes of the beginner and intermediate course. Since we do not assume prior experience with Wikidata, our training modules overlap in both of these courses. This may not always be necessary and exploring this would help determine what a more individually distinct set of offerings would look like.
After this course, we have seen firsthand the enthusiasm and curiosity that information professionals have in Wikidata. We believe that they bring a needed set of skills, perspectives, and connections to data that make them an indispensable part of Wikidata. Through teaching them how to edit and become part of the community we believe this program can benefit both Wikidata and libraries' local data. It is our hope that closely connecting librarians to Wikidata will help achieve a more complete and representative set of data on Wikidata. We support this program and will continue to offer it.
We would be interested in offering an advanced course at some point. Wikidata projects at an advanced level require different skill sets. We want to be sure that there is enough interest to run a course and that our staff's ability to answer these diverse, high-level questions is present. Lastly we want to know that this style of program is appropriate for a high-level course. It may be that a consultant-style approach or a more-embedded approach would be required for this level of engagement.
The Wikidata courses within the Scholars and Scientists Program have merit for several reasons. The program opens Wikidata up to a set of talented information professionals. Not only do these professionals bring their expertise with them, but they are also often stewards of data. Connecting Wikidata to these professionals benefits both Wikidata and local collections, which can enrich their data with Wikidata. This also benefits the Wikimedia community by having more structured, language-independent data from which to connect articles across all languages. We started this program with libraries because of the natural overlap with linked data and collections, but this program could easily reach beyond libraries into museums, galleries, archives, civic data, and numerous university courses.
Coming from libraries, these experts have several years of experience modeling, evaluating, and working with data. Wikidata presents a unique opportunity as it is an inherently collaborative creation environment for linked data, something that is just arriving in library systems right now. The timing could not be better as interest in Wikidata as a project continues to swell. This expertise will also influence the current practices on Wikidata. Similarly, Wikidata practices may begin to influence libraries as well. As data is centralized across institutions, queries will become more powerful, revealing new insights we have never had the opportunity to know until now.
Wikidata holds the potential to do so many things. As the backbone for identifiers online, Wikidata acts as a verifiability tool. Since provenance is built into every claim that editors make on Wikidata, this sense of verifiability extends uniformly throughout Wikidata. Again, libraries are natural fits based on alignment around promoting verifiable information for all patrons and users. By teaching new editors how to contribute specifically to this, these courses contribute to improving the data on Wikidata.
Wikidata is still a new project. The curiosity and passion that these course participants brought underscores the fact that we do not yet know the full potential of what Wikidata can do. We had numerous conversations about potential tools, potential case studies, and hypothetical questions about being able to query the world that become a little less hypothetical with every edit.
This first round of participants demonstrated several points mentioned in the Theory of Change. Non-Wikidatans were able to deliberately contribute to Wikidata in their area of expertise. Participants were able to apply their expertise to Wikidata policy, community norms, and tools to contribute to data models, evaluate ontologies, and address content gaps in Wikidata. In contributing to already-existing items and creating items from scratch, these participants are improving Wikidata's language independence, machine readability, and connecting Wikidata to outside resources hundreds of edits at a time.
We will be offering new courses, starting in September and October 2019. We hope to continue offering more in 2020. These courses will follow the formula of the first round of courses we offered — two beginner sessions and one intermediate.
One last potential opportunity Wikidata can help achieve: the ability to more equitably produce information online. We strive to have a diverse base of editors on Wikidata and programs like these can begin to help us toward that. There's lots of work to be done, but creating a structured set of courses begins to create opportunities for Wikidatans who never knew they were Wikidatans until they started the course. Although there is a fee associated with the course, we hope to build it to a sustainable level where we can offer it to anyone who wants to take it. All trainings remain freely licensed and open to anyone interested in learning more about Wikidata. We want to ensure that the editor base of Wikidata is as diverse as the knowledge it represents. Bringing in more data, from more institutions, edited by more people is the way to achieve this. We look forward to growing this program in new directions, taking on new editors, taking in new data, and working toward realizing the full potential of Wikidata.