Grants:Programs/Wikimedia Community Fund/Rapid Fund/Using Award Anthology Short Stories Data for Advanced Editor Training (ID: 22763030)/Final Report

From Meta, a Wikimedia project coordination wiki
phibeatrice
Using Award Anthology Short Stories Data for Advanced Editor Training
31 October 2024 - 01 March 2025
Report ID: 11437
Report status: Under review
Report due date: 31 March 2025
Grant ID: G-RF-2407-16588
Amount funded: 3000 USD, 3000 USD
Amount spent: 3000 USD
Rapid Fund Final Report

Application type: Standard application

Part 1: Project and impact

1. Describe the implemented activities and results achieved. Additionally, share which approaches were most effective in supporting you to achieve the results. (required)

After receiving the grant, starting in October, User:Phibeatrice began experimenting with tables on literature-related pages to develop a proof of concept for the utility and versatility of tables in the context of the grant project. Specifically, they wanted to work out the best ways to format tables, and the different cases in which tables could be used, so that they would know enough to teach other editors or provide them with useful case studies. As a result, User:Phibeatrice created over a hundred pages related to books and literature in the span of a few months, many of them relying on tables to represent data such as tables of contents, publication credits, and other useful encyclopedic information. Some strong examples are listed under the second question, on documentation.

After User:Phibeatrice gained significant fluency in tables and discussed with User:Jenny8lee the strengths of the tables they had created, there was a methodology in place for creating efficient tables for book-related pages, and thus the knowledge needed to create a template for other editors to practice with. In the meantime, the challenge was getting access to old editions of The Best American Short Stories and putting in the time and effort to convert physical, printed tables of contents into formatted data fit for use on Wikipedia. The budget helped compensate for the labor needed to find the copies, scan them, try a litany of methods to extract text from the scans using OCR and LLMs, and then format the text into spreadsheets that our editors could copy from when making articles. The last step was cleaning the OCR output to make sure the data matched the original scan (for example, where “d” was rendered as “cl”). This data wrangling, which we had accounted for in our project planning, took the most time.
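The OCR check described above (catching cases where “d” was rendered as “cl”) could be partly automated. The following is a minimal sketch under stated assumptions, not the script used in this project: the function name `flag_ocr_suspects`, the small `vocabulary` set, and the sample title are all hypothetical and exist only for illustration. It flags words that contain “cl”, are not in the vocabulary, and become a known word once “cl” is replaced with “d”, so that a human can compare them against the scan.

```python
import re


def flag_ocr_suspects(text, vocabulary):
    """Return (word_as_found, suggested_fix) pairs for likely 'd' -> 'cl' OCR errors.

    `vocabulary` is a set of lowercase words known to be correct (hypothetical here;
    in practice it could come from a word list or from already-verified titles).
    """
    suspects = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        # Only consider words that contain "cl" and are not already known words.
        if "cl" in lower and lower not in vocabulary:
            candidate = lower.replace("cl", "d")
            if candidate in vocabulary:
                suspects.append((word, candidate))
    return suspects


# Made-up example data, not taken from the project's scans.
vocabulary = {"the", "garden", "party", "old", "clock", "and"}
print(flag_ocr_suspects("The Garclen Party and the Olcl Clock", vocabulary))
```

A check like this only produces suggestions. Legitimate words such as “Clock” are left alone because they are already in the vocabulary, and every flagged word would still need a human to confirm it against the original scan.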

While our data wrangling was coming to a conclusion, we got in contact with several editors we knew, of varying skill levels, to train them on how to use tables. User:Phibeatrice developed their own curriculum, documents, and ways of conveying the information. They were able to use several of their own newly created pages as case studies on the effectiveness of tables, and even created some pages with tables from scratch, rather quickly, in live Zoom demonstrations to show these editors in the most interactive way possible how tables could be used. They were also adept at using ChatGPT to take large swathes of unorganized information and have the LLM format it into wiki markup that could then be easily plugged into a Wiki page; of course, some human checking and cleaning was needed. All of this knowledge was then crystallized into training for many editors in our orbit. Thus, in addition to the results of User:Phibeatrice making a lot of pages and, later on, of other users making new pages for The Best American Short Stories volumes, there was also the result of User:Phibeatrice and User:Jenny8lee gaining expertise in how to motivate and teach editors on certain features of the Wikipedia platform.

As always, we find that offering a diverse range of training methods works best. Some people worked better with more live, interactive trainings, so we trained them on tables over Zoom. Other editors were more experienced and were able to get by with a Google Doc we had created to illustrate tables. A few were able to get the hang of tables by creating them on their own pages or finding a page to create them on, after which we reviewed their work. Moving forward, we hope to continue working with editors of various experience levels and using a diversity of approaches to help them improve their contributions to Wikipedia in the way that they learn best.

2. Documentation of your impact. Please use space below to share links that help tell your story, impact, and evaluation. (required)

Share links to:

  • Project page on Meta-Wiki or any other Wikimedia project
  • Dashboards and tools that you used to track contributions
  • Some photos or videos from your event. Remember to share access.

You can also share links to:

  • Important social media posts
  • Surveys and their results
  • Infographics and sound files
  • Examples of content edited on Wikimedia projects

Mostly from October to January, User:Phibeatrice was prolific, creating over a hundred literature-related pages on English Wikipedia. Many of them involved experimenting with tables. Below are highlights of their work, which were later used to train editors or otherwise to provide documentation and examples of table functionality.

One of User:Phibeatrice’s first table pages was Charlie Chan Is Dead: An Anthology of Contemporary Asian American Fiction, an anthology with over 40 short stories and 40 authors. Beyond the stories and authors, the page also needed to list original publication credits. With the sheer scale of data needing wrangling, User:Phibeatrice learned a lot about how to effectively pull data from physical books and format it accordingly on a Wikipedia page. This knowledge would form the bedrock of the teaching to be done much later.

User:Phibeatrice also made several short story collection pages in which a table of contents, with publication credits, could be effectively conveyed with tables. Wednesday’s Child was later used in our official documentation to show various features of how to create and manipulate a table. Tomb Sweeping and Green Frog are other examples of this. Some more detailed tables of contents, with sections and additional columns of information, were tackled in the pages The Penguin Book of Japanese Short Stories and The Penguin Book of Korean Short Stories.

With their proficiency in other languages, User:Phibeatrice also began creating bilingual tables, such as in the case of Sayaka Murata’s short story collection Life Ceremony.

Drawing inspiration from the table of publications on Ocean Vuong’s page, User:Phibeatrice created a similar table for the poet Emily Jungmin Yoon’s page, which required more columns to indicate publishers, ISBNs, references, and other relevant data. With this knowledge, User:Phibeatrice became a frequent contributor to certain author pages’ tables whenever new publications were released, e.g. when recent Nobel Laureate Han Kang released a new short story in The New Yorker that needed indexing in her table.

With User:Phibeatrice’s track record of creating tables for literature-related pages, all of the groundwork for training other editors on tables was done. Due to delays in acquiring archival materials from old editions of The Best American Short Stories volumes, we weren’t able to start the edit-a-thon to create new pages, with full table data, until the end of March. (This was the source of the significant delay in our activities and therefore our reporting, and we express our apologies for that once again.) Beforehand, however, we were able to train editors on tables in other ways, using the wealth of examples we had created in the months prior to demonstrate tables, provide opportunities for others to work with them, and otherwise develop a strong proof of concept for the edit-a-thon to come.

Once we got our hands on the full table data, we were enormously productive with it. According to data from our event dashboard, which we created for the first half of April alone, we were able to create over 50 pages in about a week (most of them Best American Short Stories volumes, some of them other short story collections) with 5 editors trained on tables and other parts of the article creation process. A more specific breakdown of our outcomes can be seen here: https://outreachdashboard.wmflabs.org/courses/Writing_Downtown/Using_Award_Anthology_Short_Stories_Data_for_Advanced_Editor_Training/home#

Outside of the scope of the project but still relevant to tables, User:Phibeatrice also took a brief detour out of literature-related pages and instead created yearly hubs for the Nobel Prizes, with over 30 year-based pages created in total, using the knowledge they had gathered from their research for this grant project. An example can be seen at 2022 Nobel Prizes.

Additionally, share the materials and resources that you used in the implementation of your project. (required)

For example:

  • Training materials and guides
  • Presentations and slides
  • Work processes and plans
  • Any other materials your team has created or adapted and can be shared with others

User:Phibeatrice created a Google Doc for some editors to use in order to gain fluency with tables: https://docs.google.com/document/d/1R6dXoXH4pIod4BAbFSISIq6aY3n0NiD6n-SCv4YCM5M/edit?usp=sharing

Many of the pages listed above, in the second question, were frequently passed on to editors as examples of how tables could be used. In addition to serving as useful pages unto themselves, they became a source of training for the editors we came into contact with from October onward.

3. To what extent do you agree with the following statements regarding the work carried out with this Rapid Fund? You can choose “not applicable” if your work does not relate to these goals. Required. Select one option per question. (required)

Our efforts during the Fund period have helped to...
A. Bring in participants from underrepresented groups: Strongly agree
B. Create a more inclusive and connected culture in our community: Strongly agree
C. Develop content about underrepresented topics/groups: Strongly agree
D. Develop content from underrepresented perspectives: Strongly agree
E. Encourage the retention of editors: Agree
F. Encourage the retention of organizers: Agree
G. Increased participants' feelings of belonging and connection to the movement: Agree
H. Other (optional)

Part 2: Learning

4. In your application, you outlined some learning questions. What did you learn from these learning questions when you implemented your project? How do you hope to use these learnings in the future? You can recall these learning questions below. (required)

In addition to teaching more advanced editors, we would layer the table creation skill on to other training that we are already doing for female and diverse editors.

We believe that organizations with a strong identity centered on diverse writers are an ideal pool for Wikipedia to recruit from since they both are comfortable with prose and have a mission of elevating underrepresented voices.

While editathons are good for “top of the funnel” recruiting, these don’t always turn into long-term engagement. We believe that the mission-oriented nature of the diverse arts and letters organizations can create a strong enough bond so that the participants reinforce involvement in Wikipedia.

The advanced model for training looks like this:

1) “Training.” This is a “come learn about Wikipedia and eat yummy lunch.” This is largely a social experience and we help people who are already Wikicurious.
2) Then the bulk of the training is remote in groups on Zoom. Interested people sign up for a few sessions to become proficient on editing/formatting tables and cleaning/uploading data (we have another grant that helps with this).
3) Everyone has to produce an edited article and also add a table.
4) We have an “editathon” which again is people eating yummy food, but is mostly a social experience, where they may add links or new article drafts based on the added tables.

Note: this format may change depending on what the partners feel is most conducive for engagement.

Ultimately, we found that our in-person sessions and Zoom sessions, led by User:Phibeatrice, were quite effective not just in teaching tables but also in getting editors motivated to continue editing Wikipedia. These editors would not just make tables but also find other avenues of engagement with the platform. Some wanted to create pages. Others stuck to edits. Many liked adding or improving citations. Some got into Wikimedia Commons, Wikidata, etc. We’re still reflecting on how to keep this motivation going so that editors continue to work beyond us and develop a sense of commitment to contributing outside of our grant project.

Since User:Phibeatrice was mostly concentrating on minority literature, we were thinking a lot about diversity and representation as ongoing goals for our community and also the platform as a whole. Along the way, we got in touch with the Asian American Writers’ Workshop and helped them write a grant on restarting edit-a-thons in New York City to improve articles related to Asian American literature. We hope, in the future, that we can incorporate our lessons on tables into their programs, although User:Phibeatrice has already created over a hundred pages related to Asian American literature, many of which involve tables. We also hope to learn more about how to convert those edit-a-thon attendees, or WikiCurious individuals in general, to consistent editors rather than fleeting participants.

As we went through the grant project, we found ourselves imagining not just how this table training would go but also what we were learning about how to train people in general. For instance, we thought about how editors who were interested in other niches of knowledge, rather than literature, could use tables for their own purposes. We know that not everyone is interested in literature, but everyone has some interest of some sort, and we believe that every interest is probably going to benefit from the usage of tables in some way. Tables are such a versatile and effective way of visually conveying information on Wikipedia that it should be fairly easy to help others imagine different use cases for them and teach them how to implement those use cases accordingly.

5. Did anything unexpected or surprising happen when implementing your activities? This can include both positive and negative situations. What did you learn from those experiences? (required)

We were surprised by how much time and effort, off-Wiki, it took to get The Best American Short Stories data. In hindsight, we could have better coordinated with our team members, established firmer deadlines, and found better methods of extracting the data and preparing it for our editors to work with. However, it did teach us a lot about OCR, LLMs, and also the cross-applicability between spreadsheet software and Wiki tables, which has been useful for us in importing table of contents data from new books and creating new pages with them.

For instance, User:Phibeatrice recently created the page The Best American Short Stories of the Century, which hadn’t previously been included in our data wrangling, by reflecting on the lessons of our experience and applying the best strategies possible to see how fast they could pull data from a table of contents with over 40 short stories in it. They also created 100 Years of the Best American Short Stories while other editors were working on the yearly volumes of The Best American Short Stories.
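The cross-applicability between spreadsheet software and Wiki tables mentioned above can be illustrated with a short script. This is a minimal sketch under stated assumptions, not the workflow actually used in the project: the function name `rows_to_wikitable`, the column names, and the sample rows are hypothetical placeholders, and the input is assumed to be a spreadsheet exported as CSV with a header row. It emits standard MediaWiki table markup that could be pasted into a page and then checked by a human.

```python
import csv
import io


def rows_to_wikitable(rows, caption=None):
    """Convert rows (first row = header) into MediaWiki table markup."""
    header, *body = rows
    lines = ['{| class="wikitable sortable"']
    if caption:
        lines.append(f"|+ {caption}")
    # Header cells share one line, separated by "!!".
    lines.append("! " + " !! ".join(header))
    for row in body:
        lines.append("|-")  # start of a new table row
        # Data cells share one line, separated by "||".
        lines.append("| " + " || ".join(cell.strip() for cell in row))
    lines.append("|}")
    return "\n".join(lines)


# Placeholder data standing in for a CSV export of a table of contents.
sample_csv = """Author,Story,Originally published in
Jane Doe,An Example Story,Example Quarterly
John Roe,Another Example,Sample Review
"""
rows = list(csv.reader(io.StringIO(sample_csv)))
print(rows_to_wikitable(rows, caption="Table of contents"))
```

Cells that themselves contain a pipe character, or that need links, italics, or references, would require extra handling, so the output is best treated as a first draft of the table rather than a finished one.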

Another surprise was that although The Best American Short Stories goes all the way back to 1915, we weren’t really able to get complete-enough data for the volumes before 1941. As a result, we had to end our page creations at the 1941 volume of the series. Perhaps this could be fixed in the future with more precise data wrangling, but for the time being, we are pretty happy with having created all of the pages from 1941 to 1985. (Before the grant project began, the oldest volume with a page on Wikipedia was 1986.) We also compensated for this by creating The Best American Short Stories of the Century and 100 Years of the Best American Short Stories, as well as other short story anthologies and collections, in April.

6. What is your plan to share your project learnings and results with other community members? If you have already done it, describe how. (required)

With our documentation established, it will be very easy for us to teach tables to editors we meet in the future. Already, during our last and most fruitful week for contributions, we ran into very new editors whom we plan to teach tables using our curriculum. User:Phibeatrice is also eager to hop into Zooms to show people not just how to create tables but also how to make other kinds of edits and create pages.

Additionally, we’re aware that tables have enormous potential outside of literature-related pages. For instance, one of our editors has been excited about creating pages related to musicians, and in their experience, tables have been useful for discographies. In the future, we intend to not only help advance the usage of tables in literature-related pages but also promote their use in every possible way they can be useful for the purposes of representing information on Wikipedia.

Still, with regard to literature, there are enormous gaps that we’d like to address in the future with our editors, should there be the capacity. Firstly, The Best American Short Stories is a series that is still published every year. We hope to call upon our editors to make new pages for each of these volumes as they are released. Secondly, the other important anthology series in American short stories is the yearly compilation of O. Henry Prize short stories. These volumes could also benefit from page creations using tables. While it would require a significant amount of labor to wrangle the data for them, it is something that we are dreaming about as a way to extend this project even further.

Part 3: Metrics

7. Wikimedia Metrics results. (required)

In your application, you set some Wikimedia targets in numbers (Wikimedia metrics). In this section, you will describe the achieved results and provide links to the tools used.

Metric | Target | Result | Comments and tools used
Number of participants | 15 | 20 | 5 users committed in April to creating pages for Best American Short Stories volumes or otherwise editing tables. Beforehand, we had trained several other users at in-person Wikipedia Day and WikiCurious events as well as on Zooms.
Number of editors | 15 | 20 | For our grant project, participants and editors are synonymous.
Number of organizers | 4 | 4 | In addition to Bea Nguyen and Jennifer 8. Lee, we had help from Sara Komatsu and Heidi Pitlor on getting access to old editions of The Best American Short Stories. These people thus helped us organize the data wrangling effort, off-Wiki, to prepare the data for our Wiki editors to use.
Wikimedia project | Target | Result: number of created pages | Result: number of improved pages
Wikipedia | 600 | 75 | 100
Wikimedia Commons
Wikidata
Wiktionary
Wikisource
Wikimedia Incubator
Translatewiki
MediaWiki
Wikiquote
Wikivoyage
Wikibooks
Wikiversity
Wikinews
Wikispecies
Wikifunctions or Abstract Wikipedia

8. Other Metrics results.

In your proposal, you could also set Other Metrics targets. Please describe the achieved results and provide links to the tools used if you set Other Metrics in your application.

Other Metrics name | Metrics Description | Target | Result | Tools and comments

9. Did you have any difficulties collecting data to measure your results? (required)

No

9.1. Please state what difficulties you had. How do you hope to overcome these challenges in the future? Do you have any recommendations for the Foundation to support you in addressing these challenges? (required)

Part 4: Financial reporting

10. Please state the total amount spent in your local currency. (required)

3000

11. Please state the total amount spent in US dollars. (required)

3000

12. Report the funds spent in the currency of your fund. (required)

Provide the link to the financial report https://docs.google.com/spreadsheets/d/15wyNImcbWyT9t06lAECk7LHEojf-yTvaObMQnj24B7c/edit?usp=sharing


12.2. If you have not already done so in your financial spending report, please provide information on changes in the budget in relation to your original proposal. (optional)


13. Do you have any unspent funds from the Fund?

No

13.1. Please list the amount and currency you did not use and explain why.

N/A

13.2. What are you planning to do with the underspent funds?

N/A

13.3. Please provide details of how you hope to spend these funds.

N/A

14.1. Are you in compliance with the terms outlined in the fund agreement?

Yes

14.2. Are you in compliance with all applicable laws and regulations as outlined in the grant agreement?

Yes

14.3. Are you in compliance with provisions of the United States Internal Revenue Code (“Code”), and with relevant tax laws and regulations restricting the use of the Funds as outlined in the grant agreement? In summary, this is to confirm that the funds were used in alignment with the WMF mission and for charitable/nonprofit/educational purposes.

Yes

15. If you have additional recommendations or reflections that don’t fit into the above sections, please write them here. (optional)


Review notes

Review notes from Program Officer:

N/A

Applicant's response to the review feedback.

N/A