There are 3.5 million categories on the Wikimedia Commons
The Piłsudski Institute of America GLAM-Wiki Scalable Archive Project will develop and implement categorization practices that can be scaled and adapted, and that will help other GLAM projects grow. Categories on Wikimedia are currently underutilized by GLAM projects that would greatly benefit from them, such as document-based archives. Developing categorization practices will help other archive projects organize their digitized collections on Wikimedia and grow their open-source digital collections as more items are digitized. The Piłsudski Institute of America GLAM-Wiki Project will be the test-bed for a deeper, more meaningful implementation of Wikimedia categories.
More categories, more context
Archives using a structured data source (such as an archive using the EAD metadata standard) will be able to generate several levels of categorization for their Wikimedia Commons collections. We think that implementing intermediate categories that give context to the documents presented in our collections will make GLAM projects more attractive for Wiki users. Generating these categories from a data source structured to current broadly implemented standards will make the process of creating a GLAM collection on Wikimedia easier.
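As an illustration of how several category levels could be derived from EAD, here is a minimal sketch. It assumes a heavily simplified, namespace-free finding aid built from the standard EAD elements `archdesc`, `dsc`, `c01`/`c02`, `did`, and `unittitle`. The sample titles and the function names `title_of` and `category_paths` are hypothetical and are not taken from the project's script.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD fragment; real finding aids carry
# namespaces and many more elements.
SAMPLE_EAD = """
<ead>
  <archdesc level="fonds">
    <did><unitid>701</unitid><unittitle>Example fonds</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unitid>701/1</unitid><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unitid>701/1/1</unitid><unittitle>Letters, 1919</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def title_of(node):
    """Return the unittitle text found in the node's own <did> element."""
    return node.findtext("did/unittitle", default="").strip()


def is_component(tag):
    """True for EAD component tags: <c> or numbered <c01>, <c02>, ..."""
    return tag == "c" or (tag.startswith("c") and tag[1:].isdigit())


def category_paths(node, prefix=()):
    """Yield one tuple of titles per level, from the fonds down to each component."""
    path = prefix + (title_of(node),)
    yield path
    # Components may sit directly under the node or inside a <dsc> wrapper.
    for container in [node] + node.findall("dsc"):
        for child in container:
            if is_component(child.tag):
                yield from category_paths(child, path)


root = ET.fromstring(SAMPLE_EAD)
paths = list(category_paths(root.find("archdesc")))
for path in paths:
    print(" > ".join(path))
```

Each printed path corresponds to one candidate intermediate category, with the preceding element of the path as its parent.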
While there are millions of categories available on Wikimedia, the vast majority of current Wikimedia Commons collections have a broad category, such as “paintings,” followed by an alphabetical listing of content. There is no automatic solution for categorization either during or after a batch upload, but we would at least like to develop ways to ease the linking of stand-alone category trees (drawn from the archive) to the millions of categories already available on the Wikimedia Commons. Intermediate categorization based on already available metadata would allow an archive to contextualize their work in ways that are easier to navigate for the Commons user.
Making it easier to link stand-alone categorization to existing categories would benefit both Wikimedia and GLAM projects. Rather than an alphabetical listing, a collection of paintings would be grouped by era, style, or artist, depending on what information the metadata contains and what Wikimedia categories exist. Individual items could appear in multiple categories depending on how they were tagged in the metadata, utilizing a key feature of the Wiki format.
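The grouping described above can be sketched as follows. This is an assumption-laden illustration: the item records, the field names, and the category naming pattern are all hypothetical; only the `[[Category:...]]` wikitext syntax is standard.

```python
from collections import defaultdict

# Hypothetical item records; the field names are illustrative only.
ITEMS = [
    {"file": "Painting A.jpg", "artist": "Artist X", "era": "19th century"},
    {"file": "Painting B.jpg", "artist": "Artist Y", "era": "19th century"},
    {"file": "Painting C.jpg", "artist": "Artist X", "era": "20th century"},
]


def group_by_fields(items, fields):
    """Map each category name to the sorted list of files that belong in it."""
    groups = defaultdict(list)
    for item in items:
        for field in fields:
            value = item.get(field)
            if value:  # skip fields the metadata does not provide
                groups[f"Paintings by {field}: {value}"].append(item["file"])
    return {name: sorted(files) for name, files in groups.items()}


def category_wikitext(item, fields):
    """Return the category links to append to one file page."""
    return "\n".join(
        f"[[Category:Paintings by {field}: {item[field]}]]"
        for field in fields
        if item.get(field)
    )


groups = group_by_fields(ITEMS, ["artist", "era"])
print(category_wikitext(ITEMS[0], ["artist", "era"]))
```

Because each file receives one link per populated field, the same item shows up under both its artist and its era.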
Our goal is to help GLAM projects harness the power of the Wikimedia Commons and make their collections easier for users to browse.
Our script is thus far able to create a category tree by mimicking the fonds and folder hierarchy used by the Piłsudski Institute archive. The results of running the script can be seen here:
These “fonds” are automatically added to files by the [Template:Piłsudski Institute document], while the category description is created by [Template:Józef Piłsudski Institute of America Category Description] and [Module:Józef Piłsudski Institute of America Template:Józef Piłsudski Institute of America Category Description]. The “Accession number” field in [Template:Piłsudski Institute document] adds links to those new categories.
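To make the idea of a category tree that mirrors the fonds and folder hierarchy concrete, here is a minimal sketch. It is not the project's script or templates: the slash-separated accession number format, the root category name "Example Archive", and the naming pattern are all assumptions made for illustration.

```python
def category_chain(accession, root="Example Archive"):
    """Return category names from the fonds level down to the deepest level.

    Assumes a hypothetical accession number such as "701/2/15", where each
    slash-separated part is one level of the archival hierarchy.
    """
    parts = [part for part in accession.split("/") if part]
    return [
        f"{root}, unit " + "/".join(parts[:depth])
        for depth in range(1, len(parts) + 1)
    ]


def category_pages(accession, root="Example Archive"):
    """Map each category name to wikitext that places it in its parent category."""
    chain = [root] + category_chain(accession, root)
    return {
        child: f"[[Category:{parent}]]"
        for parent, child in zip(chain, chain[1:])
    }


pages = category_pages("701/2")
for name, wikitext in pages.items():
    print(f"Category:{name} -> {wikitext}")
```

Each generated category page links only to its immediate parent, so the tree stays stand-alone until the root is connected to an existing Commons category.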
We hope to utilize not only folder numbers but also titles and document tags to enable better browsing. These categories will be generated from metadata wherever it is available and the archive wants it used. Collections that utilize this information will be easier to create and friendlier to browse.
We would like to make connecting stand-alone categories created from metadata to already existing Wikimedia categories a less manual process. The more experience we gain by experimenting with our own Wikimedia Commons collections, the more streamlined we can make the process for others.
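One way to make this less manual is to suggest likely matches and leave the final decision to a human editor. The sketch below uses fuzzy string matching from Python's standard `difflib` module. It assumes the list of existing Commons category names has already been retrieved by some other means. The function name `suggest_links`, the sample names, and the 0.8 cutoff are illustrative assumptions.

```python
import difflib


def suggest_links(local_names, existing_names, cutoff=0.8):
    """For each local category, list existing categories with similar names.

    Matching is case-insensitive; results keep the original spelling of the
    existing category so they can be used directly in a category link.
    """
    lowered = {name.lower(): name for name in existing_names}
    suggestions = {}
    for local in local_names:
        close = difflib.get_close_matches(
            local.lower(), list(lowered), n=3, cutoff=cutoff
        )
        suggestions[local] = [lowered[key] for key in close]
    return suggestions


# Hypothetical sample data for illustration.
existing = ["Polish-Soviet War", "Silesian Uprisings", "Paintings"]
local = ["Silesian Uprising", "Unrelated folder"]

suggestions = suggest_links(local, existing)
for name, candidates in suggestions.items():
    print(name, "->", candidates or "no suggestion, review manually")
```

Names with no close match are returned with an empty list, so they can be queued for manual review rather than linked incorrectly.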
Documenting our findings is the only way that other projects will be able to use our work, and we intend to document them thoroughly. Ultimately, we would like to contribute something that other projects will want to adopt.
Our project will serve as another place where categorization can be discussed, with GLAM projects specifically in mind (but not exclusively).
|Create knowledge base of categorization practices based on Wiki and academic articles.||Feb. 1, 2015|
|Metadata to stand-alone category code and documentation ready for use by other projects.||April 1, 2015|
|Midpoint Report complete||April 15, 2015|
|Progress made toward creating better stand-alone to Wiki Commons category workflow||June 1, 2015|
|Final Report complete||July 1, 2015|
We have already developed code to transfer our archival categories to our Wikimedia Commons collection. As the code is developed further, we also want to focus on creating sensible categorization practices. We will:
- Implement the code for our whole Wikimedia Commons collection.
- Continue uploading original documents to the Wikimedia Commons using the script we have developed.
- Gather currently available categorization documentation, both Wiki and non-wiki based.
- Improve workflow with the idea that continual human interaction with uploaded content is critical to curating a digital collection.
- Improve upon the script and code as we gain experience with it.
- Create documentation so that other projects can benefit from our experience.
We assume the following improvements will be necessary as we work with the current code:
- Make the code easy to implement for others.
- Develop categorization meaningfully, that is, in a way that gives context to collections rather than making them more confusing.
Project manager dedicating 10 hours per week for the duration of the project, tasked with:
- Communicating with the coder and the Institute
- Balancing the needs of the GLAM project with the capabilities of the Wikimedia Commons
- Documenting the project during the grant period
- Creating best practices documentation based on code and workflow developed during the grant period
- Community engagement (described below)
Six months is understood as 26 weeks, for a total of 260 hours. At ~$30 per hour (based on the median salary for digital archivists and similar jobs, broken down into an hourly wage and treated as freelance work), this comes to $8,000.
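Written out, the figures above relate as follows. The implied hourly rate is derived here from the stated total and hours; it is not quoted from a separate budget table.

```latex
26\ \text{weeks} \times 10\ \tfrac{\text{hours}}{\text{week}} = 260\ \text{hours},
\qquad
\frac{\$8{,}000}{260\ \text{hours}} \approx \$30.77\ \text{per hour}
```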
The Piłsudski Institute of America GLAM-Wiki project has a record of engaging volunteers. It has thus far resulted in the donation of about 1,200 documents from the Institute archives to Wikimedia Commons, along with over 50 new Wikipedia articles in both English and Polish. We are working on this project because we would like to add documents to Wikimedia Commons faster and more meaningfully. With a working script, the institutional volunteers we regularly attract to our project will be more efficient in sharing our collection.
After creating a resource page that gathers links to Wiki and non-Wiki resources documenting categorization practices, we hope to create a discussion space where categorization can be collaboratively brainstormed, and to align our project with the practices developed there. We hope that such a discussion will attract collaborators and make our work useful for a broad audience and not just ourselves. Using our work, we hope that other Wiki projects will be able to streamline their own uploading processes and contextualize their collections in a more meaningful way, thus making their projects more attractive for Wikimedia users and their own project volunteers.
- Wikipedia:WikiProject Categories
- Category:Wikipedia Categorization - relevant active category discussion pages on Wikipedia.
- GWToolset users
The project will continue as we continue to add documents from our collections. Other GLAM projects will be able to use our work and hopefully make the categorization process better as well.
Measures of success
- Upload 500 new files to Wikimedia Commons.
- Complete 50 Wiki articles based on files uploaded to our Wikimedia Commons collections.
- Engage 5 institutional volunteers for our project.
- Establish a collaborative relationship with a GLAM or other Wiki project.
- Outreach in the form of:
  - Blog posts
  - Open, online meetups (e.g., Google Hangouts) and information sessions to explain the project
  - Contacting potential collaborators
- Helping other GLAM projects by:
  - Improving documentation regarding categorization practices
  - Making our code freely available in an online repository
  - Creating a user's guide and instructions for the code and its implementation
Lukasz Chelminski is a doctoral candidate in the History department at the CUNY Graduate Center and an adjunct instructor at Brooklyn College. He is a proponent of digital pedagogy and has used web resources in the classroom extensively. Lukasz is the Wikipedian-in-residence at the Piłsudski Institute of America. Through his experience at the Institute he hopes to develop a pedagogy strategy which will introduce his students to Wiki-editing through collaborative class projects.
Volunteer Metadata Specialist
Marek Zielinski was educated in Łódź (Łódź Polytechnic and Polish Academy of Sciences) and in New York (New York University). A polymer scientist, he continued his career in the US as a software engineer in the newspaper industry. He created the first ISP in Poland (PDi) and published a "Solidarity" paper, "Poglądy", and later a Polish-American e-paper, "Pigulki". Since 2002 he has been a member of the Board and vice-President of the Piłsudski Institute of America, where he manages digitization of the 19th and 20th century Central European archives, including data conversion and maintenance of metadata standards applications for the Institute projects (EAD, DC, MARC). A Wikipedian since 2008, he manages the Józef Piłsudski Institute of America WikiProject and participates in the Institute Commons partnership and the Łódź University of Technology WikiProject.
Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions.
- Wikimedia NYC (through "discuss" mailing list)
Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project in the list below. (Other constructive feedback is welcome on the talk page of this proposal).
- OR drohowa (talk) 15:43, 29 September 2014 (UTC)
- Marek Zielinski - Categories are very important in GLAM partnerships. The project has a great potential to utilize the GLAM's own metadata in expanding and improving the categorization of uploaded files. 20:06, 29 September 2014 (UTC)
- Categories in GLAM greatly facilitate research for scholars such as myself. As a doctoral candidate in European history at University of Wisconsin-Madison, I can better organize my sources if my access to the materials at such institutions as the Piłsudski Institute is faster and easier. Piotr Puchalski (talk) 03:16, 30 September 2014 (UTC)