Content Partnerships Hub/Helpdesk/Malayalam lexemes
Malayalam lexemes
Description
Malayalam is a Dravidian language spoken by more than 35 million people, with a diverse lexicon and complex morphology. Wikidata’s Lexeme namespace offers a way to model, preserve and reuse this linguistic data, but the current material available is of uneven quality. Of the 67 000 Malayalam lexemes present, many lack proper definitions or structured information.
Request
The Helpdesk was approached by members of the Malayalam-speaking community who had access to a large collection of words they wished to upload to Wikidata. The resource was available under an open license compatible with Wikidata. They were interested in using the Helpdesk's expertise in bulk-editing Wikidata to achieve their goal of enriching lexicographical data in Malayalam.
Process and switch of focus
The Helpdesk staff communicated with the community members in order to better understand both their needs and the data. In this process, it became clear that the technical expertise of the Helpdesk would not be of much use to the requestors.
This is because working with the provided data requires knowledge of the Malayalam language, which nobody at the Helpdesk has. Without it, it would be impossible to assess the quality of the data (such as the definitions of the words), ensure that no duplicates are created, tag the grammatical forms correctly, or assess the data already in Wikidata and identify the areas most in need of improvement. To put it bluntly, a person with no knowledge of the language who bulk-uploaded the data could cause damage.
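As a rough illustration of the duplicate problem, the sketch below separates candidate entries into new ones and exact duplicates of existing ones. It is a minimal example under stated assumptions, not the project's workflow: the function names, the (lemma, lexical category) pair format, and the sample words are all hypothetical, and no Wikidata API is called. It compares lemmas after Unicode NFC normalization, which only catches exact matches; deciding whether spelling variants or near-matches are the same word still requires someone who knows Malayalam.

```python
import unicodedata


def normalize_lemma(lemma: str) -> str:
    """Return the lemma in Unicode NFC form with surrounding whitespace removed."""
    return unicodedata.normalize("NFC", lemma.strip())


def split_candidates(candidates, existing):
    """Split candidates into (new, duplicates), keyed by (lemma, lexical category).

    Both arguments are iterables of (lemma, category) pairs. This pair format
    is an assumption made for the example, not a Wikidata data structure.
    """
    seen = {(normalize_lemma(lemma), category) for lemma, category in existing}
    new, duplicates = [], []
    for lemma, category in candidates:
        key = (normalize_lemma(lemma), category)
        if key in seen:
            duplicates.append((lemma, category))
        else:
            new.append((lemma, category))
            seen.add(key)  # also catches repeats within the candidate list itself
    return new, duplicates


# Hypothetical sample data, for illustration only.
existing = [("മരം", "noun")]
candidates = [("മരം ", "noun"), ("പുഴ", "noun"), ("പുഴ", "noun")]

new, duplicates = split_candidates(candidates, existing)
print(len(new), "new;", len(duplicates), "duplicates")
```

A check like this can reduce obvious repeats before an upload, but it does not replace review by Malayalam speakers.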
In our conversations, we answered questions about how the lexicographical namespace in Wikidata works and what the workflows for working with it could look like. Together, we came to the conclusion that the most effective course of action would be to engage the Malayalam-speaking community to work on the data together. However, the community is quite small, which is a major problem. The available documentation is also quite complex, making it hard for complete beginners to know where to start.
The suggested course of action for the Helpdesk was thus to create a beginner-level guide to editing Wikidata lexemes.
Next steps
The project has already begun editing lexemes, and is also assessing data quality and mobilizing the community. The next steps include identifying the most suitable tools, developing pilot cleanup activities, and continuing to receive guidance from the Expert Committee.
Documentation and learning
Get involved
If you are interested in supporting the Malayalam Lexemes project, you can help by reviewing and improving existing lexemes, contributing your linguistic expertise, or experimenting with tools that make the cleanup process more efficient.
If you would like to take part, get in touch with us so that we can integrate you into the workflow and connect you with the community members already involved.
Requestor
The request was submitted by a user through the Content Partnerships Hub Helpdesk, with the goal of improving data quality and building long-term community capacity.




