Abstract Wikipedia/Updates/2021-04-29

Lexicographical coverage, a first function call evaluated, and more.

This week, I want to start with a shoutout to our phenomenal volunteers.

Lexicographical coverage

My thanks to Nikki for their updates to the dashboards on lexicographical coverage. Since the first publication of the dashboards, Nikki has kept them up to date, re-running them from time to time and updating the page on Wikidata. They and others have also fixed numerous issues, created more actionable lists, and added more languages based on corpora other than Wikipedia (most notably from the Leipzig Corpora Collection). Thanks also to Mahir, who contributed to the dashboards, particularly covering Bengali, one of our focus languages.

In fact, thanks to Nikki and Mahir, the four main focus languages are now all covered: we have numbers for Bengali, Malayalam, Hausa, and Igbo. We are still missing our stretch focus language, Dagbani, because we have not yet found a corpus. We have reached out to a researcher who has compiled a Dagbani corpus, and we are also exploring how we could use the Dagbani Wikipedia on Incubator. In the meantime, we are pleased to see that the Dagbani community has put in a request for a new Wikipedia edition and that they feel ready to graduate from Incubator! Congratulations!

Some of the results of highlighting the dashboards, and particularly the lists of most frequent missing lexemes, were very promising: coverage in a number of languages has increased considerably. To list just a few examples: Polish went from 16% to 32% coverage, German from 53% to 67%, and Czech from 44% to 57%. Hindi went from a mere 1% to 15%, and Malay from 15% to an astonishing 53%! Congratulations to those communities and others for such visible progress.

With an eye on our focus languages: Bengali went from 18% to 28%, Malayalam is at 21%, and Hausa and Igbo both have coverage below 1%.

Another great tool to see the progress in lexicographical knowledge coverage in Wikidata is Ordia, developed by Finn Årup Nielsen. Ordia is a holistic user experience that allows users to browse and slice and dice the lexicographic data in Wikidata in real time. We can take a look at the 11,400 Malayalam lexemes, the 8,724 Bengali lexemes, 53 Dagbani lexemes, 15 Hausa lexemes, and the single lexeme in Igbo, mmiri, the Igbo word for water. Thanks to Finn for Ordia!

Making the state of the lexicographical coverage visible shows us that there is still a lot to do — but also that we are already achieving noticeable progress! Thanks to everyone contributing.

By the way, the annotation wiki is currently having issues. If you would like to help us with running it and have experience with Vagrant and Cloud VPS based wikis, please drop me a line on my talk page.

A first running function call!

Lucas Werkmeister consistently keeps being amazing. He is working on GraalEneyj, a GraalVM-based evaluation engine for Wikifunctions, written in Java. Lucas re-wrote GraalEneyj to be able to call a function directly from the notwikilambda test-wiki. This is the very first time that one of our functions has been evaluated! You can watch that moment in a Twitch video.

We are still working on replicating that feat in what will be our production codebase, and hope to soon connect our backend evaluating functions with the wiki — this is our goal for the ongoing Phase δ (delta). Congratulations to Lucas for achieving this step!

Delay on logo

There will be a delay on the logo finalization. Please expect another month or two before we will have news to share about the logo. Due to the legal nature of some of the involved issues, we have decided to not share details in public. Sorry for the delay, and I am looking forward to sharing the next steps in this process.

New documents

We have been working for a while with the Wikimedia Architecture Team on a number of artefacts around Abstract Wikipedia and Wikifunctions. We have now published and shared these documents in the Architecture repository (on the MediaWiki wiki). We are aiming to keep publishing our design documents and related development artefacts, and we invite you to read through this set of documents.

Based on requests from the community, we also worked on a new example of an article in abstract content. The example is not complete, and is open to being edited and discussed. Note that this is not meant to be prescriptive of what abstract content should look like, but merely a more concrete hypothetical example of what it could look like. I am confident that the community as a whole will come up with better abstractions than I did. Please do edit or fork that page.

There will be three approaches towards creating an implementation for a function in Wikifunctions, and the current and following two phases of development are each dedicated to one of those approaches: (1) calling a built-in implementation in the evaluator engine, (2) calling native code in a programming language, and (3) composing other functions to implement a new function. In preparation for the upcoming Phase ζ (zeta), we have created a few examples of function composition.
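
To make the three approaches a little more concrete, here is a minimal sketch in Python. It is purely illustrative and does not reflect the actual Wikifunctions evaluator or its data model: the names `BUILTINS`, `call_builtin`, `call_native`, `compose`, and `NATIVE_SQUARE` are all hypothetical, and "native code" is simulated here by executing a string of Python source.

```python
from typing import Any, Callable, Dict

# (1) Built-in implementations: a hypothetical registry standing in for
# functions that the evaluator engine provides itself.
BUILTINS: Dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def call_builtin(name: str, *args: Any) -> Any:
    """Look up a built-in by name and call it with the given arguments."""
    return BUILTINS[name](*args)


# (2) Native code: an implementation stored as source text in some
# programming language (assumed to be Python for this sketch).
NATIVE_SQUARE = "def square(x):\n    return x * x\n"


def call_native(source: str, function_name: str, *args: Any) -> Any:
    """Execute the stored source and call the named function it defines."""
    namespace: Dict[str, Any] = {}
    exec(source, namespace)  # acceptable only in a sketch; not sandboxed
    return namespace[function_name](*args)


# (3) Composition: a new function defined purely in terms of other functions.
def compose(outer: Callable[[Any], Any],
            inner: Callable[..., Any]) -> Callable[..., Any]:
    """Return a function that applies `inner`, then `outer` to its result."""
    return lambda *args: outer(inner(*args))


def add(a: Any, b: Any) -> Any:
    return call_builtin("add", a, b)


def square(x: Any) -> Any:
    return call_native(NATIVE_SQUARE, "square", x)


# A composed function: square(add(a, b)).
square_of_sum = compose(square, add)

if __name__ == "__main__":
    print(call_builtin("add", 2, 3))
    print(square(4))
    print(square_of_sum(2, 3))
```

In this sketch `square_of_sum` contains no code of its own: it only wires a native implementation and a built-in together, which is the idea behind the third approach.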