AI Sauna/Speaker notes - Kaj Arnö, Robert Silén - AI and Databases for Wikipedia

Kaj Arnö
Robert Silén

Speaker notes from AI Sauna. The recording starts at 1:50:04 in the streaming video of the AI Sauna Inspire Talks.

Harnessing AI and Database for Wikipedia

Intro, Kaj Arnö

Roles: We are wearing several hats, the main two being Maria and Fredrika:

  • MariaDB is the database that Wikipedia runs on – my daytime job
  • Projekt Fredrika is a Wikipedia project focusing on the representation of Swedish Finland in Wikipedia, in Swedish and other languages – Robert's daytime job and my pro bono job

We have four topics on Gen AI, or more specifically, Large Language Models (ChatGPT, Claude, Llama 2, and the like).

  1. First, I'll present ten conclusions about how to use AI to improve the quality of Wikipedia as an encyclopaedia
  2. Second, I'll share three overall goals of our current Wikipedia projects
  3. Third, I'll raise the ambition level, to the level of defending Bildung, the liberal Western world order, and share our thoughts about the role of the database for AI
  4. Fourth, Robert will share the practical experiences of our AI work so far

1. The ten prime insights are

  1. use AI to improve the capacity of human wikipedians
  2. do this by using AI as an infinitely eager junior research assistant
  3. don't let the junior research assistant update Wikipedia directly, or we will have a situation similar to Lsjbot in Swedish and Cebuano (bulk data of questionable quality)
  4. start from identifying great sources (high-quality books, articles) and then perform two types of analysis on them – NER (Named Entity Recognition), meaning: identify which people, places and other Wikidata Q codes the source material is about, and RAG (Retrieval Augmented Generation), meaning: let a Large Language Model embed the source, vectorising it into an internal AI format (a sketch follows after this list)
  5. then compare the vectorised source data with existing Wikipedia articles
  6. be smart in the use of human resources where they matter most, for example by prioritising articles by their number of reads on Wikipedia
  7. use best practices of Generative AI, in the form of prompt engineering, which means tweaking the exact requests that are given to the large language model
  8. do all of this with existing Wikipedia guidelines as the main priority, in particular NPOV, encyclopaedic language style and high-quality source references
  9. in this, never forget the appropriate usage of conventional IT and manual Wikipedia updates – in our case, I've done Wikipedia edits in 40 languages, and Robert has done more than 50,000 Wikidata edits and 10,000 Wikipedia updates in Swedish alone
  10. if you follow these guidelines, and add a human subject-matter expert, you can increase human productivity by about 10x
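
To make insights 4 and 5 concrete, here is a minimal sketch in Python. The library and model choices (spaCy's multilingual NER model, a sentence-transformers embedding model) are illustrative assumptions, not necessarily the tools we actually use:

```python
# Minimal sketch of insights 4-5: NER on a source text, then embedding-based
# comparison against an existing Wikipedia article. Library and model names
# are illustrative assumptions, not the project's actual stack.
import spacy
from sentence_transformers import SentenceTransformer, util

nlp = spacy.load("xx_ent_wiki_sm")            # multilingual NER: PER/LOC/ORG/MISC
embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def entities(text: str) -> set[tuple[str, str]]:
    """Named Entity Recognition: which people and places does the source mention?"""
    return {(ent.text, ent.label_) for ent in nlp(text).ents if ent.label_ in ("PER", "LOC")}

def novelty(source_paragraph: str, wikipedia_article: str) -> float:
    """Low similarity suggests the source contains material the article lacks."""
    src_vec = embedder.encode(source_paragraph, convert_to_tensor=True)
    art_vec = embedder.encode(wikipedia_article, convert_to_tensor=True)
    return 1.0 - util.cos_sim(src_vec, art_vec).item()
```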

2. Goals of our Wikipedia projects

The goals of our projects are to improve Wikipedia contents and quality in three somewhat related ways:

  • one, by increasing the amount of high quality information (fill in holes)
  • two, by identifying and correcting low quality and missing NPOV (fix bugs)
  • three, by spreading quality information between languages; this is a much more elaborate and difficult process than mere translation, because different language versions have different cultures, practices and habits

3. Raising ambition levels

Now, we think it's time to raise the ambition level a lot. We think the learnings here have a lot of potential to greatly impact the overall quality of Wikipedia, and, in particular, improve the NPOV. In another Wikipedia project, Kateryna, we are focusing on removing Russian propaganda on Wikipedia, and the most urgent topic here is of course how Ukraine is represented. But Russian propaganda isn't limited to Ukraine, and propaganda isn't limited to Russia.

To accomplish this, we need to scale out the work, and no longer work with Google Spreadsheets but with a proper database. And here our overall logic is that it makes sense for the source data, the AI vector data, and the output data all to be within the same database. For Wikipedia, that's MariaDB Server.
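
As a rough sketch of what "everything in the same database" could look like in practice; the table layout and connection details are invented for illustration (a recent MariaDB could also store the embedding in a native VECTOR column, which this sketch does not use):

```python
# Sketch of keeping source text, its embedding, and the generated output in one
# MariaDB database. Table and column names are hypothetical, for illustration only.
import json
import mariadb  # MariaDB Connector/Python

conn = mariadb.connect(user="fredrika", password="...", host="localhost", database="wikiprep")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS source_paragraphs (
        id INT AUTO_INCREMENT PRIMARY KEY,
        source TEXT,        -- bibliographic reference to the book or article
        paragraph TEXT,     -- the raw source text
        embedding LONGTEXT, -- JSON-encoded vector from the embedding model
        suggestion TEXT     -- LLM-generated improvement, reviewed by a human
    )
""")

def store(source: str, paragraph: str, vector: list[float], suggestion: str | None = None):
    cur.execute(
        "INSERT INTO source_paragraphs (source, paragraph, embedding, suggestion) VALUES (?, ?, ?, ?)",
        (source, paragraph, json.dumps(vector), suggestion),
    )
    conn.commit()
```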

Now Robert will talk about the practical processes, batch tools, scripts, and experiences, and how he came to this, using the example of a book called "Ett gott parti", about Albert Edelfelt, historically one of Finland's top two or three painters.

4. Practical experiences, Robert Silén

I'll now show you some practical examples of how we implemented Generative AI to improve Wikipedia articles.

To improve Wikipedia based on a single source with AI, two steps are necessary:

  1. to identify which Wikipedia articles are relevant to the source
  2. to generate the actual improvements and publish them.

For the identification of Wikipedia articles, we used NER coupled to Wikidata. In Ett gott parti, it identified over a thousand names, of which about 300 people had existing Wikidata objects – and about the same number of locations. The Wikidata objects are in turn linked to countless Wikipedia language versions and other Wikimedia resources that we can then easily access.
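
A minimal sketch of how a name found by NER can be matched to a Wikidata object, using the public wbsearchentities API; how the matching was actually done in our pipeline is not detailed here:

```python
# Sketch of linking a recognised name to Wikidata items via the public
# wbsearchentities API (standard MediaWiki API usage).
import requests

def wikidata_candidates(name: str, language: str = "sv") -> list[dict]:
    """Return possible Wikidata items (Q codes) for a name found by NER."""
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": name,
            "language": language,
            "format": "json",
            "limit": 5,
        },
        timeout=30,
    )
    return [
        {"qid": hit["id"], "label": hit.get("label"), "description": hit.get("description")}
        for hit in resp.json().get("search", [])
    ]

# Example: wikidata_candidates("Adolf von Becker") should include the painter's item.
```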

To prioritize which ones to work on, we could count the number of mentions of each person or location, or check how much they are read about on Wikipedia. However, for a qualitative evaluation, we asked the author of the book to evaluate the list with regard to who has the most potential. We chose 70 people.
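
For the readership criterion, a sketch using the public Wikimedia Pageviews API could look like the following; the project and date range are illustrative:

```python
# Sketch of pageview-based prioritisation via the Wikimedia Pageviews REST API.
import requests

def yearly_views(article: str, project: str = "sv.wikipedia",
                 start: str = "2023010100", end: str = "2023123100") -> int:
    url = (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"{project}/all-access/all-agents/{article.replace(' ', '_')}/monthly/{start}/{end}"
    )
    resp = requests.get(url, headers={"User-Agent": "projekt-fredrika-sketch/0.1"}, timeout=30)
    resp.raise_for_status()
    return sum(item["views"] for item in resp.json().get("items", []))

# Rank candidate people by readership, most-read first (hypothetical list):
candidates = ["Adolf von Becker", "Albert Edelfelt"]
ranked = sorted(candidates, key=yearly_views, reverse=True)
```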

In the screenshot you can see which paragraphs mention, for example, Adolf von Becker. We fed these paragraphs to GenAI together with the article about him, and gave GenAI an instruction prompt to use strictly and only the provided content.

We asked GenAI to make three suggestions, each one shorter than the last, to get varying degrees of focus and encyclopaedic brevity.
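
A sketch of what such an instruction prompt can look like, here using the OpenAI chat API as one possible backend; the model choice and the exact wording are illustrative, not the prompt we actually used:

```python
# Sketch of an instruction prompt that restricts the model to the provided
# content and asks for three suggestions of decreasing length.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """You are helping improve a Wikipedia article.
Use strictly and only the source excerpts below; do not add outside facts.
Write in neutral, encyclopaedic Swedish and cite the book as the source.
Produce three alternative additions to the article: one of about 150 words,
one of about 80 words, and one of about 30 words.

SOURCE EXCERPTS:
{excerpts}

CURRENT WIKIPEDIA ARTICLE:
{article}
"""

def suggest_additions(excerpts: str, article: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice
        messages=[{"role": "user", "content": PROMPT.format(excerpts=excerpts, article=article)}],
    )
    return response.choices[0].message.content
```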

The author of the book reviewed the suggestions to verify that they were indeed correct. Out of 70 suggestions, we edited 33 articles in Swedish, and several more in other languages. Some suggestions we could copy-paste into the article, others we had to adapt. We dropped roughly half of the suggestions because the book as such did not offer enough good content for the Wikipedia article.

In this particular case of improving the Wikipedia article on Adolf von Becker, we did not use any of the exact suggestions as such, but instead used them content-wise as inspiration and phrased a more appropriate improvement to the article. To be fair to GenAI, we asked specifically for a suggestion for a new paragraph for the article; with another phrasing we might have gotten a better ready-to-use "embedded" suggestion, but one that would have been more difficult to evaluate.

There is room for improvement in the AI's results in this exercise. With better prompt engineering and chains of prompts, the AI could itself identify weak suggestions and adapt its suggestions better to the Wikipedia article. Experimenting is, however, time-consuming, as all results need to be manually double-checked to evaluate improvements.
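
One possible shape of such a prompt chain, purely as an illustration, is a second self-critique pass over each suggestion:

```python
# Sketch of a second pass in which the model critiques its own suggestion
# against the source excerpts, so weak suggestions can be flagged before a
# human reviews them. Hypothetical, not our current pipeline.
from openai import OpenAI

client = OpenAI()

def critique(suggestion: str, excerpts: str) -> str:
    review_prompt = (
        "Check the suggested Wikipedia addition below against the source excerpts. "
        "Answer KEEP if every claim is supported by the excerpts and the text is "
        "encyclopaedic, otherwise answer DROP with a one-sentence reason.\n\n"
        f"SUGGESTION:\n{suggestion}\n\nSOURCE EXCERPTS:\n{excerpts}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice
        messages=[{"role": "user", "content": review_prompt}],
    )
    return response.choices[0].message.content
```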

There is a lot of potential to continue improving this process and raise the efficiency and pleasure of Wikipedia editing with reliable sources.