Abstract Wikipedia/Updates/2021-07-29

Abstract Wikipedia Updates

Abstract descriptions

The goal of Abstract Wikipedia is to let everyone write content in any language that can be read in any language. Ultimately, the main form of content we are aiming for is Wikipedia articles, so that everyone can equitably access, and contribute to, unbiased, up-to-date, comprehensive encyclopedic knowledge.

In the coming months we will work out the main milestones towards that goal. Today, I want to present one possible milestone on our way: abstract descriptions for Wikidata.

Every Item in Wikidata has a label, a short description, and aliases in each language. Say you look at Item Q836805. In English, this Item has the label “Chalmers University of Technology” and the description “university in Gothenburg, Sweden”. In Swedish, these are “Chalmers tekniska högskola” and “universitet i Göteborg, Sverige”. The purpose of the label is to be a common name for the Item, and together with the description it should uniquely identify an entity in the world. So although several Items can share a label, because different things in the world can have the same name, no two Items should have both the same label and the same description in a given language. Aliases are used to help improve search.

The meaning of the descriptions in different languages is often the same, and when it is not, it usually differs by accident, though sometimes intentionally. Given that Wikidata has more than 94 million Items and supports more than 430 languages, perfect coverage would mean more than 40 billion labels and as many descriptions. Not only would creating all these labels and descriptions be a huge amount of work, they would also need to be maintained. If there are not enough contributors checking their quality, it would unfortunately be easy for vandalism to slip in.
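The coverage estimate above is a simple product, and a quick check makes the scale concrete. The snippet below uses only the two figures quoted in the text; both are lower bounds, so the result is a lower bound too.

```python
# Figures quoted in the text; both are stated as "more than".
ITEM_COUNT = 94_000_000
LANGUAGE_COUNT = 430

# Perfect coverage means one label and one description per Item per language.
labels_at_full_coverage = ITEM_COUNT * LANGUAGE_COUNT

print(f"{labels_at_full_coverage:,} labels, and as many descriptions")
```

That comes to roughly 40.4 billion labels, consistent with the "more than 40 billion" figure.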

The Wikidata community has known about this issue for a long time and has made great efforts to address it. Tools such as AutoDesc by Magnus Manske and bots such as Edoderoobot, Mr.Ibrahembot, MatSuBot (these were selected by clicking “Random Item” and looking at the history), and many others have worked on increasing coverage. And it shows: these bots often target descriptions, so even though only six languages have labels for more than 10% of Wikidata Items, a whopping 64 languages have descriptions for more than 10% of them! Today, we have well over two billion descriptions in Wikidata.

These bots create descriptions, usually based on the existing statements of the Item. And that is great. But there is no easy way to fix an error across languages, nor is there an easy way to ensure that no vandalism has snuck in. Also, bots give an oversized responsibility to a comparably small group of bot operators. Our goal is to democratize that responsibility again and allow more people to contribute.

Descriptions in Wikidata are usually noun phrases, which we will need to be able to generate for Abstract Wikipedia anyway. We want to start thinking about how to implement this feature, and then derive from that what will need to happen in Wikifunctions and in Wikidata. This work will need to happen in close cooperation with the Wikidata team and with the communities of both Wikidata and Wikifunctions. It will represent a way to ramp up our capabilities towards the wider vision of Abstract Wikipedia. Timewise, we hope to achieve that in 2022.

We don’t know yet how exactly this will work. Here are a few thoughts, but really I invite you to work with us on the design for abstract descriptions:

It must be possible to overwrite a description for a given language
It must be possible to retract a local overwrite for a given language
The pair of label and description still must remain unique
It would be great if implementing this would not be a large effort
The goal is not to create automatic descriptions, but abstract descriptions

The last point is subtle: an automatic description is a description generated automatically from the given statements of an Item. That’s a valuable and very difficult task. The above-mentioned AutoDesc, for example, starts the English description for Douglas Adams as follows: “British playwright, screenwriter, novelist, children's writer, science fiction writer, comedian, and writer (1952–2001) ♂; member of Footlights and Groucho Club; child of Christopher Douglas Adams and Janet Adams; spouse of Jane Belson”. The Q42 Item's current manual English description is the much more succinct “English writer and humorist”. Creating the description for a given Item can involve many subtle decisions and editorial judgements, and I think we should be working on this, but later.

Instead, we want to support abstract descriptions: descriptions that are created manually but, instead of being written in a specific natural language, are encoded in the abstract notation of Wikifunctions. We then use the renderers to generate the natural language text. This allows the community to retain direct control over the content of a description.

Here are a few ideas to kick off the conversation:

We introduce a new language code, qqz. That code is in the range reserved for local use, and is similar to the other dummy language codes in MediaWiki, qqq and qqx. Wikidata is to support the qqz language code for descriptions.
The content of the qqz description is abstract content. Technically we could store it in some string notation such as “Z12367(Q3918, Q25287, Q34)”. Or we could store the JSON ZObject.
The abstract description would be edited using the same Vue components we develop for Wikifunctions for editing abstract content.
The abstract description is a fallback for languages without a description. It can be overwritten by providing a description in that language.
Every time the renderer function or the underlying lexicographic data changes, we also need to retrigger the relevant generations.
One question is whether we should store the generated description in the Item, and if so, how to change the data model in order to mark the description as generated from the abstract description.
We also need to figure out how to report changes to everyone who is interested in tracking them. If we store the generated description as proposed above, we can piggyback on the current system.
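To make the fallback idea in the list above more concrete, here is a minimal sketch of how a description could be resolved for one Item. Everything in it is an assumption for illustration: the `ItemDescriptions` class, its method names, and the `Renderer` signature are hypothetical and not part of Wikidata or Wikifunctions. Only the qqz code and the example notation come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

ABSTRACT_CODE = "qqz"  # language code proposed in the text

# Hypothetical renderer signature: (abstract content, language code) -> text.
Renderer = Callable[[str, str], str]


@dataclass
class ItemDescriptions:
    """Hypothetical container for the manually entered descriptions of one Item."""

    manual: Dict[str, str] = field(default_factory=dict)

    def set_description(self, lang: str, text: str) -> None:
        """Store the abstract content (lang == 'qqz') or a per-language override."""
        self.manual[lang] = text

    def retract(self, lang: str) -> None:
        """Remove a per-language override so that language falls back again."""
        self.manual.pop(lang, None)

    def resolve(self, lang: str, render: Renderer) -> Optional[str]:
        """Prefer a manual description; otherwise render the abstract one."""
        if lang in self.manual:
            return self.manual[lang]
        abstract = self.manual.get(ABSTRACT_CODE)
        if abstract is None:
            return None
        return render(abstract, lang)


def toy_render(abstract: str, lang: str) -> str:
    # Stand-in for a real renderer; it only tags the content with the language.
    return f"[{lang}] {abstract}"


chalmers = ItemDescriptions()
chalmers.set_description(ABSTRACT_CODE, "Z12367(Q3918, Q25287, Q34)")
chalmers.set_description("en", "university in Gothenburg, Sweden")

print(chalmers.resolve("en", toy_render))  # manual English description wins
print(chalmers.resolve("hr", toy_render))  # no Croatian override, so it falls back
chalmers.retract("en")
print(chalmers.resolve("en", toy_render))  # after retraction, English falls back too
```

This sketch renders on every read, which sidesteps the storage, caching, and retriggering questions rather than answering them. A design that stores generated descriptions in the Item would need an extra marker for generated text and a way to regenerate it when renderers or lexicographic data change.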

All of these are just ideas for discussion. Some of the major questions are whether to store all the generated descriptions in the Item or not, how to represent that in the edit history of the Item, how to design the caching and retriggering of the generated descriptions, etc.

What would that look like?

Let’s take a look at an oversimplified example. The English description for Chalmers is “university in Gothenburg, Sweden”. That seems like a reasonably simple case that could easily be templated into abstract content, say of the form “Z12367(Q3918, Q25287, Q34)”, where Z12367 (that ZID is made up) represents the abstract content saying in English “(institution) in (city), (country)”, Q3918 is the QID for university, Q25287 the QID for Gothenburg, and Q34 the QID for Sweden. (In reality, this template is nowhere near as simple as it looks; we will discuss this more in an upcoming weekly newsletter. For now, let’s assume it is this simple.)
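If the string notation were chosen for storage, reading it back could be as simple as the sketch below. The grammar assumed here (one ZID followed by a flat, comma-separated list of QIDs) is a guess based only on the single example in the text; nested calls or other argument types are not handled, and `AbstractCall` and `parse_abstract` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class AbstractCall(NamedTuple):
    """Hypothetical parsed form: a constructor ZID plus its QID arguments."""

    zid: str
    args: List[str]


# Matches strings shaped like "Z12367(Q3918, Q25287, Q34)" (assumed grammar).
_CALL = re.compile(r"^(Z\d+)\((.*)\)$")


def parse_abstract(notation: str) -> AbstractCall:
    """Split the string notation into its ZID and a list of QID arguments."""
    match = _CALL.match(notation.strip())
    if match is None:
        raise ValueError(f"not a function-call notation: {notation!r}")
    zid, raw_args = match.groups()
    args = [arg.strip() for arg in raw_args.split(",") if arg.strip()]
    for arg in args:
        if not re.fullmatch(r"Q\d+", arg):
            raise ValueError(f"expected a QID, got {arg!r}")
    return AbstractCall(zid, args)


call = parse_abstract("Z12367(Q3918, Q25287, Q34)")
print(call.zid, call.args)
```

Storing the JSON ZObject instead would avoid inventing a second syntax, at the cost of a more verbose stored value.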

Renderers would then take this abstract content and generate the description for each language, in this case “university in Gothenburg, Sweden” for English, or “sveučilište u Göteborgu u Švedskoj” in Croatian. Since there is already an English description, we would neither generate nor store the text for English, but in Croatian we would generate it, store it, and mark it as a generated description.
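The toy sketch below illustrates this step for the two languages mentioned. It is deliberately oversimplified: the `LEXICON` table is hand-written for this example (real renderers would presumably draw on lexicographic data in Wikidata), and `render_z12367` and `description_to_store` are hypothetical names. The Croatian entries store locative forms because the preposition "u" requires them here, which hints at why the template is less simple than it looks.

```python
from typing import Dict, Optional, Sequence

# Hand-written lexical data for illustration only.
LEXICON: Dict[str, Dict[str, Dict[str, str]]] = {
    "en": {
        "Q3918": {"base": "university"},
        "Q25287": {"base": "Gothenburg"},
        "Q34": {"base": "Sweden"},
    },
    "hr": {
        "Q3918": {"base": "sveučilište"},
        "Q25287": {"locative": "Göteborgu"},
        "Q34": {"locative": "Švedskoj"},
    },
}


def render_z12367(lang: str, institution: str, city: str, country: str) -> str:
    """Render the made-up Z12367 pattern "(institution) in (city), (country)"."""
    words = LEXICON[lang]
    if lang == "en":
        return (f"{words[institution]['base']} in "
                f"{words[city]['base']}, {words[country]['base']}")
    if lang == "hr":
        return (f"{words[institution]['base']} u "
                f"{words[city]['locative']} u {words[country]['locative']}")
    raise KeyError(f"no renderer for language {lang!r}")


def description_to_store(lang: str, manual: Dict[str, str],
                         args: Sequence[str]) -> Optional[str]:
    """Return generated text to store, or None if a manual description exists."""
    if lang in manual:
        return None
    return render_z12367(lang, *args)


manual = {"en": "university in Gothenburg, Sweden"}
args = ("Q3918", "Q25287", "Q34")

print(description_to_store("en", manual, args))  # None: English is already manual
print(description_to_store("hr", manual, args))  # generated Croatian description
```

Marking the stored Croatian text as generated is left out, since how to represent that in the data model is one of the open questions.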

We think of this as a good milestone on our path to Abstract Wikipedia, with a directly useful outcome. What are your thoughts? Join us in discussing this idea on the talk page.

Status showing how testers and implementations work together

In other news, Lindsay has created a video of a new feature: how Testers and Implementations work together to show whether the tests pass.

The video shows how she is changing the implementation and re-running the testers several times. Testers will be a main component in ensuring the quality of Wikifunctions.

The next opportunity to meet us and ask us questions will be at Wikimania. On 14 August, at 17:00 UTC, we will host a 1.5-hour session on Wikifunctions and Abstract Wikipedia. This year, Wikimania will be an entirely virtual event and registration is free. Bring your questions and discussions to Wikimania 2021.

Next week, we are skipping the weekly update.