Abstract Wikipedia/Updates/2023-03-15

Abstract Wikipedia Updates

Quo vadis, Abstract Wikipedia?

Abstract Wikipedia will allow more people to contribute their voices to a baseline of knowledge, whilst working in an interface in their language. This shared baseline of knowledge will then be made available in many languages. This will be achieved by creating, storing, and maintaining the baseline of knowledge, per individual article, in a notation that is independent of natural language. We call content written in that notation "abstract content". This abstract content will then be turned into text in a specific natural language, with the help of the library of functions from Wikifunctions. Thus, Abstract Wikipedia will allow a speaker of any language to contribute content for readers of many different languages. This will allow more people to read more content in their language.

Example

Assume that we want to create a new Wikipedia article for the planet Jupiter, and the first version of this article shall be the following (these are the first two sentences of the Simple English Wikipedia article):

Jupiter is the largest planet in the Solar System. It is the fifth planet from the Sun.

The abstract content representing this natural text could look like this:

Article(
 text: [
   Superlative(
     subject: Jupiter,
     quality: large,
     class: planet,
     location constraint: Solar System),
   Definition(
     subject: Jupiter,
     definition: Rank(
       rank: Positive integer(
         value: 5),
       object: planet,
       by: Relational noun(
         noun: distance,
         to: Sun)))],
 categories: [Jupiter, planet, Solar System])

Wikifunctions would have types for Article, Superlative, Definition, Rank, etc., which are used here as the abstract notation for the content of these two sentences. The abstract content is shown here using English labels for our convenience; in fact, these would all be identifiers: ZIDs (from Wikifunctions) and QIDs (from Wikidata). There will be software components for viewing, creating, and editing abstract content. Wikifunctions will also provide functions that take this object as an argument and generate natural language text such as the two sentences above.
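To make the idea of "functions that take this object as an argument and generate text" more tangible, here is a minimal Python sketch. It is only an illustration under strong assumptions: the classes, the tiny English lexicon, and the `render_english` function are all hypothetical stand-ins, not Wikifunctions types or APIs, and plain English labels are used where ZIDs and QIDs would appear.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the types named above. In Wikifunctions these
# would be identified by ZIDs, and values like "Jupiter" by Wikidata QIDs.

@dataclass
class RelationalNoun:
    noun: str
    to: str

@dataclass
class Rank:
    rank: int
    object: str
    by: RelationalNoun

@dataclass
class Superlative:
    subject: str
    quality: str
    cls: str  # "class" in the notation above; renamed because it is a Python keyword
    location_constraint: str

@dataclass
class Definition:
    subject: str
    definition: Rank

@dataclass
class Article:
    text: list
    categories: list

# Toy English lexicon, just large enough for this one example.
SUPERLATIVE_FORMS = {"large": "largest"}
ORDINALS = {1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth"}

def render_english(article: Article) -> str:
    """Turn the abstract content into English sentences (toy rules only)."""
    sentences = []
    mentioned = set()
    for constructor in article.text:
        # Use a pronoun once the subject has already been named.
        subject = "It" if constructor.subject in mentioned else constructor.subject
        mentioned.add(constructor.subject)
        if isinstance(constructor, Superlative):
            sentences.append(
                f"{subject} is the {SUPERLATIVE_FORMS[constructor.quality]} "
                f"{constructor.cls} in the {constructor.location_constraint}."
            )
        elif isinstance(constructor, Definition):
            rank = constructor.definition
            # Toy rule: a rank "by distance to X" is phrased as "from the X".
            sentences.append(
                f"{subject} is the {ORDINALS[rank.rank]} {rank.object} "
                f"from the {rank.by.to}."
            )
    return " ".join(sentences)

jupiter = Article(
    text=[
        Superlative("Jupiter", "large", "planet", "Solar System"),
        Definition("Jupiter", Rank(5, "planet", RelationalNoun("distance", "Sun"))),
    ],
    categories=["Jupiter", "planet", "Solar System"],
)

print(render_english(jupiter))
# Jupiter is the largest planet in the Solar System. It is the fifth planet from the Sun.
```

A renderer for another language would consume the same `jupiter` object and apply its own lexicon and grammar rules, which is the point of keeping the content independent of natural language.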

One question that needs to be answered is where these objects would be stored, and how to associate the above object with the Wikidata item for Jupiter, Q319. We originally planned to have this conversation and make the decision before the launch of Wikifunctions. But given the complexity of the system, and how difficult it is to imagine while so little of it is tangible, we decided not to open this question for discussion now. Instead, we will wait until after the launch of Wikifunctions, when we will all have a better understanding of how that part of the ecosystem works.

Below, we outline a few options that came up in discussions between folks on the Abstract Wikipedia team and the Wikidata team at Wikimedia Deutschland. This also ties into some of the questions Lydia Pintscher and I answered in an interview on the Wikimove podcast episode that was released today, which we invite you to listen to. Thanks to Nicole Ebber and Nikki Zeuner for the interview!

We are genuinely undecided about the best answer, and we would benefit from a wider discussion of these options, and potentially of other options as well. Please also ask questions; these can often clarify and shine a light on points that are muddy to us as well. We are currently focusing on the following five options:

A new tab on items in Wikidata
Create a new data type for objects on Wikidata
Objects on Wikifunctions
Objects on a new Wikipedia language edition
Unattached namespace on an existing project

Let’s discuss these five options below.

Option 1: A new tab on items in Wikidata

We could add a tab, leading to a new namespace on Wikidata with a new content type, where the abstract content would live. This namespace would be attached to the item namespace in Wikidata. This way, every item would have a natural place to store its associated abstract content.

Given how little the item talk pages on Wikidata are used, an additional talk page may not be valuable, so we might want to redirect people to the item's main talk page for any discussions.

One big question would be where to store content for Wikivoyage and other projects about the same items (e.g. the abstract content we might want to write about Q90/Paris from the perspective of Wikipedia would be different from that for Wikivoyage). Would that need yet another namespace? Adding one new associated namespace would be a technical challenge; adding several would be a complex task, and we hope it would not have to happen often.

Option 2: Create a new data type for Wikifunctions objects

We could create a new data type on Wikidata for Wikifunctions objects. Then the community could create a property on Wikidata that stores the abstract content on a given item as a literal. This would have the added flexibility that the community could add more abstract contents to an item for specific use cases, e.g. to represent content from Wikivoyage, or to represent the history of an item, the etymology of a word, etc.
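As a rough illustration of this option, the sketch below models an item that carries several abstract contents as literal values of one property. Everything in it is an assumption made for illustration: the property ID `P_ABSTRACT_CONTENT` does not exist, the dictionary layout is a simplification rather than the real Wikidata JSON format, the `use_case` field is a made-up way of telling the values apart, and the elided `[...]` values are placeholders.

```python
# Hypothetical property ID for "abstract content"; no such property exists.
ABSTRACT_CONTENT = "P_ABSTRACT_CONTENT"

# Simplified, made-up item layout (not the real Wikidata JSON format).
# Each statement stores one abstract content as a literal value.
item_q319 = {
    "id": "Q319",
    "statements": {
        ABSTRACT_CONTENT: [
            {"use_case": "Wikipedia", "value": "Article(text: [...], categories: [...])"},
            {"use_case": "Wikivoyage", "value": "Article(text: [...])"},
        ],
    },
}

def abstract_content_for(item, use_case):
    """Return the literal stored for one use case, or None if there is none."""
    for statement in item["statements"].get(ABSTRACT_CONTENT, []):
        if statement["use_case"] == use_case:
            return statement["value"]
    return None

print(abstract_content_for(item_q319, "Wikivoyage"))
print(abstract_content_for(item_q319, "etymology"))
```

Note that in this model the full abstract content sits inside the item itself, so every additional use case makes the item larger.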

We will need such a data type and the UX for it anyway, given our planned support for abstract descriptions. In fact, this might be the simplest way to support abstract descriptions.

However, the UX of Wikidata doesn't lend itself to this easily, and adapting to this model would be challenging given the constraints of an item page. Existing properties are edited as one or a small number of simple, short text boxes, often with auto-complete; abstract content would instead need a larger text area, with helpers and possibly a toolbar, a preview control, etc. One option could in principle be a modal dialog for editing, but modal dialogs come with their own inherent UX downsides and are usually more complex to implement than the same functionality in its own dedicated environment. Also, this would break the current design patterns of an item page, and may not be aligned with the patterns that might be planned for its future.

Further, while abstract content is somewhat structured and machine-readable, it is less (or differently) so than an Item, and its structure would probably not be queryable with SPARQL.

This option comes with two additional challenges. First, some Wikidata items that are already nearing the maximum size would grow still further, and we would need a solution to allow that. Second, we would need to deal with the visualization, editing, and diffing of potentially very large and complex values.

Option 3: Objects on Wikifunctions

Instead of having objects live in Wikidata as the values of statements or on an additional tab, Wikidata could merely store a pointer to an object on Wikifunctions. We still would create a new data type, but that data type would be just the ZID of an object stored in Wikifunctions.

This would solve all the challenges of the previous option, and retain many of its advantages.

It would have the additional advantage that several items could refer to the same object on Wikifunctions. While this sounds rarely useful for the abstract content of Wikipedia articles, it might prove very useful for abstract descriptions of Wikidata items and the abstract glosses of lexemes. With the creation of abstract content, types, and natural language generation functions on the same platform, collaboration between people who focus on one of these areas would be more direct.
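The pointer model can be sketched in a few lines of Python. This is an assumption-laden illustration only: the identifiers `Z_DEMO_1`, `Q_DEMO_A`, and `Q_DEMO_B` are made up, and the two dictionaries merely stand in for storage on Wikifunctions and for statements on Wikidata.

```python
# Hypothetical objects stored on Wikifunctions, keyed by a made-up ZID.
wikifunctions_objects = {
    "Z_DEMO_1": {"type": "Description", "class": "planet", "location": "Solar System"},
}

# Hypothetical Wikidata side: each item stores only a ZID, not the object itself.
wikidata_pointers = {
    "Q_DEMO_A": "Z_DEMO_1",
    "Q_DEMO_B": "Z_DEMO_1",  # a second item pointing at the same object
}

def resolve(qid):
    """Follow the pointer from an item to the object stored on Wikifunctions."""
    return wikifunctions_objects[wikidata_pointers[qid]]

# Both items resolve to the very same object, so one edit serves both.
print(resolve("Q_DEMO_A") is resolve("Q_DEMO_B"))  # True
```

Because the item holds only a short identifier, its size stays small no matter how large the abstract content becomes, and editing and diffing happen where the object is stored rather than on the item page.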

This option could be a challenge for the Wikifunctions community. Its scope would expand to cover content as well as functions. This could be difficult for a smaller community to handle, and it would require more community patrollers, like those already active on Wikidata.

Option 4: Objects on a new Wikipedia language edition

We could launch a new Wikipedia language edition in which the main namespace is abstract content. This could be called e.g. the multi-lingual Wikipedia (mul.wikipedia.org) language edition, or the abstract Wikipedia language edition (abstract.wikipedia.org). As with all other language editions, the pages would be connected to the items via sitelinks in Wikidata.

If Wikivoyage wanted to use the same approach, they would need to copy the setup and create a multi-lingual Wikivoyage edition (or, as Option 4B, perhaps these could be a different namespace on a single shared ‘abstract’ content wiki?).

This would give a very clear distinction of what is Wikipedia content and what is not, and give the abstract content a distinct visibility, which would otherwise be somewhat hidden between Wikifunctions and Wikidata. On the other hand, it would splinter the communities further, and mix in a "new" concept of a wiki that isn't really a Wikipedia but is labelled as such.

Option 5: Unattached namespace on an existing project

We could introduce a new namespace to an existing project where the abstract content for Wikipedia would effectively live. Here are the most likely options:

Wikidata (as a separate namespace, not attached to the items)
Commons
Meta
English Wikipedia (not attached to the articles)

While technically all these options would be the same, they would be extremely different from a social and community perspective. We will refrain from discussing these differences for now, unless this starts to become a more likely option.

Rant: One Wikimedia movement, or many projects?

Conway's law states that software mirrors organization, or, as it was put, if you put three teams to work on a compiler, you get a three-pass compiler. The opposite is also true, as previously observed by Guillaume Paumier: a software system determines the community structure that evolves around the system. Within the Wikimedia projects, we often see this effect in the dynamics between the different Wikipedia language communities, the Wikidata community, the Commons community, Meta, etc. The stories of local protection of media files on Commons, of wikis opting out of global anti-abuse tools, or of short descriptions in English Wikipedia should be warning enough.

The question we ask today is hard not only because there are genuine technical challenges that we have to overcome and a tradeoff that we have to make. It is so much harder because we can anticipate that there will be fracture lines between the different projects, and maybe even anticipate how these fracture lines would take shape. The whole story would be so much easier if we, in general, regarded ourselves as part of one common movement, as one large community. But I doubt that we will see much progress on this trajectory before we have to resolve the above question.

Until then, we can only remain mindful of the possible solutions and their effects, and be careful to design for the world as it is and as it likely will be, and not for the world we wish it to be.

Public NLG workstream on Tuesday

On Tuesday, March 21, there will be the third public NLG workstream meeting on Jitsi. Feel free to reach out to me and suggest presentations beforehand if you want. We already have a bit planned, but there is still space. The meeting is 16:30-17:30 UTC, which is an hour off from the usual local time for friends in the US.