Wikidata/Notes/Article generation

From Meta, a Wikimedia project coordination wiki

These are development notes about a potential extension to Wikidata that would allow for automated generation of articles on Wikipedias from Wikidata data. This is not a proposal to create more bot-uploaded articles. Rather, it would allow readers to search for and view the same information without the need for the articles to be actually created.

In this text, an article created in this way will be called a pseudo-article. To readers, pseudo-articles would appear exactly the same as all other articles. However, they would not actually be stored in the Wikipedia's database, would not appear when clicking the "Random page" link, and would not be included in the article count.

Item Breza will be used as an example of a disambiguation page and Breza, Slovakia as an example of an article.

Item matching

Imagine that you have visited the page simple:Breza or simple:Breza, Slovakia. Right now, you would simply receive a "Wikipedia does not yet have an article with this name." message.

The first step that the extension would have to do is to determine which Wikidata item(s) belong(s) to the article with the title "Breza" or "Breza, Slovakia". Two possible ways to do this are envisioned:

By link

One possibility would require Wikidata to have links to unwritten Wikipedia articles. For this, Wikidata badges would have to be implemented first, and a "nonexistent" article badge would have to exist.

The actual article title would be entered by hand on Wikidata in accordance with the rules of the target Wikipedia. In this case, Q397267 would have "Breza" as its article title on Simple English Wikipedia, and Q427243 would have "Breza, Slovakia". Visiting simple:Breza, Slovakia would then match the item Q427243.

Articles with the "nonexistent" badge would not be displayed in interwiki lists on Wikipedias.
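The lookup by link can be sketched as follows. This is only an illustration under stated assumptions: the `SITELINKS` dictionary is a hypothetical in-memory stand-in for Wikidata's sitelink data, the site ID `simplewiki` and the badge name are placeholders, and `match_by_link` is not an existing function of any extension.

```python
# Hypothetical stand-in for Wikidata sitelinks carrying the proposed badge.
PLACEHOLDER_BADGE = "nonexistent"

SITELINKS = {
    "Q397267": {"simplewiki": {"title": "Breza", "badges": [PLACEHOLDER_BADGE]}},
    "Q427243": {"simplewiki": {"title": "Breza, Slovakia", "badges": [PLACEHOLDER_BADGE]}},
}


def match_by_link(title, wiki, sitelinks=SITELINKS):
    """Return the ID of the item whose badged sitelink on `wiki` equals `title`."""
    for item_id, links in sitelinks.items():
        link = links.get(wiki)
        if link and link["title"] == title and PLACEHOLDER_BADGE in link["badges"]:
            return item_id
    return None  # no pseudo-article is registered under this title


print(match_by_link("Breza, Slovakia", "simplewiki"))
```

Because each title is entered by hand, a title maps to at most one item in this approach, so no automatic disambiguation is needed.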

By label

Another possibility would be to search for Wikidata items by labels and aliases in the Wikipedia language (including language fallback).

If a single item is found, its label is used as the article title. So, if you visit simple:Breza, Slovakia, the extension would find that only Q427243 has this alias in Simple English, and matching is complete.

In case multiple Wikidata items with the same label are found, the extension would have to automatically create a disambiguation page. The page would contain links to all the pseudo-articles with the same label, and Wikidata descriptions as their descriptions.

Article titles would ultimately be generated by the extension following the local Wikipedia's rules, or perhaps simply by adding a Wikidata ID after the label. So, a pseudo-article about Q427243 would have the title simple:Breza, Slovakia in the first case, and simple:Breza (Q427243) in the second case.
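A minimal sketch of matching by label, including the automatic disambiguation case and the "label plus Wikidata ID" title scheme, might look like this. The `Item` class, the sample labels, aliases and descriptions, and the fallback chain `["simple", "en"]` are all assumptions made for illustration; they do not reflect the actual contents of Wikidata or any existing API.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    item_id: str
    labels: dict                                      # language code -> label
    aliases: dict = field(default_factory=dict)       # language code -> list of aliases
    descriptions: dict = field(default_factory=dict)  # language code -> description


# Illustrative data only; labels, aliases and descriptions are made up.
ITEMS = [
    Item("Q397267", {"en": "Breza"}, descriptions={"en": "disambiguation page"}),
    Item("Q427243", {"en": "Breza"}, {"en": ["Breza, Slovakia"]}, {"en": "village in Slovakia"}),
]


def first_available(values, languages):
    """Return the first value found while walking the language fallback chain."""
    for lang in languages:
        if lang in values:
            return values[lang]
    return None


def names_of(item, languages):
    """Collect the item's labels and aliases in every language of the chain."""
    names = set()
    for lang in languages:
        if lang in item.labels:
            names.add(item.labels[lang])
        names.update(item.aliases.get(lang, []))
    return names


def resolve_title(title, languages, items=ITEMS):
    """Classify a title as missing, a single pseudo-article, or a disambiguation."""
    matches = [item for item in items if title in names_of(item, languages)]
    if not matches:
        return ("missing", [])
    if len(matches) == 1:
        return ("article", [matches[0].item_id])
    # Several items share the name: list each as "Label (ID)" with its description.
    entries = [
        (f"{first_available(item.labels, languages)} ({item.item_id})",
         first_available(item.descriptions, languages))
        for item in matches
    ]
    return ("disambiguation", entries)


print(resolve_title("Breza, Slovakia", ["simple", "en"]))
print(resolve_title("Breza", ["simple", "en"]))
```

With this sample data, "Breza, Slovakia" resolves to a single item, while "Breza" yields a generated disambiguation list whose entries use the Wikidata descriptions.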


Once the article title is determined and the appropriate Wikidata item is found, the appropriate pseudo-article template needs to be selected. This step is simple for the extension, because the selection would not be its job but that of the local Wikipedia's master pseudo-article template.

This template would practically be a gigantic {{#switch}} that would select the pseudo-article template on the basis of Wikidata properties. In this case, on the basis of "instance of=populated place" and "country=Slovakia" the template would see that the article template should be [[Template:Populated place in Slovakia article]].
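The selection logic of such a master template can be modelled outside wikitext as an ordered list of rules. The sketch below is an assumption about how the switch could behave, not the template itself; the fallback rule "Template:Populated place article" is hypothetical, and property names are used as plain strings for readability.

```python
# Ordered rules: the first rule whose required properties all match wins.
# Only the first template name comes from the text; the second is hypothetical.
RULES = [
    ({"instance of": "populated place", "country": "Slovakia"},
     "Template:Populated place in Slovakia article"),
    ({"instance of": "populated place"},
     "Template:Populated place article"),
]


def select_template(properties, rules=RULES):
    """Return the first article template whose conditions the item satisfies."""
    for required, template in rules:
        if all(properties.get(name) == value for name, value in required.items()):
            return template
    return None  # no pseudo-article template applies to this item


breza = {"instance of": "populated place", "country": "Slovakia"}
print(select_template(breza))
```

Putting the most specific rules first mirrors how a large {{#switch}} would need to order its cases so that a general rule does not shadow a country-specific one.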

These templates would not be mere infoboxes, but would contain some article text as well. For example, an article template could contain:

'''{{#property:name}}''' is a [[populated place]] in [[Slovakia]], in the [[{{#property:is in the administrative unit}}]]. It has a population of {{#property:population}} people.

When the template is interpreted, the article would look like:

Breza is a populated place in Slovakia, in the Námestovo District. It has a population of 1510 people.
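To make the interpretation step concrete, the following sketch expands the {{#property:...}} calls of the example template with a regular expression. It only imitates the idea; the real parser function is evaluated by MediaWiki, and the `expand_properties` helper and the property dictionary are assumptions introduced for this illustration.

```python
import re

TEMPLATE = (
    "'''{{#property:name}}''' is a [[populated place]] in [[Slovakia]], "
    "in the [[{{#property:is in the administrative unit}}]]. "
    "It has a population of {{#property:population}} people."
)

# Matches {{#property:NAME}} and captures NAME.
PROPERTY_CALL = re.compile(r"\{\{#property:([^}]+)\}\}")


def expand_properties(wikitext, properties):
    """Replace each {{#property:NAME}} with its value; unknown names stay as-is."""
    def replace(match):
        name = match.group(1).strip()
        return str(properties.get(name, match.group(0)))
    return PROPERTY_CALL.sub(replace, wikitext)


breza = {
    "name": "Breza",
    "is in the administrative unit": "Námestovo District",
    "population": 1510,
}
print(expand_properties(TEMPLATE, breza))
```

The result is still wikitext (bold markers and link brackets remain), which the normal parser would then render as the sentence shown above.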

It would be possible to edit pseudo-articles and create real articles using the "Creating pages with preloaded text" mechanism. A possible problem with this is that the preloaded articles would contain Wikidata {{#property}} calls interspersed in the article text.


If a Wikipedia article links to a page that does not exist but is nonetheless a pseudo-article, the link should be displayed as a blue link, not a red link.

To achieve this, whenever a link to a nonexistent article is encountered while an article is being parsed, the entire item matching mechanism described above would be invoked. If an item is matched, the link is rendered as a blue link instead of a red link.
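The decision made for each link can be sketched as below. The function and parameter names are hypothetical; `match_item` stands for whichever item matching approach is chosen and is passed in as a callable so that the sketch stays independent of it.

```python
def link_colour(title, existing_titles, match_item):
    """Decide how a wiki link to `title` should be rendered."""
    if title in existing_titles:
        return "blue"  # a real article exists
    if match_item(title) is not None:
        return "blue"  # no real article, but a pseudo-article can be generated
    return "red"       # neither a real article nor a matching Wikidata item


# Toy matcher used only for this example.
def toy_matcher(title):
    return "Q427243" if title == "Breza, Slovakia" else None


existing = {"Slovakia"}
for target in ["Slovakia", "Breza, Slovakia", "Some unwritten page"]:
    print(target, "->", link_colour(target, existing, toy_matcher))
```

Since item matching would run once per nonexistent link target during parsing, caching the match results is one design option worth considering, although these notes do not specify it.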


With the above implemented, the pseudo-articles could be read if you visit the title, but could not be searched for.

A simple solution is that, in addition to the regular Wikipedia search, the Wikidata property values, labels and descriptions in the Wikipedia's language would be searched for the search string. Items found would be displayed as links to their pseudo-articles.

This would mean that it would not be possible to search for text that spans the border between template text and Wikidata properties, or that exists only in the template. For example, while the pseudo-article would contain the text "Breza is a populated place in Slovakia", it would appear in searches for "breza" but not in searches for "place", since this word appears only in the template and not in Wikidata.
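The limitation can be illustrated with a small sketch of such a search. The `SEARCHABLE` dictionary is a made-up stand-in for the labels, descriptions and property values of an item in one language, and `search_items` is a hypothetical helper, not part of any existing search backend.

```python
# Hypothetical searchable strings per item and language (labels, aliases,
# descriptions and property values). The values are illustrative only.
SEARCHABLE = {
    "Q427243": {
        "en": ["Breza", "Breza, Slovakia", "village in Slovakia", "Námestovo District"],
    },
}


def search_items(query, language, searchable=SEARCHABLE):
    """Return IDs of items with any stored string containing the query."""
    needle = query.casefold()
    hits = []
    for item_id, by_language in searchable.items():
        texts = by_language.get(language, [])
        if any(needle in text.casefold() for text in texts):
            hits.append(item_id)
    return hits


print(search_items("breza", "en"))   # found via the label
print(search_items("people", "en"))  # template-only word, so nothing is found
```

The word "people" occurs in the rendered pseudo-article only through the template sentence "It has a population of ... people", so a search limited to Wikidata strings cannot find it.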

See also