Wikidata/Notes/Article generation

From Meta, a Wikimedia project coordination wiki

These are development notes about a potential extension to Wikidata that would allow for automated generation of articles on Wikipedias from Wikidata data. This is not a proposal to create more bot-uploaded articles. Rather, it would allow readers to search for and view the same information without the need for the articles to be actually created.

In this text, an article created in this way is called a pseudo-article. To readers, pseudo-articles would appear exactly the same as all other articles, but they would not actually be stored in the Wikipedia's database, would not appear when clicking the "Random page" link, and would not be included in the article count.

Item Breza will be used as an example of a disambiguation page and Breza, Slovakia as an example of an article.

Item matching

Imagine that you have visited the page http://simple.wikipedia.org/wiki/Breza or http://simple.wikipedia.org/wiki/Breza, Slovakia. Right now, you would simply receive the message "Wikipedia does not yet have an article with this name."

The first thing the extension would have to do is determine which Wikidata item or items correspond to the article title "Breza" or "Breza, Slovakia". A simple way of doing this is to introduce a Wikidata property "Wikipedia canonical title", of type multilingual text. This property would contain the expected article title, in various languages, for the topic represented by the item: for example "Breza, Slovakia" for the Simple English Wikipedia, but perhaps "Breza (Slovačka)" for the Serbian Wikipedia, and so on.
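The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item store is a plain dictionary, the item IDs are placeholders, and the key `canonical_title` stands in for the proposed "Wikipedia canonical title" property. None of these names come from an existing Wikidata API.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for Wikidata. The item IDs are
# placeholders, and "canonical_title" models the proposed
# "Wikipedia canonical title" property (wiki code -> expected title).
ITEMS: Dict[str, dict] = {
    "Q_EXAMPLE_PLACE": {
        "canonical_title": {
            "simple": "Breza, Slovakia",
            "sr": "Breza (Slovačka)",
        },
    },
    "Q_EXAMPLE_DISAMBIG": {
        "canonical_title": {"simple": "Breza"},
    },
}


def match_items(title: str, wiki: str, items: Dict[str, dict] = ITEMS) -> List[str]:
    """Return the IDs of all items whose canonical title on `wiki` equals `title`."""
    return [
        item_id
        for item_id, item in items.items()
        if item.get("canonical_title", {}).get(wiki) == title
    ]
```

Returning a list rather than a single ID leaves room for the case where several items claim the same title, which the extension would then have to resolve somehow.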

Templating

Once the article title has been determined and the matching Wikidata item found, the appropriate pseudo-article template needs to be located. This would be done through another new Wikidata property, "Wikipedia pseudo-article template", which would point to the Wikidata item about the template.

These templates would not be mere infoboxes, but would contain some article text as well. For example, an article template could contain:

'''{{#property:name}}''' is a [[populated place]] in [[Slovakia]], in the [[{{#property:is in the administrative unit}}]]. It has a population of {{#property:population}} people.

When the template is interpreted, the article would look like:

Breza is a populated place in Slovakia, in the Námestovo District. It has a population of 1510 people.
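As a rough illustration of the substitution step, the following Python sketch replaces each `{{#property:...}}` call in a template string with a value taken from an item. It is not how the MediaWiki parser works; the regular expression, the function name `render_template`, and the flat property dictionary are all assumptions made for the example. The output is still wikitext, so bold and link markup would be handled by the normal parser afterwards.

```python
import re

# Matches calls such as {{#property:population}} and captures the property name.
PROPERTY_CALL = re.compile(r"\{\{#property:([^}]+)\}\}")


def render_template(template: str, properties: dict) -> str:
    """Substitute every {{#property:NAME}} call with the item's value for NAME.

    Unknown properties are left untouched so that gaps stay visible.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1).strip()
        return str(properties.get(name, match.group(0)))

    return PROPERTY_CALL.sub(substitute, template)


TEMPLATE = (
    "'''{{#property:name}}''' is a [[populated place]] in [[Slovakia]], "
    "in the [[{{#property:is in the administrative unit}}]]. "
    "It has a population of {{#property:population}} people."
)

# Hypothetical property values for the example item.
BREZA = {
    "name": "Breza",
    "is in the administrative unit": "Námestovo District",
    "population": 1510,
}

rendered = render_template(TEMPLATE, BREZA)
```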

It would be possible to edit pseudo-articles and turn them into real articles using the "Creating pages with preloaded text" mechanism. A possible problem is that the resulting articles would contain Wikidata {{#property}} calls interspersed in the article text.

Linking

If a Wikipedia article links to a page that does not exist but is nonetheless available as a pseudo-article, the link should be blue rather than red.

To achieve this, whenever a link to a nonexistent article is encountered while an article is being parsed, the entire item-matching mechanism described above should be invoked. If an item is matched, the link is rendered as a blue link instead of a red one.
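The decision can be sketched as a small Python function. The names `link_colour`, `existing_pages`, and `matcher` are hypothetical, and the matcher shown here is a stub standing in for whatever item-matching lookup the extension would actually use.

```python
from typing import Callable, List, Set


def link_colour(
    title: str,
    existing_pages: Set[str],
    matcher: Callable[[str], List[str]],
) -> str:
    """Return "blue" for real pages and pseudo-articles, "red" otherwise."""
    if title in existing_pages:
        return "blue"  # an ordinary, existing article
    # Only nonexistent titles trigger the (potentially costly) item lookup.
    return "blue" if matcher(title) else "red"


# Stub matcher for illustration: pretends a single title has a matching item.
def stub_matcher(title: str) -> List[str]:
    return ["Q_EXAMPLE_PLACE"] if title == "Breza, Slovakia" else []


existing = {"Slovakia"}
```

Because the lookup would run for every red-link candidate during parsing, a real implementation would probably need to batch or cache these queries; the sketch ignores that concern.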

Search

With the above implemented, the pseudo-articles could be read if you visit the title, but could not be searched for.

A simple solution is to search Wikidata properties, labels, and descriptions in the Wikipedia's language for the search string, in addition to the regular Wikipedia search. Items found would be displayed as links to their pseudo-articles.

This would mean that it would not be possible to search for text that comes from the template rather than from Wikidata, or for phrases that span the border between template text and Wikidata property values. For example, although the pseudo-article would contain the text "Breza is a populated place in Slovakia", it would appear in searches for "breza" but not in searches for "place", since that word appears only in the template and not in Wikidata.
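The limitation is easy to see in a small Python sketch of such a search. Everything here is assumed for illustration: the item layout, the placeholder item ID, the made-up description, and the function name `search_pseudo_articles`. The search looks only at data stored on the item, never at template text.

```python
from typing import Dict, List

# Hypothetical item data; the description and property values are
# invented for the example.
ITEMS: Dict[str, dict] = {
    "Q_EXAMPLE_PLACE": {
        "labels": {"simple": "Breza"},
        "descriptions": {"simple": "village in Slovakia"},
        "properties": {
            "is in the administrative unit": "Námestovo District",
            "population": "1510",
        },
        "canonical_title": {"simple": "Breza, Slovakia"},
    },
}


def search_pseudo_articles(query: str, wiki: str, items: Dict[str, dict] = ITEMS) -> List[str]:
    """Return pseudo-article titles whose item data contains `query` (case-insensitive)."""
    needle = query.casefold()
    hits = []
    for item in items.values():
        haystack = [
            item["labels"].get(wiki, ""),
            item["descriptions"].get(wiki, ""),
            *item["properties"].values(),
        ]
        title = item["canonical_title"].get(wiki)
        if title and any(needle in text.casefold() for text in haystack):
            hits.append(title)
    return hits
```

With this data, a search for "breza" finds the pseudo-article through its label, while a search for "place" finds nothing, because that word exists only in the template.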

See also