Talk:A proposal towards a multilingual Wikipedia

From Meta, a Wikimedia project coordination wiki

This is a totally excellent idea

I use Google Translate all the time on Wikipedia articles in languages I can't read, and it would be great to have this as a way to supplement that activity. Sometimes I can read the Google translation, but usually I just need to guess my way through the translated text, and more often than not my assumptions in those cases turn out to be dead wrong. This type of solid translation would be great as extra support for getting the gist of those automatic translations. Jane023 (talk) 07:17, 7 August 2013 (UTC)

Thank you :) --denny (talk) 14:26, 7 August 2013 (UTC)

Feedback from a sv:wp perspective

Thanks for sharing your ideas. While I do not fully support your idea of implementation, I share your basic view of the need and think some of the concepts you introduce are very interesting.

At sv:wp we are running an initiative to get all data for the 300 Swedish communes and 5000 towns put into Wikidata. It will take some months, as we want to ensure that all data are correct and complete and that there are routines to automatically update the data in Wikidata from the databases at our authorities. The next step is then of course to support the creation and updating of the corresponding articles on all versions, and here I see some of your proposed concepts as very useful: for example, a generalized multilingual "template" that can also generate text, and some type of automatic translation of the basic terms used in this "template".

If there is interest in evolving these ideas further, I am interested in participating. Anders Wennersten (talk) 08:14, 7 August 2013 (UTC)

Thank you, Anders. I had not yet heard about this initiative on svwp, and it is a great one! I am glad to hear about it. I actually think that the idea of a multilingual Wikipedia and Wikidata would probably tie together more closely, i.e. in reality the example might not be {{F12:Q64|Q5519|Q183}} but rather {{F12:Q5519|Q183}}, since the information that it is Berlin, Q64, could be taken from Wikidata; I tried not to introduce too many concepts at once. But a clear interplay would be there.
Nevertheless, in order to allow for a more readable text, a 'storyline' is needed, and there the order of the text and some basic decision about the lexicalization (paragraph splitting, order of ideas, etc.) would be a real addition to what Wikidata already provides (although, obviously, inferior to what a proper Wikipedia article could provide).
Yes. This is not a finished, perfect idea; it is just a first draft that has not been discussed, so I expect there is plenty of potential for improvement, and I would very much enjoy input and cooperation to evolve it. --denny (talk) 12:38, 7 August 2013 (UTC)
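To make the template call discussed above concrete, here is a minimal sketch of how a call such as {{F12:Q64|Q5519|Q183}} might be lexicalized. It is not the proposal's implementation: the `LABELS` and `FRAMES` tables, the sentence patterns, and the function names are all assumptions made for illustration, and the labels are hard-coded rather than fetched from Wikidata.

```python
# Illustrative labels, hard-coded here instead of being fetched from Wikidata.
# The item IDs are the ones used in the example call from the discussion.
LABELS = {
    "Q64": {"en": "Berlin", "de": "Berlin"},
    "Q5519": {"en": "capital", "de": "Hauptstadt"},
    "Q183": {"en": "Germany", "de": "Deutschland"},
}

# Hypothetical frame F12: "<subject> is the <class> of <object>".
# The German article is fixed in the pattern; real frames would need
# grammar functions (gender, case) that this sketch does not attempt.
FRAMES = {
    "F12": {
        "en": "{0} is the {1} of {2}.",
        "de": "{0} ist die {1} von {2}.",
    }
}


def parse_call(call):
    """Split '{{F12:Q64|Q5519|Q183}}' into ('F12', ['Q64', 'Q5519', 'Q183'])."""
    text = call.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call: " + call)
    frame_id, _, args = text[2:-2].partition(":")
    return frame_id, (args.split("|") if args else [])


def render(call, lang):
    """Fill the frame's sentence pattern with the item labels for one language."""
    frame_id, item_ids = parse_call(call)
    pattern = FRAMES[frame_id][lang]
    words = [LABELS[item_id][lang] for item_id in item_ids]
    return pattern.format(*words)


print(render("{{F12:Q64|Q5519|Q183}}", "en"))
print(render("{{F12:Q64|Q5519|Q183}}", "de"))
```

The article page stores only the call; the order of calls on the page carries the 'storyline', while the wording lives in the frame.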

Brilliant idea

👍 Like. This is great, although we don't have any of the grammar functions we need yet (as noted). I've thought of a similar project before actually. Love this proposal. PiRSquared17 (talk) 12:44, 7 August 2013 (UTC)

Thank you. Yes, hopefully this could be done partially through Wiktionary / Wikidata, the rest in templates, but who knows? --denny (talk) 14:27, 7 August 2013 (UTC)

Redundant work?

In the example given, there's a call to a template to express the statement that Berlin is the capital of Germany. But doesn't Wikidata already store (or at least, have the capacity for storing) facts like these? It seems redundant to re-express this fact outside of Wikidata. In a sense, as futuristic as this proposal sounds, it seems not ambitious enough. If the capacity exists to generate text in any language, why not just have automatically-generated articles? Yaron K. (talk) 13:16, 7 August 2013 (UTC)

That is true. In that case I sacrificed this possible optimization in favor of not introducing too many concepts at once. As noted above, that sentence could avoid giving the actual capital.
I think generating the text from Wikidata alone will not account for the flow of the article or its argumentation structure, and that is something that can be added by a contributor instead of being figured out by an algorithm. On the other hand, if someone comes up with an algorithm to create a viable article just from the Wikidata data - yay! Even better. It's just that I wouldn't know how to do this in a project with a very limited budget of time and money, as I was trying to assume here. --denny (talk) 14:31, 7 August 2013 (UTC)
It's true that having the page structure encoded in the wiki is a lot simpler conceptually than trying to encode it within the software itself. But, to expand on your point above, about the "subject" of the data triple not being necessary, it would appear that the "object" is not necessary either - and thus, that you should be able to embed a fact by just specifying a property ID. Yaron K. (talk) 15:31, 7 August 2013 (UTC)
Yes, sure, could be done like this. In general, this is an example, and the community would decide on the exact way the frames would be done and which parameters they take and which would be optional. --denny (talk) 02:14, 8 August 2013 (UTC)
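The variant raised in this thread, where a page embeds a fact by naming only a property, could look roughly like the sketch below. Everything in it is an assumption for illustration: the `STATEMENTS` table stands in for a Wikidata lookup, `P_capital_of` is a placeholder rather than a real property ID, and the frames are invented.

```python
# Illustrative data; hard-coded assumptions, not live Wikidata lookups.
LABELS = {
    "Q64": {"en": "Berlin", "de": "Berlin"},
    "Q183": {"en": "Germany", "de": "Deutschland"},
}

# Statements keyed by (subject item, property).
# "P_capital_of" is a placeholder property ID used only in this sketch.
STATEMENTS = {("Q64", "P_capital_of"): "Q183"}

# One hypothetical sentence pattern per property and language.
PROPERTY_FRAMES = {
    "P_capital_of": {
        "en": "{subject} is the capital of {object}.",
        "de": "{subject} ist die Hauptstadt von {object}.",
    }
}


def render_fact(page_item, property_id, lang):
    """Render one fact: the subject comes from the page, the object from the data store."""
    object_item = STATEMENTS.get((page_item, property_id))
    if object_item is None:
        return None  # no statement stored, so the sentence is skipped
    pattern = PROPERTY_FRAMES[property_id][lang]
    return pattern.format(
        subject=LABELS[page_item][lang],
        object=LABELS[object_item][lang],
    )


print(render_fact("Q64", "P_capital_of", "en"))
```

With this shape the contributor still decides which facts appear and in what order, but no longer restates their content.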

An example of the possible benefit of a multilingual template

As I have interpreted the idea, it should be possible to use a version-independent template that will generate, for any language, an infobox, article text and categories for an article, using data from Wikidata but taking the text between the data objects from elsewhere (Wiktionary, the template itself, or a very compact machine translation tool). Seen from a static viewpoint this corresponds to bot generation from Wikidata, but there are several stronger benefits. The general template can reuse translations from one type of object for another (say, from towns in Malaysia to towns in Mali). Even stronger would be the possibility to extend the template without the need to modify the individual articles. Say that for a town it starts with "X is a town in Y community with 777 inhabitants." It could later be extended to include a table of the number of inhabitants at different times and some more data about the town, for example if a new administrative level of district is introduced by the authorities. Then the text part, the data part and the infobox will all be extended at the same time in all versions. To continue, the template could start to include when the city was founded, important buildings, whether it is close to a lake, etc., and then the text part of the articles will be very much extended. So: Wikidata as it is, but then also a new entity that includes this type of version-independent template (or these seen as a special type of data in Wikidata). Anders Wennersten (talk) 18:36, 7 August 2013 (UTC)

Yes, such a workflow would be possible given the software extension. I am sure the community will come up with even more ingenious usages of it. I would prefer a less uniform approach, but that will be figured out during usage. --denny (talk) 04:26, 8 August 2013 (UTC)
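The workflow described in this thread, where a shared template is extended once and every article picks up the change, can be sketched as follows. The town records are made up, the sketch is English-only, and the names `TEMPLATE_V1`, `TEMPLATE_V2` and `render_article` are hypothetical.

```python
# Made-up records standing in for data that would come from Wikidata.
TOWN_DATA = {
    "Exampletown": {"community": "Example community", "inhabitants": 777, "founded": 1650},
    "Sampleville": {"community": "Sample community", "inhabitants": 1200},
}


def intro(name, data):
    """First sentence, following the wording used in the discussion."""
    return f"{name} is a town in {data['community']} with {data['inhabitants']} inhabitants."


def founded(name, data):
    """Optional sentence; returns None when the data is missing."""
    if "founded" not in data:
        return None
    return f"It was founded in {data['founded']}."


# The shared template is an ordered list of sentence builders.
TEMPLATE_V1 = [intro]
TEMPLATE_V2 = [intro, founded]  # extended later, without touching any article


def render_article(name, template, towns=TOWN_DATA):
    """Build the article text by running every builder and dropping empty results."""
    sentences = (build(name, towns[name]) for build in template)
    return " ".join(s for s in sentences if s)


print(render_article("Exampletown", TEMPLATE_V1))
print(render_article("Exampletown", TEMPLATE_V2))
print(render_article("Sampleville", TEMPLATE_V2))
```

Switching from `TEMPLATE_V1` to `TEMPLATE_V2` lengthens every article that has the data and leaves the others unchanged, which is the benefit over one-off bot generation that the comment points to.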

Related Work

Very interesting proposal. We have been working on a similar approach demonstrated by our prototype system called AceWiki-GF. It builds upon the controlled natural language ACE and the grammar framework GF. See this paper for the details: Tokuhn (talk) 19:09, 7 August 2013 (UTC)

Thanks for adding the link! Yes, indeed, looks very interesting and relevant. Thanks, Tokuhn. --denny (talk) 04:17, 8 August 2013 (UTC)


Most excellent idea. I wonder if a side effect would be to make smaller massively multilingual wikis/projects feasible? Few entities that run wikis in the world have the capacity to maintain hundreds of independent wikis (or even more than one), and those that deal with multiple languages at all have a smattering of pages translated and no language-specific search.

I wonder if "N5: Caching. Since the content of a page will depend on the language settings of the user, an appropriate caching mechanism needs to be designed and deployed" is really necessary? Content negotiation could just direct the user from multilingualwiki to xx.multilingualwiki; users could then navigate to other languages in the ways they expect, it would be friendly to external crawlers and search, and no special caching mechanism would be needed. But this is a triviality compared to the rest of the proposal. Mike Linksvayer (talk) 02:30, 9 August 2013 (UTC)

Thanks! Yes, I think that it should be very feasible to make smaller wikis, and maybe even other content, massively multilingual. The frames would be either reusable or provided as services... hopefully :)
Regarding the caching: That is a possibility, and I will add it to the proposal. I am not knowledgeable enough to really discuss caching, but from the Wikidata experience I learned that this part is trickier than I think, which is why I made it an explicit goal :) --denny (talk) 07:28, 9 August 2013 (UTC)
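The content-negotiation alternative suggested above can be sketched as a small redirect helper. This is only an illustration under stated assumptions: the host `multilingualwiki.example`, the per-language subdomain layout, and the function names are hypothetical, and the `Accept-Language` parsing is simplified (it reads quality values but ignores wildcards and other parameters).

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best preference first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # default weight when no q-value is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), quality))
    # sorted() is stable, so header order breaks ties between equal weights.
    ranked = sorted(prefs, key=lambda pref: -pref[1])
    return [tag for tag, quality in ranked if quality > 0]


def redirect_target(header, supported, path, default="en",
                    base="multilingualwiki.example"):
    """Pick the first preferred language the wiki supports and build its URL."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]  # "sv-SE" falls back to "sv"
        if primary in supported:
            return f"https://{primary}.{base}{path}"
    return f"https://{default}.{base}{path}"


print(redirect_target("sv-SE,sv;q=0.9,en;q=0.8", {"en", "sv", "de"}, "/wiki/Q64"))
```

Because every language ends up at its own URL, each rendered page can be cached per URL like any other page; only the initial redirect depends on the request header.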

How is the multilingual -pedia not Wikidata itself? simple-wiki

It's a great idea, but I'm confused.

That wiki would consist of content, i.e. the article pages, possibly just a simple series of template calls, and frames, i.e. the templates that lexicalize the parameters of a given template call into a sentence...

1. Why isn't this just visiting an item in and displaying in a new ?mode=framedSentences ? Wikidata is already multilingual — shows facts about Berlin in French — and Wikidata seems the logical place to house the framing templates. (By the way, how many people realize you can get facts about items in your native language from Wikidata? It's great!)

Is the "added value" of this -pedia over Wikidata someone choosing the order of the framing templates so the most important sentences appear first? I guess so — e.g. doesn't really give you the essence of Napoleon Bonaparte ☺ — but it seems that should be additional meta-information about an item. If it's in a separate wiki, you cede the ordering of facts to the first bot that walks through Wikidata generating framing templates.

I think O4 "For each language, allow contributors to override a lexicalization or add a simple textual sentence." is a huge can of worms. For a small language wiki it's a diversion of effort from its current version. I thought wikis are going to evolve to get facts from Wikidata, so what's the difference between per-language sentences in this new wiki and articles in existing wikis?

2. How about instead revamp simple English Wikipedia to use the features from your proposal so that it mostly uses these frames, and then work with the open machine translation community (OMTc® — is there one?) to ensure any remaining text cleanly translates well into other languages.

-- S Page (WMF) (talk) 01:14, 14 August 2013 (UTC)

Ad 2. above: We are currently decades away from the point where we could guarantee that it translates cleanly even into the few most used languages of the world, let alone the 280 or so that we currently support. --Purodha Blissenbach (talk) 23:05, 14 August 2013 (UTC)

Ideas for editing

I think that Wikidata IDs have many great uses (many not explored yet). In this proposal they can work great as an internal representation for information, but I was wondering if it would be possible to provide users with UIs that make the experience of creating content closer to their language.

For example, an English user could type "Berlin is the capital of Germany" getting some aids:

  • An autocompletion-like menu shown as the user types "capital", with options to disambiguate it (Capital city, German magazine, Part of a column...).
  • "Berlin" appearing underlined in red to indicate it has not been disambiguated so that the user can click on it and select the appropriate concept as above as if it was spellchecking.
  • A verbose version of the text is shown below the user input to ensure the intended meaning. This text can be based on Wikidata descriptions: "Berlin (capital city and state of Germany) is a Capital (primary city of a political entity) of Germany (federal republic in central Europe)"
  • For users speaking multiple languages, the translated versions can be shown immediately to detect some problems.

These aids are just some ideas, of which more than one (but not necessarily all) can be applied. Pginer (talk) 10:57, 4 September 2013 (UTC)
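Two of the aids listed above, flagging words that still need disambiguation and echoing a verbose version of the sentence, can be sketched as below. The candidate table is hard-coded for illustration: the descriptions are taken from the example in this comment, while the IDs `X1` and `X2` and all function names are hypothetical.

```python
# Hard-coded candidates; a real tool would query Wikidata labels and descriptions.
# Each candidate is (item ID, label, description). "X1" and "X2" are placeholder IDs.
CANDIDATES = {
    "berlin": [("Q64", "Berlin", "capital city and state of Germany")],
    "capital": [
        ("Q5519", "Capital", "primary city of a political entity"),
        ("X1", "Capital", "German magazine"),
        ("X2", "Capital", "part of a column"),
    ],
    "germany": [("Q183", "Germany", "federal republic in central Europe")],
}


def suggest(word):
    """Options for the autocompletion-like menu."""
    return CANDIDATES.get(word.lower(), [])


def unresolved(text, choices):
    """Words that have candidates but no chosen concept yet (the 'red underline')."""
    return [w for w in text.split() if suggest(w) and w.lower() not in choices]


def verbose(text, choices):
    """Echo the sentence with each chosen concept expanded to 'Label (description)'."""
    out = []
    for word in text.split():
        item_id = choices.get(word.lower())
        match = next((c for c in suggest(word) if c[0] == item_id), None)
        out.append(f"{match[1]} ({match[2]})" if match else word)
    return " ".join(out)


sentence = "Berlin is the capital of Germany"
print(unresolved(sentence, {"berlin": "Q64", "germany": "Q183"}))
print(verbose(sentence, {"berlin": "Q64", "capital": "Q5519", "germany": "Q183"}))
```

The sketch matches single whitespace-separated words only; multi-word names and inflected forms would need a proper entity-linking step.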

I completely agree with you. In a research project we had developed a very interesting tool: it was a Word plugin, and it was analyzing the text you were entering while you were entering it, and then querying a Semantic MediaWiki to offer you semantic autocomplete and semantic validation. So if you wrote "Paris is the capital of " it would suggest "France", and if you wrote "Spain" it would put some colored wriggles underneath, etc. That was pretty cool.
And yes, I think the ideas you sketched are all very good and I would love to see them explored. --denny (talk) 12:59, 6 September 2013 (UTC)
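The semantic autocomplete and validation behaviour described in this reply can be illustrated with a toy version. It is not the Word plugin or its Semantic MediaWiki queries: the `CAPITAL_OF` table, the single sentence pattern, and the function names are assumptions made for the sketch.

```python
import re

# Toy fact table standing in for a query against a semantic wiki.
CAPITAL_OF = {"Paris": "France", "Berlin": "Germany", "Madrid": "Spain"}

# The only sentence shape this sketch understands.
PATTERN = re.compile(r"^(?P<subject>\w+) is the capital of\s*(?P<object>\w*)$")


def suggest_completion(text):
    """Suggest the object while the sentence is still open-ended."""
    m = PATTERN.match(text)
    if not m or m.group("object"):
        return None  # not this pattern, or the object is already typed
    return CAPITAL_OF.get(m.group("subject"))


def validate(text):
    """True or False if the statement can be checked, None if it cannot."""
    m = PATTERN.match(text.rstrip("."))
    if not m or not m.group("object"):
        return None
    expected = CAPITAL_OF.get(m.group("subject"))
    if expected is None:
        return None  # unknown subject, nothing to compare against
    return m.group("object") == expected


print(suggest_completion("Paris is the capital of "))
print(validate("Paris is the capital of Spain"))
```

A `False` result is what would trigger the colored underline in the editor.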