
Abstract Wikipedia/Updates/2021-06-24

From Meta, a Wikimedia project coordination wiki

Summary: The Grammatical Framework community is inviting Wikimedians to participate for free in the GF Summer School 2021. Participation for Wikimedians will be sponsored by Digital Grammars.

“Grammatical Framework” (GF) is an open-source functional programming language and suite of tools aimed at multilingual natural language generation and parsing. GF was first created in 1998 at Xerox Research to support multilingual document authoring. It can parse and generate texts in several languages simultaneously while working from a language-independent representation of meaning. GF has an active and lively community and supports more than 40 languages.

Here is an example of how GF works (note that the syntax has been changed from GF's Haskell-like syntax to a function-call syntax). Given an abstract representation such as:

mkUtt(mkS(mkCl(mkNP(aPl_Det, horse_N), mkNP(aPl_Det, animal_N))))

To make this a bit easier to understand, here is the same structure with the terminology unabbreviated:

make Utterance (make Sentence (make Clause (make Noun Phrase (a Plural Determiner, horse Noun), make Noun Phrase (a Plural Determiner, animal Noun))))

Note that this structure in turn could also be abstracted away behind a function call with a simpler structure:

subsumes(horse_N, animal_N)
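
To illustrate how such a wrapper could expand into the nested structure, here is a minimal Python sketch. It is a toy model, not GF's actual API: the constructor names are borrowed from the example above, the nested-tuple representation and the `show` helper are assumptions made for illustration, and `subsumes` is a hypothetical wrapper.

```python
# Toy model of the abstract representation: each constructor returns a nested tuple.
# The names mirror the example in the text; this is NOT GF's real API.

def mkNP(det, noun):
    return ("mkNP", det, noun)

def mkCl(subject, predicate):
    return ("mkCl", subject, predicate)

def mkS(clause):
    return ("mkS", clause)

def mkUtt(sentence):
    return ("mkUtt", sentence)

def subsumes(sub_noun, super_noun):
    """Hypothetical wrapper that hides the full structure behind a simpler call."""
    return mkUtt(mkS(mkCl(mkNP("aPl_Det", sub_noun), mkNP("aPl_Det", super_noun))))

def show(tree):
    """Render a nested tuple in the function-call notation used in the text."""
    if isinstance(tree, str):
        return tree
    head, *args = tree
    return f"{head}({', '.join(show(arg) for arg in args)})"

print(show(subsumes("horse_N", "animal_N")))
# prints: mkUtt(mkS(mkCl(mkNP(aPl_Det, horse_N), mkNP(aPl_Det, animal_N))))
```

The point of the wrapper is that a contributor only needs to supply the two nouns; the grammatical scaffolding stays in one place.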

One can linearize that abstract representation in several languages. Here are the results as created by the cloud-based implementation of GF (which dates from 2012; GF has since added support for dozens more languages):

  • Bulgarian: коне са животни
  • Chinese: 些 马 是 些 动 物
  • Dutch: paarden zijn dieren
  • English: horses are animals
  • Spanish: caballos son animales
  • Swedish: hästar är djur

Let’s make two small changes to the abstract representation: add a negative polarity to the sentence (negativePol) and replace horse_N with tree_N. We then get the following representation:

mkUtt(mkS(negativePol, mkCl(mkNP(aPl_Det, tree_N), mkNP(aPl_Det, animal_N))))

Just as above, this could be hidden behind a function call:

subsumesNot(tree_N, animal_N)

This leads to the following linearizations:

  • Bulgarian: дърва не са животни
  • Chinese: 些 树 不 是 些 动 物
  • Dutch: bomen zijn niet dieren
  • English: trees aren't animals
  • Spanish: árboles no son animales
  • Swedish: träd är inte djur
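
The idea of one abstract representation with many linearizations can be sketched in a few lines of Python. This is a deliberately simplified illustration and not how GF is implemented: real GF grammars derive inflection, agreement, and word order from rules, whereas the lookup tables here (`LEXICON`, `COPULA`) and the `linearize` function are assumptions that simply reuse the word forms from the outputs listed above, for four of the six languages.

```python
# Plural word forms copied from the linearizations listed in the text.
LEXICON = {
    "Dutch":   {"horse_N": "paarden",  "tree_N": "bomen",    "animal_N": "dieren"},
    "English": {"horse_N": "horses",   "tree_N": "trees",    "animal_N": "animals"},
    "Spanish": {"horse_N": "caballos", "tree_N": "árboles",  "animal_N": "animales"},
    "Swedish": {"horse_N": "hästar",   "tree_N": "träd",     "animal_N": "djur"},
}

# Copula per polarity; each language places the negation differently.
COPULA = {
    "Dutch":   {"positive": "zijn", "negative": "zijn niet"},
    "English": {"positive": "are",  "negative": "aren't"},
    "Spanish": {"positive": "son",  "negative": "no son"},
    "Swedish": {"positive": "är",   "negative": "är inte"},
}

def linearize(tree, language):
    """Turn ("subsumes" | "subsumesNot", subject, object) into a sentence."""
    function, subject, obj = tree
    polarity = "negative" if function == "subsumesNot" else "positive"
    words = LEXICON[language]
    return f"{words[subject]} {COPULA[language][polarity]} {words[obj]}"

# The same two abstract trees, rendered in every language of the toy lexicon.
for language in LEXICON:
    print(language + ":", linearize(("subsumes", "horse_N", "animal_N"), language))
    print(language + ":", linearize(("subsumesNot", "tree_N", "animal_N"), language))
```

Note that the abstract trees never mention a language; everything language-specific lives in the tables, which is the separation that GF's abstract and concrete syntaxes provide in a far more general way.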

While the idea for Abstract Wikipedia was being developed, GF served as an important inspiration. It was part of AceWiki, an extension of MediaWiki that integrated tightly with GF and Attempto Controlled English (ACE) in order to create text in several languages and also to capture the formal semantics of the text. Whereas one of the main goals of AceWiki was to also express all sentences in a formal logical language (in that case OWL), we are less interested in the formal semantics of the abstract content (in fact, this is one major difference between Abstract Wikipedia and its many predecessor projects). Other than that, you can see how GF and AceWiki have influenced the development of Abstract Wikipedia.

Since the announcement of Abstract Wikipedia, the GF developers and communities have reached out to the Abstract Wikipedia developers, and we have been discussing our plans and ideas. In order to further the relationship between the communities and to transfer experiences and ideas between them, we are very happy to extend an invitation to the Abstract Wikipedia community: this year’s Grammatical Framework Summer School will be open and free for all Wikimedians.

At this stage, it is too early to commit ourselves to using GF as the only approach towards natural language generation in Abstract Wikipedia. There are alternatives, and Wikifunctions will be malleable enough to support different approaches. One example of such an alternative is HPSG (Head-driven Phrase Structure Grammar), which will be presented in the second week of the summer school. But we plan to learn from the decades of work and research into GF and the hundreds of person-years that went into its development, and we also plan to explore whether we can reuse some of the software or parts of the comprehensive grammar libraries that are part of GF. In order to facilitate such reuse, it will be crucial for the two communities to know more about each other and to understand each other better.

The GF Summer School 2021 will be held from 26 July to 6 August in Singapore, and it will be possible to attend online. Registration is required. To register as a Wikimedian, please email inari(_AT_)digitalgrammars.com and state your Wikimedia account, your name, your country of residence, the languages you read and write, and whether you would like to participate for one or two weeks. This step is required to avoid the participation fee; if you sign up on your own, you will need to pay the fee. We are very thankful to Digital Grammars for covering the fee for Wikimedians.

We are very excited about this collaboration and are looking forward to the two communities working together and benefiting from each other's goals, experiences, and skills.

This week also saw our first office hour. We answered a lot of questions, and you can catch up on the logs. We plan to hold the next office hour in four to six weeks, and we will also announce the dates in this newsletter.