Talk:Abstract Wikipedia/ Fellows evaluation - Answer

From Meta, a Wikimedia project coordination wiki

on the lack of sufficiently general NLG systems, GF, UD, LRLs

First, thank you for writing this response! It was certainly very worthwhile to read.  

There is, unfortunately, an inaccuracy in what you state I claimed. It concerns the sentence “according to Keet, Grammatical Framework notably lacks the flexibility to handle certain aspects of Niger-Congo B languages’ morphology” in your reply to the claim in the AW/ Fellows evaluation that “There are sufficiently general NLG systems which could cover all (written) human languages (for example: Grammatical Framework, templatic system)”. The intricacies are not trivial, so this reply ended up longer than intended. (tl;dr: the issue is NCB morphology with respect to UD, and GF at least with respect to its approach and resourced-ness.)

The statement in the evaluation does indeed deserve criticism and correction, and I certainly offered both several times, with several arguments; thank you for mentioning that. However, those arguments got conflated into what your reply suggests is, in shorthand, an ‘all against GF’ position, and that is not the case. It wasn’t Grammatical Framework and Niger-Congo B (NCB) morphology I mentioned, but the dependency grammar framework of Universal Dependencies (UD) and NCB morphology. This issue came up in the context of the template language specification, which assumes a dependency grammar such as UD (the phrasing was toned down in the Nov’22 version compared to v1 of July’22). UD, however, isn’t as universal as its developers may have hoped. This is mainly because UD is word-based and thus unable to cope with agglutinating and morphologically rich languages, which include the NCB languages; it first needs modifications and extensions to be of any use for such languages. One published example of such extensions to UD is for St. Lawrence Yupik, of the Eskimo–Aleut (or Inuit–Yupik–Unangan) language family. I’m still struggling with, and evaluating, my first draft extensions for isiZulu to shoehorn it into something “UD-like”. Besides UD, most extant realisers (‘renderers’), such as SimpleNLG, are also word-based, and so fall short when sub-word units are productive, as they are in NCB languages. The template language we designed is sufficiently fine-grained at the sub-word level, and dependency relations that aren’t in UD’s list may indeed be introduced. However, the dependency grammar and unification operators have yet to be devised for many languages; flexibility may be required for cases where the dependencies do not (yet) form a neat tree or violate the direction of the asymmetric relation; and other issues may surface for other languages. Whether it suffices for all languages therefore remains to be seen.

For determining the adequacy of GF for NCB morphology, perhaps Laurette Marais or Laurette Pretorius can give some insights into possible limitations from a linguistic viewpoint. They have been working on a resource grammar for isiZulu for the past 4.5 years, after attending a 2-week GF summer school (the grammar is on GitHub; as far as I know, it has not been evaluated). After so much effort, it is still only a partial grammar, which brings me to a problem with GF. While in theory GF may work, and work well, for at least certain well-resourced languages, it requires a lot of upfront investment: it relies on knowing a language's grammar, which for low-resourced languages we know only partially; it requires some familiarity with programming principles (in short supply globally); and it takes a long time even for what is known, owing to shortages of resources (time, money, human skills, tools to machine-learn the grammar from a corpus, up-to-date accessible grammar books). For instance, we recently figured out and documented the grammar for the specific template for a person’s age that serves as the example in the template language specification, and it may not work for just any object’s age (like the Taj Mahal being 374 years old). Yet it is precisely such more generic rules that need to be declared in GF. For these reasons, at least in practice, the top-down strategy that GF currently requires is not a realistic choice for any low-resourced, and perhaps not even for a medium-resourced, language. A better strategy would be a bottom-up one, with a ‘rules as you go’ approach in which grammar rules can be added piecemeal. That won’t be perfect either, but at least there would be something to show for it: a minimal relevant fragment of the grammar that can be built up and evaluated gradually by a larger group of people with a more varied skillset. Keet10 (talk) 11:43, 23 December 2022 (UTC)

Maria, thank you so much for the clarification! Would you like us to add a link in the Answer pointing here?
Also, to be clear, even though some passages of the Answer might be read as "All against GF", this was really not our intent. In fact, since the Evaluation and the Answer were written, we have intensified our collaboration with the GF team and have been working on getting a Python Evaluator with bindings to GF up and running. We are certainly continuing to explore tighter cooperation in that field. --DVrandecic (WMF) (talk) 01:22, 20 January 2023 (UTC)