Abstract Wikipedia/Updates/2022-05-27

From Meta, a Wikimedia project coordination wiki

A proposal for the natural language generation (NLG) architecture

Our Google.org fellow, Ariel Gutman, has recently authored a proposal for the architecture of the natural language generation (NLG) system of Abstract Wikipedia.

The proposed architecture is driven by four main tenets:

  1. Modularity: the system should be modular, in that various aspects of NLG (e.g. morphosyntactic and phonotactic rules) can be modified independently.
  2. Lexicality: the system should be able to both fetch lexical data (separate from code), and rely on productive language rules to generate such data on the fly (e.g. inflecting English plurals with an -s).
  3. Recursivity: due to the compositional and recursive nature of most languages, an effective NLG system would need to be recursive itself.
  4. Extensibility: the system should be open to extension by linguistic experts and technical contributors, as well as by non-technical and non-expert contributors, each working on different parts of the system.
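
As a rough illustration of the Lexicality tenet, the sketch below first looks a form up in stored lexical data and only falls back to a productive rule when nothing is stored. The `LEXICON` dictionary and the `pluralize` function are hypothetical names invented for this example; in the proposal, lexical data would come from Wikidata rather than from a hard-coded table.

```python
# Hypothetical in-memory lexicon standing in for lexical data kept
# separately from code (the proposal envisions Wikidata for this).
LEXICON = {
    "child": {"plural": "children"},
    "mouse": {"plural": "mice"},
}


def pluralize(lemma: str) -> str:
    """Return a plural form: stored data first, productive rule second."""
    entry = LEXICON.get(lemma, {})
    if "plural" in entry:
        # Irregular form fetched from the lexical data.
        return entry["plural"]
    # Productive rule applied on the fly: add -s.
    # (Deliberately simplistic; real English rules have more cases.)
    return lemma + "s"


print(pluralize("child"))
print(pluralize("book"))
```

Keeping the rule and the data apart means that a contributor could correct an irregular form without touching any code.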

These considerations lead to a proposal of a "pipeline" system, in which an input Constructor is processed by a series of modules (each corresponding to a different aspect of natural language) until the final output text is rendered.

In the proposal's pipeline diagram, dark blue shapes are elements that would be created by contributors to Wikifunctions (rectangles) or Wikidata (rounded rectangles), while light blue elements represent functions or data living within the Wikifunctions orchestrator.
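
The pipeline idea can be sketched in a few lines of Python. This is only a toy under stated assumptions: the stage names (`render_template`, `phonotactics`, `orthography`), the shape of the constructor dictionary, and the single English a/an rule are all invented for illustration and are not taken from the proposal itself.

```python
from typing import Dict, List


def render_template(constructor: Dict[str, str]) -> List[str]:
    # Hypothetical renderer: turn a constructor into a token sequence.
    return [constructor["subject"], "is", "a", constructor["class"]]


def phonotactics(tokens: List[str]) -> List[str]:
    # Toy phonotactic rule: "a" becomes "an" before a vowel letter.
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok == "a" and nxt and nxt[0].lower() in "aeiou":
            out.append("an")
        else:
            out.append(tok)
    return out


def orthography(tokens: List[str]) -> str:
    # Final stage: join tokens, capitalize, add sentence punctuation.
    text = " ".join(tokens)
    return text[:1].upper() + text[1:] + "."


def run_pipeline(constructor: Dict[str, str]) -> str:
    # Each module handles one aspect and can be changed independently.
    tokens = render_template(constructor)
    tokens = phonotactics(tokens)
    return orthography(tokens)


print(run_pipeline({"subject": "Alan Turing", "class": "English mathematician"}))
```

Because each stage only consumes the previous stage's output, the a/an rule could be replaced or extended without touching the renderer, which is the kind of modularity the first tenet asks for.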

A key aspect of the system is its "templatic renderers". Wikifunctions will provide a specialized templating language, developed in-house, which should enable even non-technical contributors to write renderers for their language. These renderers will be supported by lexical data from Wikidata and by Universal Dependency-style grammatical relations, which would be defined within Wikifunctions by linguistically interested contributors.

We would be glad to hear your feedback on the proposal's talk page, in particular about the idea of developing an in-house templating system.
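
To give a feel for what a template-based renderer might do, here is a minimal sketch that uses Python's built-in `str.format` as a stand-in for the templating language. The template syntax, the `TEMPLATES` table, the constructor type names, and the `render` function are all assumptions made for this example; the actual in-house templating language is defined in the proposal, not here.

```python
# Hypothetical templates keyed by constructor type. The slot syntax is
# Python's str.format, NOT the templating language from the proposal.
TEMPLATES = {
    "Instance": "{subject} is a {cls}",
    "Modified": "{modifier} {head}",
}


def render(node):
    """Recursively render a constructor tree into a string."""
    if isinstance(node, str):
        # Leaf: plain text, used as-is.
        return node
    template = TEMPLATES[node["type"]]
    # Slots may themselves hold constructors, so render each one first.
    slots = {name: render(value) for name, value in node["slots"].items()}
    return template.format(**slots)


sentence = render({
    "type": "Instance",
    "slots": {
        "subject": "Marie Curie",
        "cls": {
            "type": "Modified",
            "slots": {"modifier": "Polish", "head": "physicist"},
        },
    },
})
print(sentence)
```

The recursive call mirrors the Recursivity tenet: a slot can hold another constructor, so nested phrases are rendered by the same mechanism as the sentence that contains them.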

Further updates for last week

  • This week, the team held its first Deep Dive session. We presented our project OKRs (objectives and key results) and received feedback from leadership.
  • The team spent time this week preparing for last weekend's Hackathon:
    • There was a presentation and Q&A about Wikifunctions
    • A few Phabricator backlog tasks were identified and tagged for Hackathon participants

Below is a brief weekly summary of the status of each workstream:

  • Performance:
    • Made progress on Beta cluster setup: orchestrator and evaluator services now update automatically to the latest image
  • NLG:
    • Completed the initial draft of the NLG system architecture design document
  • Metadata:
    • Partially completed the front-end code to accommodate both forwards and backwards compatibility for the old & new metadata formats
  • Experience:
    • Made more progress on the function view and editor implementations for mobile
    • Completed function-schemata migration to Benjamin arrays
    • Handed off designs for 'Text with fallback'