Talk:Abstract Wikipedia/Related and previous work/Natural language generation

Propose a link

Please add interesting links to the main page.

If you're not sure whether we might be interested, please add the link here. Any additional comments would be welcome. Please use the language you are most comfortable with.

Very interesting links

Book Review (Yue Zhang, 2020) of Deep Learning Approaches to Text Production (Narayan and Gardent, 2020) "Text Production" is NLG, so the book's topic is "neural NLG". If you thought you might be interested in the book, the review will help you make up your mind. A preview of the book is available here.
A Survey of Evaluation Metrics Used for NLG Systems (Ananya B. Sai, Akash Kumar Mohankumar, and Mitesh M. Khapra, 2020) "Over the past few years, many evaluation metrics have been proposed: some task agnostic and others task specific. In this survey ... we propose that ... the metrics can be categorised as context-free and context-dependent metrics. Within each of these categories there are trained and untrained metrics which rely on word based, character based or embedding based information to evaluate a hypothesis. This arrangement ... shows that there is still a need for developing task-specific context-dependent metrics as most of the current metrics are context-free. This is a major gap in existing works as in many NLG tasks context plays a very important role and ignoring it is not prudent." As well as surveying the field of metrics, this paper gives an accessible overview of the NLG field today (and abstraction with credit is permitted). NLG subfields referenced are:
- Machine Translation (MT)
- Abstractive Summarization (AS)
- Free-form Question Answering (QA)
- Question Generation (QG)
- Data to Text Generation (D2T)
- Dialogue Generation (DG)
- Image Captioning (IC)

Each of these is described in less than half a page. Effectively excluded from the survey are "spelling and grammar correction, automatic paraphrase generation, video captioning, simplification of complex texts, automatic code generation, humour generation, etc". There is a useful one-page table of example inputs and outputs of the different activities at page 6. ("Table 1. Examples inputs and generated outputs for various Natural Language Generation tasks.") Unfortunately, apostrophes appear to have become corrupted. In Section 3 (p.9), as well as describing how evaluations have been set up, different criteria are named and defined for the different types of system (MT, AS etc). Section 4 (p.13) provides the authors' taxonomy of automated evaluation metrics. A one-page chart is at page 15. (Sections 5 to 8 are for the statisticians; there were far too many sigmas for me to cope with. Oh, and a λ... but it's only a "weighting factor" --GrounderUK (talk) 23:02, 31 August 2020 (UTC)) Section 5 (p.16) considers context-free metrics, Section 6 (p.31) considers context-dependent metrics, Section 7 (p.37) considers studies critical of automated evaluation metrics (but with no statistical analysis) and Section 8 (p.39) considers studies evaluating evaluation metrics. Section 9 (p.43) makes substantive recommendations for future research and Section 10 is a fairly brief conclusion, some of which is quoted above (after the link). 179 papers are referenced.
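
For anyone who wants a concrete picture of the survey's categories, here is a minimal sketch of what an untrained, word-based, context-free metric can look like. It is not taken from the paper: the function name unigram_precision and the whitespace tokenisation are my own assumptions, and the calculation is simply clipped unigram precision. It compares a hypothesis with a reference and never looks at the input (context) that produced them.

```python
from collections import Counter


def unigram_precision(hypothesis: str, reference: str) -> float:
    """Clipped unigram precision of a hypothesis against one reference.

    Illustrative only: tokenisation is plain lower-casing and whitespace
    splitting, which is an assumption and not what any particular
    published metric prescribes.
    """
    hyp_tokens = hypothesis.lower().split()
    ref_tokens = reference.lower().split()
    if not hyp_tokens:
        return 0.0

    hyp_counts = Counter(hyp_tokens)
    ref_counts = Counter(ref_tokens)

    # A hypothesis word is credited at most as often as it occurs
    # in the reference ("clipping").
    matched = sum(min(count, ref_counts[word]) for word, count in hyp_counts.items())
    return matched / len(hyp_tokens)
```

A context-dependent metric would, by contrast, also take the source (the question, the dialogue history, the data record) as an argument, which is the gap the survey's authors point to.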

Toward Givenness Hierarchy Theoretic Natural Language Generation (Pal & Williams, 2020) is a five-pager on the generation of anaphora by robots by inferring the likely givenness hierarchy of the interlocutor. "we formulate cognitive status modeling as a Bayesian filtering problem..." [section 3] "Using this formalism, our goal is to recursively estimate, for a given object, the probability distribution over cognitive statuses for object o at time t..." [We would be less concerned with time, not being a robot.] "The GetStatus(O) function (Algorithm 1) takes an object O and returns its most likely cognitive status. If no CSF [cognitive status filter; the Bayesian given in the paper] exists for O ... “UID” is returned; otherwise the most probable cognitive status for O (as determined by the distribution maintained by O’s CSF) is returned."

Pal, P., Zhu, L., Golden-Lasher, A., Swaminathan, A., and Williams, T. (2020). Givenness Hierarchy Theoretic Cognitive Status Filtering. Proceedings of the Annual Conference on Cognitive Science.
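
The GetStatus behaviour quoted above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the name get_status, the dictionary standing in for the per-object cognitive status filters, and the status labels other than "UID" are all hypothetical. Each filter is represented here as nothing more than the probability distribution it currently maintains.

```python
from typing import Dict, Hashable, Mapping

# Hypothetical representation: one distribution over cognitive statuses
# per tracked object, standing in for that object's filter (CSF).
StatusDistribution = Mapping[str, float]


def get_status(obj: Hashable, filters: Dict[Hashable, StatusDistribution]) -> str:
    """Return the most likely cognitive status for obj.

    Follows the quoted description: if no filter exists for the object,
    "UID" is returned; otherwise the most probable status under the
    filter's distribution is returned.
    """
    distribution = filters.get(obj)
    if distribution is None:
        return "UID"
    # Pick the status label with the highest probability.
    return max(distribution, key=lambda status: distribution[status])
```

For example, with filters = {"mug": {"in focus": 0.6, "activated": 0.3, "familiar": 0.1}}, get_status("mug", filters) gives "in focus", while an object with no filter gives "UID". How the distributions are updated over time (the Bayesian filtering itself) is left out of this sketch.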

Wikipedia and Wikidata related

Kaffee, Lucie-Aimée; Vougiouklis, Pavlos; Simperl, Elena. "Using Natural Language Generation to Bootstrap Missing Wikipedia Articles: A Human-centric Perspective" (PDF). Semantic Web: Interoperability, Usability, Applicability (Q15817015). Arabic and Esperanto; ML. This is the most immediately relevant paper I've come across. The study used a machine-learning approach, but in other respects it explores the Abstract Wikipedia context. It prototypes an extension of ArticlePlaceholder with an introductory sentence generated from Wikidata triples. This prototype is presented to a number of experienced Wikipedia editors and their reactions are explored. This includes whether and how they edit or replace the introductory sentence, as well as structured interviews after the "confrontation". "The evaluation, which includes an automatic, a judgement-based, and a task-based component, shows that the [introductory] sentences score well in terms of perceived fluency and appropriateness for Wikipedia, and can help editors bootstrap new articles. It also hints at several potential implications of using NLG solutions in Wikipedia at large, including content quality, trust in technology, and algorithmic transparency." Of concern to the Abstract Wikipedia project is the fact that it gets no mention.
Language Models as Knowledge Bases? (Petroni, Rocktäschel, Lewis, Bakhtin, Wu, Miller and Riedel, 2019) ML; English.
Drawing Questions from Wikidata (Geng, 2016) "...ratings suggest that our application can generate questions that can compete with manually created ones. However, there are still many that are deemed irrelevant. Our results indicate that Wikidata still has a lot of incomplete and imprecise data. ...we experienced that an update in Wikidata’s data set does impact our quiz and makes it more precise. Furthermore, our quiz application can be used to detect incomplete Wikidata items in an entertaining manner." This builds on work "introducing Wikidata quiz" described in:

Drawing Questions from Wikidata (Bissig, 2015) "We construct a graph by querying multiple Wikidata items originating from any chosen topic. The structure of the resulting graph is used to generate relevant questions with answer options ... participants found good questions and they saw improvements in the algorithm over time. We found that the incomplete status of Wikidata negatively impacts the quality of the generated graph. We found limits in the types of questions that are suitable for generating from knowledge bases..."

Natural Language Interface System for Querying Wikidata (Jafari & Kumar, 2020) "addresses the gap between end-user and knowledge in this case Wikidata. ...developed a system for querying Wikidata for the first time in natural language using ontology. ...identifies the potential entity and its synonyms and do the query against the Wikidata knowledge base. By help of weighting function it will find the most probable answer..."
Generating Wikipedia by Summarizing Long Sequences (Liu et al., 2018) "We have shown that generating Wikipedia can be approached as a multi-document summarization problem... a new, decoder-only sequence transduction model for the abstractive stage ... significantly outperforms traditional encoder-decoder architectures on long sequences, allowing us to condition on many reference documents and to generate coherent and informative Wikipedia articles."
Contractions, fusions and "combinations", in the context of the English morpheme pair "we'll"

NLG in practice

Haapanen, Lauri; Leppänen, Leo (2020-10-07). "Recycling a genre for news automation: The production of Valtteri the Election Bot". AILA Review (Q15749404) 33: 67–85. ISSN 1461-0213. doi:10.1075/aila.00030.haa. Retrieved 2020-11-09. Swedish, Finnish, English; data to news; pipeline. As well as having tri-lingual and deterministic text generation (aligned to the pipeline approach), the user experience in this case study allowed the vantage point of the news story to be selected, including geographic localization and party-political perspective. Sadly, readers rated the generated texts less highly than similar stories produced by human journalists; the "most frequent complaints were about language errors, obtrusive repetition, and “dry” language, and the most common words in the negative feedback were words like boring, confusing, monotone and incoherent. On the positive side, the computer-written stories were generally praised for being based on facts and for being clear and to-the-point."

Leppänen, Leo; Munezero, Myriam; Granroth-Wilding, Mark; Toivonen, Hannu (Sep 2017). "Data-Driven News Generation for Automated Journalism". Proceedings of the 10th International Conference on Natural Language Generation. Santiago de Compostela, Spain: Association for Computational Linguistics. pp. 188–197. doi:10.18653/v1/W17-3528. Retrieved 2020-11-09. The authors "explore the field and challenges associated with building a journalistic natural language generation system [...and...] present a set of requirements that should guide system design, including transparency, accuracy, modifiability and transferability." The outcome is "a data-driven architecture for automated journalism that is largely domain and language independent."

Leppänen, Leo; Munezero, Myriam; Sirén-Heikel, Stefanie; Granroth-Wilding, Mark; Toivonen, Hannu (2017). "Finding and expressing news from structured data" (PDF). Proceedings of the 21st International Academic Mindtrek Conference on - AcademicMindtrek '17. the 21st International Academic Mindtrek Conference. Tampere, Finland: ACM Press. pp. 174–183. ISBN 978-1-4503-5426-4. doi:10.1145/3131085.3131112. Retrieved 2020-11-09. "through automation of the news generation process, including the generation of textual news articles, a large amount of news can be expressed in digestible formats to audiences, at varying local levels, and in multiple languages. In addition, automation allows the audience to tailor or personalize the news they want to read."

Not obviously relevant links

Metalingo – An archaeological curio from 2003

Comments without a link

@???:

Mailing list and chat contributions

https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia

mail archives

July 2020

There have been some worthwhile references cited in our various emails. The original email contributors are encouraged to include any relevant links on the main page. In the meantime, here are a few links to glean from...

https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000001.html

https://meta.m.wikimedia.org/wiki/Wikilambda

https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia

https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000002.html

https://www.mediawiki.org/wiki/Extension:WikiLambda

http://pronoun.is/he

http://pronoun.is/they/.../themself

https://wikimediafoundation.org/

https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000003.html

https://meta.wikimedia.org/wiki/Abstract_Wikipedia

https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000004.html

https://meta.wikimedia.org/wiki/Abstract_Wikipedia/July_2020_announcement

https://news.ycombinator.com/item?id=23714875

https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000005.html

https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000006.html

Racket - Rhombus brainstorming

https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000007.html

https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000008.html

https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Plan

https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Plan#Task_P1.12:_JavaScript-based_implementations

https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Plan#Task_P1.15:_Lua-based_implementations

https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Plan#Task_O6:_Python-based_implementations

https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Plan#Task_O7:_Implementations_in_other_languages

https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Plan#Task_O25:_Integrate_into_IDEs

http://cohmetrix.com/

https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000009.html

https://arxiv.org/abs/2004.04733

https://www.researchgate.net/profile/Hiroshi_Uchida2/publication/239328725_A_Gift_for_a_Millennium/links/54c6953e0cf22d626a34f224/A-Gift-for-a-Millennium.pdf

https://pdfs.semanticscholar.org/b030/ea4662e393657b9a134c006ca5b08e8a23b3.pdf?_ga=2.109286021.1099995837.1593757540-1424212949.1593757540

http://www.afcp-parole.org/doc/Archives_JEP/2002_XXIVe_JEP_Nancy/talnrecital/TALN/actes_taln/articles/TALN26.pdf

https://arxiv.org/pdf/1902.08061.pdf

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.212.2058&rep=rep1&type=pdf

https://www.cicling.org/2005/unl-book/Papers/003.pdf

UNL DECONVERTER FOR TAMIL (T.Dhanabalan, T.V.Geetha. 2003)

References

Ref.	Author(s)	Year	Title etc
			Dates in italics have been duplicated for sorting.
[1]	Munpyo Hong & Oliver Streiter	1999	Overcoming the Language Barriers in the Web: The UNL-Approach, 11. Jahrestagung der Gesellschaft für Linguistische Datenverarbeitung (GLDV’99), Frankfurt am Main, Deutschland.
[2]	Hiroshi Uchida, Meiying Zhu, Tarcisio Della Senta	1999	A gift for a Millennium. The United Nations University.
[3]	UNL Center	2000	Enconverter Specification, UNDL Foundation.
[4]	UNL Center	2000	DeConverter Specification, UNDL Foundation.
[5]	Serrasset, G. and Boitet, C.	2000	On UNL as the Future “html of the linguistic content” & the Reuse of Existing NLP Components in UNL-related Applications with the Example of a UNL-French Deconverter. Proceedings of the 18th International Conference on Computational Linguistics, pp. 76-771.
[6]	Bouguslavsky, I., Frid, N. and Iomdin, L.	2000	Creating a Universal Networking Module within an Advanced NLP System. Proceedings of the 18th International Conference on Computational Linguistics, pp. 83-89.
[7]	Arnold, D. et al.	1994	Machine translation: an introductory guide. Manchester/Oxford: NCC/Blackwell.
[8]	Shachi Dave, Jignashu Parikh and Pushpak Bhattacharyya	2002	Interlingua Based English Hindi Machine Translation and Language Divergence, to appear in Journal of Machine Translation, vol 17, 2002.
[9]	Pushpak Bhattacharyya	2002	Many Languages on the Net, PCQuest Magazine, September 2002.
[10]	P. Bhattacharyya	2001	Knowledge Extraction into Universal Networking Language Expressions, in Universal Networking Language Workshop, Geneva, Switzerland, January, 2001.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.78.979&rep=rep1&type=pdf

Abstract Meaning Representation for Sembanking (2013)

References

Author(s)	Year	Title etc
O. Abend and A. Rappoport.	2013.	UCCA: A semantics-based grammatical annotation scheme. In Proc. IWCS.
C. Baker, C. Fillmore, and J. Lowe.	1998.	The Berkeley FrameNet project. In Proc. COLING.
V. Basile, J. Bos, K. Evang, and N. Venhuizen.	2012a.	Developing a large semantically annotated corpus. In Proc. LREC.
V. Basile, J. Bos, K. Evang, and N. Venhuizen.	2012b.	A platform for collaborative semantic annotation. In Proc. EACL demonstrations.
A. Böhmová, J. Hajič, E. Hajičová, and B. Hladká.	2003.	The Prague dependency treebank. In Treebanks. Springer.
A. Butler and K. Yoshimoto.	2012.	Banking meaning representations from treebanks. Linguistic Issues in Language Technology, 7.
S. Cai and K. Knight.	2013.	Smatch: An accuracy metric for abstract meaning representations. In Proc. ACL.
D. Chiang, J. Andreas, D. Bauer, K. M. Hermann, B. Jones, and K. Knight.	2013.	Parsing graphs with hyperedge replacement grammars. In Proc. ACL.
D. Davidson.	1969.	The individuation of events. In N. Rescher, editor, Essays in Honor of Carl G. Hempel. D. Reidel, Dordrecht.
M. Dreyer and D. Marcu.	2012.	Hyter: Meaning-equivalent semantics for translation evaluation. In Proc. NAACL.
B. Jones, J. Andreas, D. Bauer, K. M. Hermann, and K. Knight.	2012.	Semantics-based machine translation with hyperedge replacement grammars. In Proc. COLING.
H. Kamp, J. Van Genabith, and U. Reyle.	2011.	Discourse representation theory. In Handbook of philosophical logic, pages 125–394. Springer.
P. Kingsbury and M. Palmer.	2002.	From TreeBank to PropBank. In Proc. LREC.
D. B. Lenat.	1995.	Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11).
R. Martins.	2012.	Le Petit Prince in UNL. In Proc. LREC.
C. M. I. M. Matthiessen and J. A. Bateman.	1991.	Text Generation and Systemic-Functional Linguistics. Pinter, London.
A. Meyers, R. Reeves, C. Macleod, R. Szekely, V. Zielinska, B. Young, and R. Grishman.	2004.	The NomBank project: An interim report. In HLT-NAACL 2004 workshop: Frontiers in corpus annotation.
M. Palmer, D. Gildea, and P. Kingsbury.	2005.	The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1).
K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu.	2002.	Bleu: a method for automatic evaluation of machine translation. In ACL, Philadelphia, PA.
S. Pradhan, E. Hovy, M. Marcus, M. Palmer, L. Ramshaw, and R. Weischedel.	2007.	Ontonotes: A unified relational semantic representation. International Journal of Semantic Computing (IJSC), 1(4).
D. Quernheim and K. Knight.	2012a.	DAGGER: A toolkit for automata on directed acyclic graphs. In Proc. FSMNLP.
D. Quernheim and K. Knight.	2012b.	Towards probabilistic acceptors and transducers for feature structures. In Proc. SSST Workshop.
K. Schuler.	2005.	VerbNet: A broad-coverage, comprehensive verb lexicon. Ph.D. thesis, University of Pennsylvania.
S. Shieber, F. C. N. Pereira, L. Karttunen, and M. Kay.	1986.	Compilation of papers on unification-based grammar formalisms. Technical Report CSLI-86-48, Center for the Study of Language and Information, Stanford, California.
H. Uchida, M. Zhu, and T. Della Senta.	1996.	UNL: Universal Networking Language—an electronic language for communication, understanding and collaboration. Technical report, IAS UNU Tokyo.
H. Uchida, M. Zhu, and T. Della Senta.	1999.	A gift for a millennium. Technical report, IAS UNU Tokyo.
N. Venhuizen, V. Basile, K. Evang, and J. Bos.	2013.	Gamification for word sense labeling. In Proc. IWCS.
R. Weischedel, E. Hovy, M. Marcus, M. Palmer, R. Belvin, S. Pradhan, L. Ramshaw, and N. Xue.	2011.	OntoNotes: A large training corpus for enhanced processing. In J. Olive, C. Christianson, and J. McCary, editors, Handbook of Natural Language Processing and Machine Translation. Springer.
Y. W. Wong and R. J. Mooney.	2006.	Learning for semantic parsing with statistical machine translation. In Proc. HLT-NAACL.

Heloise — An Ariane-G5 compatible environment for developing expert MT systems online (Berment/Boitet, 2012)

Heloise — A reengineering of Ariane-G5 SLLPs for application to π-languages (Berment/Boitet, 2012) π-languages are "poorly-resourced" languages (langues peu dotées). The overlap between these two papers is significant and their references are identical.

References

Author(s)	Title etc	Year
Bachut D.,	Le projet EUROLANG : une nouvelle perspective pour les outils d’aide à la traduction, Actes de TALN 1994, journées du PRC-CHM, Université de Marseille, 7-8 avril	1994.
Bachut D., Verastegui N.,	Software tools for the environment of a computer aided translation system, COLING-1984, Stanford University, pages 330 à 333, 2-6 juillet	1984.
Berment V.,	Méthodes pour informatiser les langues et les groupes de langues peu dotés, Thèse de doctorat, Grenoble, 18 mai	2004.
Boitet C.,	Le point sur Ariane-78 début 1982 (DSE-1), vol. 1, partie 1, le logiciel, rapport de la convention ADI n° 81/423, avril	1982.
Boitet C., Guillaume P., Quézel-Ambrunaz M.,	A case study in software evolution: from Ariane-78.4 to Ariane-85, Proceedings of the Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Colgate University, Hamilton, New York, 14-16 août	1985.
Boitet C.,	Current machine translation systems developed with GETA’s methodology and software tools, conférence Translating and the Computer 8, 13-14 novembre	1986.
Boitet C.,	La TAO à Grenoble en 1990, 1980-90 : TAO du réviseur et TAO du traducteur, partie des supports de l’école d’été de Lannion organisée en 1990 par le LATL et le CNET,	1990.
Boitet C.,	A research perspective on how to democratize machine translation and translation aids aiming at high quality final output, MT Summit VII, Kent Ridge Digital Labs, Singapour, pages 125 à 133, 13-17 septembre	1999.
Boitet C.,	A roadmap for MT: four « keys » to handle more languages, for all kinds of tasks, while making it possible to improve quality (on demand), International Conference on Universal Knowledge and Language (ICUKL 2002), Goa, 25-29 novembre	2002.
Boitet C.,	Les architectures linguistiques et computationnelles en traduction automatique sont indépendantes, TALN 2008, Avignon, 9-13 juin	2008.
Del Vigna C., Berment V., Boitet C.,	La notion d’occurrence de formes de forêt (orientée et ordonnée) dans le langage ROBRA pour la traduction automatique, Approches algébrique, logique et algorithmique, Journée thématique ATALA sur la traduction automatique, ENST Paris, 1er décembre	2007.
Guillaume P.,	Ariane-G5 : Les langages spécialisés TRACOMPL et EXPANS, document GÉTA, juin	1989.
Guilbaud J.-P.,	Ariane-G5 : Environnement de développement et d’exécution de systèmes (linguiciels) de traduction automatique, Journée du GDR I3 co-organisée avec l’ATALA, Paris, novembre	1999.
Nguyen H.-T.,	Des systèmes de TA homogènes aux systèmes de TAO hétérogènes, Thèse de doctorat, Grenoble, 18 décembre	2009.
Vauquois B.,	Aspects of mechanical translation in 1979, Conference for Japan IBM Scientific program, juillet	1979.
Vauquois B.,	Computer aided translation and the Arabic language, First Arab school on science and technology, Rabat, octobre	1983.

ONLINE TRANSLATION SERVICES FOR THE LAO LANGUAGE (Berment, 2005)

https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000010.html

https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-August/000236.html

Drawing Questions from Wikidata (Geng, 2016)

#wikipedia-abstract

26 July 2020

Between the Brackets ep.60 Richard Knipel

Between the Brackets ep.60 Amir Aharoni

GraalEneyj (Lucas Werkmeister)

LinGO Grammar Matrix

The LinGO Grammar Matrix Customization System (2009)

Professor Emily M. Bender

21 July 2020

Essence of Random ;)

Wikispore

@DVrandecic (WMF): I think all of this should be on an NLG Wikispore, but I can't decide whether it should be sponsored by the project or just happen to emerge.--GrounderUK (talk) 14:38, 26 July 2020 (UTC)

Organize by language, support, other dimensions?

I get the feeling that a lot of these projects focus on English NLG, but what we're probably interested in most are approaches that generalize as far as possible across many different languages. Can the listed projects be organized (perhaps in a table somewhere) by the language(s) they aim to support? Other dimensions might be the degree of activity/support, openness of project, any specific language domains they may focus on (e.g. technical writing?) ArthurPSmith (talk) 20:09, 28 July 2020 (UTC)

@ArthurPSmith: It's a fair bet. Please see my suggestion to Denny above. To put this in context, the 2018 survey by Gatt & Krahmer references 214 different papers from 2010 to 2017 and 134 more from 2000 to 2009 (plus maybe another 100). A more up-to-date selection might be constrained to reference this survey. There are only just over 100 of those on Google Scholar (or "about 264", in Google gigo). The book by Reiter and Dale (2000) has 2347 citations.--GrounderUK (talk) 23:27, 28 July 2020 (UTC)