text production from AMR graphs, Marc’Aurelio Ranzato for exposure bias and curriculum learning algorithm diagrams, Qingyu Zhou for selective encoding figures, Sam Wiseman for a corrected RotoWire example, Sebastian Gehrmann for the bottom-up summarization diagram, Tsung-Hsien Wen for an alternative coverage mechanism plot, Xingxing Zhang for reinforcement learning for sentence simplification, and Yannis Konstas for AMR-to-text and data-to-text examples. Huge thanks to Emiel Krahmer, Graeme Hirst, and our anonymous reviewer for reviewing our book and providing us with detailed and constructive feedback. We have attempted to address all the issues they raised. All the remaining typos and inadequacies are entirely our responsibility. Finally, we would like to thank Morgan & Claypool Publishers for working with us in producing this manuscript. A very special thanks goes to Michael Morgan and Christine Kiilerich for always encouraging us and keeping us on track.
Shashi Narayan and Claire Gardent
March 2020
CHAPTER 1
Introduction
In this chapter, we outline the differences between text production and text analysis, we introduce the main text-production tasks this book is concerned with (i.e., text production from data, from text, and from meaning representations), and we summarise the content of each chapter. We also indicate what is not covered and introduce some notational conventions.
1.1 WHAT IS TEXT PRODUCTION?
While natural language understanding [NLU, Bates, 1995] aims to analyse text, text production, or natural language generation [NLG, Gatt and Krahmer, 2018, Reiter and Dale, 2000], focuses on generating texts. More specifically, NLG differs from NLU in two main ways (cf. Figure 1.1). First, unlike text analysis, which always takes text as input, text production has many possible input types, namely, text [e.g., Nenkova and McKeown, 2011], data [e.g., Wiseman et al., 2017], or meaning representations [e.g., Konstas et al., 2017]. Second, text production has various potential goals. For instance, the goal may be to summarise, verbalise, or simplify the input.
Correspondingly, text production has many applications depending on what the input is (data, text, or meaning representations) and what the goal is (simplifying, verbalising, paraphrasing, etc.). When the input is text (text-to-text or T2T generation), text production can be used to summarise the input document [e.g., Nenkova and McKeown, 2011], simplify a sentence [e.g., Shardlow, 2014, Siddharthan, 2014] or respond to a dialogue turn [e.g., Mott et al., 2004]. When the input is data, NLG can further be used to verbalise the content of a knowledge base [e.g., Power, 2009] or a database [e.g., Angeli et al., 2010], generate reports from numerical [e.g., Reiter et al., 2005] or KB data [e.g., Bontcheva and Wilks, 2004], or generate captions from images [e.g., Bernardi et al., 2016]. Finally, NLG has also been used to regenerate text from the meaning representations designed by linguists to represent the meaning of natural language [e.g., Song et al., 2017].
In what follows, we examine generation from meaning representations, data, and text in more detail.
1.1.1 GENERATING TEXT FROM MEANING REPRESENTATIONS
There are two main motivations for generating text from meaning representations.
First, an algorithm that converts meaning representations into well-formed text is a necessary component of traditional pipeline NLG systems [Gatt and Krahmer, 2018, Reiter and Dale, 2000]. As we shall see in Chapter 2, such systems include several modules, one of them (known as the surface realisation module) being responsible for generating text from some abstract linguistic representation derived by the system. To improve reusability, surface realisation challenges have recently been organised in an effort to identify input meaning representations that could serve as a common standard for NLG systems, thereby fueling research on that topic.
Figure 1.1: Input contents and communicative goals for text production.
Second, meaning representations can be viewed as an interface between NLU and NLG. Consider translation, for instance. Instead of learning machine translation models, which directly translate surface strings into surface strings, an interesting scientific challenge would be to learn a model that does something more akin to what humans seem to do, i.e., first, understand the source text, and second, generate the target text from the conceptual representation resulting from that understanding (indeed, a recent paper by Konstas et al. [2017] mentions this as future work). A similar two-step process (first, deriving a meaning representation from the input text, and second, generating a text from that meaning representation) also seems natural for such tasks as text simplification or summarisation.
Although there are still relatively few approaches adopting a two-step interpret-and-generate process or reusing existing surface realisation algorithms, there is already a strong trend of research in text production that focuses on generating text from meaning representations produced by a semantic parser [May and Priyadarshi, 2017, Mille et al., 2018] or a dialogue manager [Novikova et al., 2017b]. In the case of semantic parsing, the meaning representations capture the semantics of the input text and can be exploited, as mentioned above, to model a two-step process in applications such as simplification [Narayan and Gardent, 2014], summarisation [Liu et al., 2015], or translation [Song et al., 2019b]. In the case of dialogue, the input meaning representation (a dialogue move) is output by the dialogue manager in response to the user input and provides the input to the dialogue generator, the module in charge of generating the system response.
Figure 1.2: Input shallow dependency tree from the generation challenge surface realisation task for the sentence “The most troublesome report may be the August merchandise trade deficit due out tomorrow.”
While a wide range of meaning representations and syntactic structures have been proposed for natural language (e.g., first-order logic, description logic, hybrid logic, derivation rather than derived syntactic trees), three main types of meaning representations have recently gained traction as input to text generation: meaning representations derived from syntactic dependency trees (cf. Figure 1.2), meaning representations derived through semantic parsing (cf. Figure 1.3), and meaning representations used as input to the generation of a dialogue engine's response (cf. Figure 1.4). All three input types have given rise to shared tasks and international challenges.
The Surface Realisation shared task [Belz et al., 2012, Mille et al., 2018] focuses on generating sentences from linguistic representations derived from syntactic dependency trees and includes a deep and a shallow track. For the shallow track, the input is an unordered, lemmatised syntactic dependency tree and the main focus is on linearisation (deriving the correct word order from the input tree) and morphological inflection (deriving the inflection from a lemma and a set of morphosyntactic features). For the deep track, on the other hand, the input is a dependency tree where dependencies are semantic rather than syntactic and function words have been removed. While the 2011 shared task only provided data for English, the 2018 version is multilingual and includes training data for Arabic, Czech, Dutch, English, Finnish, French, Italian, Portuguese, Russian, and Spanish (shallow track), and English, French, and Spanish (deep track).
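To make the two shallow-track subtasks concrete, here is a minimal sketch in Python of an unordered, lemmatised dependency-tree input and a toy realiser. The node encoding, feature names, and hard-coded word order are illustrative assumptions, not the official shared-task format; real systems learn both the linearisation and the inflection from data.

```python
# Hypothetical, simplified encoding of a shallow-track input for the
# sentence "the report is due": each node carries a lemma, its head,
# a dependency label, and morphosyntactic features, but the list is
# deliberately unordered and the lemmas are uninflected.
nodes = [
    {"id": 3, "lemma": "be", "head": 0, "deprel": "root",
     "feats": {"Tense": "Pres", "Person": "3", "Number": "Sing"}},
    {"id": 1, "lemma": "the", "head": 2, "deprel": "det", "feats": {}},
    {"id": 4, "lemma": "due", "head": 3, "deprel": "pred", "feats": {}},
    {"id": 2, "lemma": "report", "head": 3, "deprel": "subj",
     "feats": {"Number": "Sing"}},
]

def inflect(lemma, feats):
    """Toy morphological realisation covering just this example."""
    if (lemma == "be" and feats.get("Tense") == "Pres"
            and feats.get("Person") == "3"):
        return "is"
    return lemma

# A surface realiser must solve both subtasks: choose a linear order
# for the nodes (linearisation) and inflect each lemma. Here the
# order is hard-coded purely for illustration.
order = [1, 2, 3, 4]
by_id = {n["id"]: n for n in nodes}
sentence = " ".join(inflect(by_id[i]["lemma"], by_id[i]["feats"])
                    for i in order)
print(sentence)  # "the report is due"
```

The point of the sketch is that neither the word order nor the inflected forms are present in the input: both must be predicted by the realiser.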
Figure 1.3: Example input from the SemEval AMR-to-Text Generation Task.
Figure 1.4: E2E dialogue move and text.
In the SemEval-2017 Task 9 Generation Subtask [May and Priyadarshi, 2017], the goal is to generate text from an Abstract Meaning Representation (AMR, cf. Figure