Shashi Narayan

Deep Learning Approaches to Text Production


1.3), a semantic representation which includes entity identification and typing, PropBank [Palmer et al., 2005] semantic roles, entity grounding via wikification, as well as treatments of modality and negation. The task and the training data are restricted to English.

      Finally, following previous work on generating a system response from a meaning representation consisting of a speech act (e.g., instruct, query, recommend) and a set of attribute-value pairs [Mairesse and Young, 2014, Wen et al., 2015], the E2E challenge [Novikova et al., 2017b] targets the generation of restaurant descriptions from sets of attribute-value pairs.1
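As a concrete illustration, such an input can be represented as a speech act paired with attribute-value pairs and verbalised with a naive template. In the sketch below, the attribute names and the template are illustrative, not taken from the E2E data itself:

```python
# A meaning representation in the style of the E2E challenge:
# a speech act (here, "inform") plus a set of attribute-value pairs.
mr = {
    "act": "inform",
    "attributes": {
        "name": "The Eagle",
        "eatType": "restaurant",
        "food": "Italian",
        "area": "city centre",
    },
}

def verbalise(mr):
    """Naive template-based realisation of an 'inform' act."""
    a = mr["attributes"]
    return f"{a['name']} is an {a['food']} {a['eatType']} in the {a['area']}."

print(verbalise(mr))  # The Eagle is an Italian restaurant in the city centre.
```

A neural generator replaces the hand-written template with a learned mapping, but the input it consumes has exactly this flat attribute-value shape.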

      Data is another common source of input for text production, with two prominent data types, namely table and knowledge-base data.2 For instance, Angeli et al. [2010] show how generation can be applied to sportscasting and weather forecast data [Reiter et al., 2005]. Konstas and Lapata [2012a,b] generate text from flight booking data, Lebret et al. [2016] from Wikipedia, and Wiseman et al. [2017] from basketball games box- and line-score tables. In those cases, the inputs to generation are tables containing records with an arbitrary number of fields (cf. Figure 1.5).
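The shape of such table inputs can be sketched as a list of typed records, realised here with one hand-written template per record type. The record types, field names, and values below are illustrative, not drawn from any of the cited datasets:

```python
# A data-to-text input in the spirit of Figure 1.5: a set of
# records, each with a type and a number of fields.
records = [
    {"type": "temperature", "time": "06-21", "min": 9, "max": 21},
    {"type": "windSpeed", "time": "06-21", "min": 3, "max": 8},
]

def realise(rec):
    """Realise a single record with a per-type template."""
    if rec["type"] == "temperature":
        return f"Temperatures between {rec['min']} and {rec['max']} degrees."
    if rec["type"] == "windSpeed":
        return f"Winds of {rec['min']} to {rec['max']} mph."
    return ""

text = " ".join(realise(r) for r in records)
print(text)  # Temperatures between 9 and 21 degrees. Winds of 3 to 8 mph.
```

In realistic settings the generator must additionally decide which records to mention at all (content selection), a step this sketch omits.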

      There has also been much work on generating text from knowledge bases. Bontcheva and Wilks [2004] generate patient records from clinical data encoded in the RDF format (Resource Description Framework). Power [2009] generates text from whole knowledge bases encoded in OWL or description logic. And, more recently, Perez-Beltrachini and Lapata [2018] have investigated the generation of sentences and short texts from RDF-encoded DBPedia data.

      Whereas in generation from AMRs and dependency-based meaning representations, there is often an almost exact semantic match between input and output, this is not the case in data-to-text generation or generation from dialogue moves. As illustrated by the examples shown in Figure 1.5, there is no direct match between input data and output text. Instead, words must be chosen to lexicalise the input KB symbols, function words must be added, ellipsis and coordination must be used to avoid repetitions, and sometimes, content selection must be carried out to ensure that the output text adequately resembles human-produced text.
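A minimal sketch of two of these operations, lexicalisation and aggregation, assuming a hand-written lexicon that maps relation symbols to templates and using coordination to avoid repeating the shared subject. All entity names, relations, and lexicon entries here are hypothetical:

```python
# Two KB triples sharing a subject. Generation must lexicalise
# the KB symbols and can coordinate the resulting phrases.
triples = [
    ("John_Doe", "birthPlace", "London"),
    ("John_Doe", "occupation", "engineer"),
]

LEXICON = {  # maps relation symbols to verbalisation templates
    "birthPlace": "was born in {}",
    "occupation": "works as an {}",
}

def verbalise(triples):
    """Lexicalise each triple and coordinate around the shared subject."""
    subject = triples[0][0].replace("_", " ")
    phrases = [LEXICON[rel].format(obj) for _, rel, obj in triples]
    return f"{subject} {' and '.join(phrases)}."

print(verbalise(triples))  # John Doe was born in London and works as an engineer.
```

Even this toy case shows why data-to-text generation is not a one-to-one mapping: the function words, the article choice, and the decision to coordinate rather than emit two sentences are all absent from the input.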

      The third main strand of research in NLG is text-to-text generation. While for meaning representations and data-to-text generation the most usual communicative goal is to verbalise the input, text-to-text generation can be categorised into three main classes depending on whether the communicative goal is to summarise, simplify, or paraphrase.

      Text summarisation has various possible inputs and outputs. The input may consist of multiple documents in the same language [Dang, 2006, Hachey, 2009, Harman and Over, 2004] or in multiple languages [Filatova, 2009, Giannakopoulos et al., 2017, Hovy and Lin, 1997, Kabadjov et al., 2010, Lloret and Palomar, 2011]; a single document [Durrett et al., 2016]; or a single (complex) sentence [Chopra et al., 2016, Graff et al., 2003, Napoles et al., 2012]. The latter task is often also referred to as “Sentence Compression” [Cohn and Lapata, 2008, Filippova and Strube, 2008, Filippova et al., 2015, Knight and Marcu, 2000, Pitler, 2010, Toutanova et al., 2016], while a related task, “Sentence Fusion”, consists of combining two or more sentences with overlapping information content, preserving common information and deleting irrelevant details [Filippova, 2010, McKeown et al., 2010, Thadani and McKeown, 2013]. As for the output produced, research on summarisation has focused on generating either a short abstract [Durrett et al., 2016, Grusky et al., 2018, Sandhaus, 2008], a title [Chopra et al., 2016, Rush et al., 2015], or a set of headlines [Hermann et al., 2015].

      Figure 1.5: Data-to-Text example input and output (source: Konstas and Lapata [2012a]).

      Text paraphrasing aims to rewrite a text while preserving its meaning [Bannard and Callison-Burch, 2005, Barzilay and McKeown, 2001, Dras, 1999, Mallinson et al., 2017, Wubben et al., 2010], while text simplification targets the production of a text that is easier to understand [Narayan and Gardent, 2014, 2016, Siddharthan et al., 2004, Woodsend and Lapata, 2011, Wubben et al., 2012, Xu et al., 2015b, Zhang and Lapata, 2017, Zhu et al., 2010].

      Both paraphrasing and text simplification have been shown to facilitate and/or improve the performance of natural language processing (NLP) tools. The ability to automatically generate paraphrases (alternative phrasings of the same content) has been demonstrated to be useful in several areas of NLP such as question answering, where they can be used to improve query expansion [Riezler et al., 2007]; semantic parsing, where they help bridge the gap between a grammar-generated sentence and its natural counterparts [Berant and Liang, 2014]; machine translation [Kauchak and Barzilay, 2006]; sentence compression [Napoles et al., 2011]; and sentence representation [Wieting et al., 2015], where they help provide additional training or evaluation data. From a linguistic standpoint, the automatic generation of paraphrases is an important task in its own right as it demonstrates the capacity of NLP techniques to simulate human behaviour.

      Because shorter sentences are generally better processed by NLP systems, text simplification can be used as a pre-processing step which facilitates and improves the performance of parsers [Chandrasekar and Srinivas, 1997, Jelínek, 2014, McDonald and Nivre, 2011, Tomita, 1985], semantic role labelers [Vickrey and Koller, 2008], and statistical machine translation (SMT) systems [Chandrasekar et al., 1996]. Simplification also has a wide range of potential societal applications as it could be of use for people with reading disabilities [Inui et al., 2003] such as aphasia patients [Carroll et al., 1999], low-literacy readers [Watanabe et al., 2009], language learners [Siddharthan, 2002], and children [De Belder and Moens, 2010].

      This book falls into three main parts.

      Part I sets the stage and introduces the basic notions, motivations, and evolutions underlying text production. It consists of three chapters.

      Chapter 1 (this chapter) briefly situates text production with respect to text analysis. It describes the range of input covered by text production, i.e., meaning representations, data, and text. And it introduces the main applications of text-production models which will be the focus of this book, namely, automatic summarisation, paraphrasing, text simplification, and data verbalisation.

      Chapter 2 summarises pre-neural approaches to text production, focusing first on text production from data and meaning representations, and second, on text-to-text generation. The chapter describes the main assumptions made for these tasks by pre-neural approaches, setting the stage for the following chapter.

      Chapter 3 shows how deep learning introduced a profound change of paradigm for text production, leading to models which rely on architectural assumptions very different from those of pre-neural approaches and to the use of the encoder-decoder model as a unifying framework for all text-production tasks. It then goes on to present a basic encoder-decoder architecture, the sequence-to-sequence model, and shows how this architecture provides a natural framework both for representing the input and for generating from these input representations.

      Part II summarises recent progress on neural approaches to text production, showing how the basic encoder-decoder framework described in Chapter 3 can be improved to better model the characteristics of text production.

      While neural language models demonstrate a strong ability to generate fluent, natural-sounding text given a sufficient amount of training data, a closer look at the output of text-production systems reveals several issues regarding text quality which have repeatedly been observed across generation tasks. The output text may contain information not present in the input (weak semantic adequacy) or, conversely, fail to convey all information present in the input (lack