ISBN: 9781627058681 paperback
ISBN: 9781627058698 ebook
DOI 10.2200/S00700ED1V01Y201602HLT032
A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON HUMAN LANGUAGE TECHNOLOGIES
Lecture #32
Series Editor: Graeme Hirst, University of Toronto
Series ISSN: Print 1947-4040, Electronic 1947-4059
Automatic Text Simplification
Horacio Saggion
Department of Information and Communication Technologies Universitat Pompeu Fabra
ABSTRACT
Thanks to the availability of texts on the Web in recent years, increased knowledge and information have been made available to broader audiences. However, the way in which a text is written (its vocabulary, its syntax) can make it difficult to read and understand for many people, especially those with poor literacy, cognitive or linguistic impairment, or limited knowledge of the language of the text. Texts containing uncommon words or long and complicated sentences can be difficult for people to read and understand, as well as difficult for machines to analyze. Automatic text simplification is the process of transforming a text into another text which, ideally conveying the same message, will be easier to read and understand for a broader audience. The process usually involves replacing difficult or unknown phrases with simpler equivalents and transforming long, syntactically complex sentences into shorter and less complex ones. Automatic text simplification, a research topic that started 20 years ago, has now taken on a central role in natural language processing research, not only because of the interesting challenges it poses but also because of its social implications. This book presents past and current research in text simplification, exploring key issues including automatic readability assessment, lexical simplification, and syntactic simplification. It also provides a detailed account of the machine learning techniques currently used in simplification, describes full systems designed for specific languages and target audiences, and presents available resources for research and development together with text simplification evaluation techniques.
KEYWORDS
syntactic simplification, lexical simplification, readability measures, text simplification systems, text simplification evaluation, text simplification resources
To Sandra, Jonas, Noah, and Isabella
Contents
1.3 The Need for Text Simplification
1.4 Easy-to-read Material on the Web
2 Readability and Text Simplification
2.3 Advanced Natural Language Processing for Readability Assessment
2.3.2 Readability as Classification
2.3.3 Discourse, Semantics, and Cohesion in Assessing Readability
2.5 Are Classic Readability Formulas Correlated?
2.6 Sentence-level Readability Assessment
3.1 A First Approach
3.2 Lexical Simplification in LexSiS
3.3 Assessing Word Difficulty
3.4 Using Comparable Corpora
3.4.1 Using Simple English Wikipedia Edit History
3.4.2 Using Wikipedia and Simple Wikipedia
3.5 Language Modeling for Lexical Simplification
3.6 Lexical Simplification Challenge
3.7 Simplifying Numerical Expressions in Text
3.8 Conclusion
3.9 Further Reading
4.1 First Steps in Syntactic Simplification
4.2 Syntactic Simplification and Cohesion
4.3 Rule-based Syntactic Simplification using Syntactic Dependencies
4.4 Pattern Matching over Dependencies with JAPE
4.5 Simplifying Complex Sentences by Extracting Key Events
4.6 Conclusion
4.7 Further Reading
5.1 Simplification as Translation
5.1.1 Learning Simple English
5.1.2 Facing Strong Simplifications