Diana Inkpen

Natural Language Processing for Social Media



TweeboParser

       3.1 An example of annotation with the true location [Inkpen et al., 2015]

       3.2 Classification accuracies for user location detection on the Eisenstein dataset [Liu and Inkpen, 2015]

       3.3 Mean error distance of predictions on the Eisenstein dataset [Liu and Inkpen, 2015]

       3.4 Results for user location prediction on the Roller dataset [Liu and Inkpen, 2015]

       3.5 Performance of the classifiers trained on different features for cities [Inkpen et al., 2015]

       3.6 Classification results for emotion classes and non-emotion by Ghazi et al. [2014]

       3.7 Accuracy of the mood classification by Keshtkar and Inkpen [2012]

       3.8 Statistics on hashtag use in the aligned bilingual corpus [Gotti et al., 2014]

       3.9 Distribution of hashtags in epilogues and prologues [Gotti et al., 2014]

       3.10 Percentage of unknown hashtags to English and French vocabularies of the Hansard corpus [Gotti et al., 2014]

       3.11 Percentage of unknown hashtags to “standard” English and French vocabularies, after automatic segmentation of multiword hashtags into simple words [Gotti et al., 2014]

       3.12 Translation performance obtained by Gotti et al. [2014]

       Preface

      This book presents the state of the art in research and empirical studies in the field of Natural Language Processing (NLP) for the semantic analysis of social media data. Because the field is continuously growing, this third edition adds information about recently proposed methods and their results for the tasks and applications that we covered in the first and second editions.

      Over the past few years, online social networking sites have revolutionized the way we communicate with individuals, groups and communities, and altered everyday practices. The unprecedented volume and variety of user-generated content and the user interaction network constitute new opportunities for understanding social behavior and building socially intelligent systems.

      Much research work on social networks and the mining of the social web is based on graph theory. That is apt because a social structure is made up of a set of social actors and a set of dyadic ties between these actors. We believe that the graph mining methods for structure, information diffusion or influence spread in social networks need to be combined with the content analysis of social media. This provides the opportunity for new applications that use the information publicly available as a result of social interactions. Classic NLP methods, once adapted, can partially solve the problem of social media content analysis by focusing on the posted messages. When we receive a text of fewer than 10 characters, including an emoticon and a heart, we understand it and even respond to it! It is impossible to use NLP methods to process this type of document, yet social media data carries a logical message on the basis of which two people can communicate. The same logic holds worldwide, as people from all over the world share and communicate with each other. This is a new and challenging language for NLP.

      We believe that we need new theories and algorithms for the semantic analysis of social media data, as well as a new way of approaching big data processing. By semantic analysis, in this book, we mean the linguistic processing of social media messages enhanced with semantics, possibly combined with the structure of the social networks. We actually use the term in a more general sense to refer to applications that do intelligent processing of social media texts and meta-data. Some applications could access very large amounts of data; therefore, the algorithms need to be adapted to be able to process big data in an online fashion, without necessarily storing all the data.

      This motivated us to give three tutorials: on Applications of Social Media Text Analysis at EMNLP 2015¹, on Natural Language Processing for Social Media at the 29th Canadian Conference on Artificial Intelligence (AI 2016)², and on How Natural Language Processing Helps Uncover Social Media Insights at the 33rd International Florida Artificial Intelligence Research Society Conference (FLAIRS 2020). Also on this topic, we organized several workshops, namely Semantic Analysis in Social Networks (SASM 2012)³ and Language Analysis in Social Media (LASM 2013⁴ and LASM 2014⁵), in conjunction with conferences organized by the Association for Computational Linguistics⁶ (ACL, EACL, and NAACL-HLT).

      Our goal was to reflect a wide range of research and results in the analysis of language with implications for fields such as NLP, computational linguistics, sociolinguistics and psycholinguistics. Our workshops invited original research on all topics related to the analysis of language in social media, including the following topics:

      • What do people talk about on social media?

      • How do they express themselves?

      • Why do they post on social media?

      • How do language and social network properties interact?

      • Natural language processing techniques for social media analysis.

      • Semantic Web / ontologies / domain models to aid in understanding social data.

      • Characterizing participants via linguistic analysis.

      • Language, social media and human behavior.

      There were several other workshops on similar topics, for example, the Making Sense of Microposts (#Microposts)⁷ workshop series, held in conjunction with the World Wide Web Conference from 2012 to 2016. These workshops focused in particular on short informal texts that are published without much effort (such as tweets, Facebook shares, Instagram-like shares, Google+ messages). There has been another series of Workshops on Natural Language Processing for Social Media (SocialNLP) since 2013. For example, SocialNLP 2017 was held in conjunction with EACL 2017⁸ and IEEE BigData 2017⁹, and SocialNLP 2020 had two editions, one in conjunction with TheWebConf 2020 and one in conjunction with ACL 2020¹⁰.

      The intended audience of this book is researchers who are interested in developing tools and applications for the automatic analysis of social media texts. We assume that readers have basic knowledge of natural language processing and machine learning. We hope that this book will help readers better understand computational linguistics and social media analysis, in particular text mining techniques and NLP applications (such as summarization, location detection, sentiment and emotion analysis, topic detection and machine translation) designed specifically for social media texts.

      Besides updating each section in this third edition, we added a new section on keyphrase generation from social media messages and one on neural machine translation in Chapter 3 and three new applications in Chapter 4: rumor detection, recommender systems for social media, and preventing sexual harassment. We discuss the new methods and their results. The number of research projects and publications that use social media data is constantly increasing. Finally, we added more than 50 new references to the approximately 400 references from the second edition.

      Anna Atefeh Farzindar and Diana Inkpen

      March 2020

      ¹ https://www.cs.cmu.edu/~ark/EMNLP-2015/proceedings/EMNLP-Tutorials/pdf/EMNLP-Tutorials06.pdf