Michael J. Paul

Social Monitoring for Public Health



about this new website called “Twitter” and what interesting research we could do with millions of tweets. One of us wondered if anyone talked about health on the new social media platform. At the end of the conference, Mark sent an email to Michael:

      I did a quick search out of curiosity. I looked at half of 1% of our twitter collection…to pull out all tweets that have the word “sick”…You can see that there are lots of tweets where the author [writes] sick, but of course it’s not such a simple problem.

      That email led to Michael’s class project, which led to our first paper together at the International Conference on Weblogs and Social Media (ICWSM) [Paul and Dredze, 2011]. That paper contained many of the ideas we’d follow in subsequent years: structured topic models, topic analyses for social media, influenza surveillance, drug and tobacco use, health behaviors, mental health, and geo-locating tweets. Since then, it’s been a whirlwind of research, and we’ve each developed a deeper interest in public health.

      In some sense, the field of social monitoring for public health has followed a similar path. Initial work on using Google and Twitter for influenza surveillance progressed to surveillance for other infectious diseases, and quickly branched out to a wide range of public health topics. Before we knew it, an entire field of research had grown around us.

      As we read newly published papers in this area, we were amazed by the creativity and breadth of what was being achieved. We collected our observations and began to notice common themes and structures across research areas, structures that could organize the field and allow it to move forward. It made sense to write down what we observed, and before we realized it, we had a book.

      This book is a reflection of the past decade of research, a summary of how we reached this point and what has been done so far in this fast-paced field. We can’t predict the future, but by understanding what we have achieved so far, we hope to provide researchers with a foundation on which to build the future of this field.

      Michael J. Paul and Mark Dredze

      August 2017

       Acknowledgments

      This book would not have been possible without the help and inspiration from too many people to name: our colleagues at our universities, including our students; people we’ve met at conferences and workshops; people we’ve talked to in government, NGOs, and companies working in this field; and the hundreds of people we’ve learned from, through papers, talks, emails, and conversations over dinner. This has been an amazing education for us, and we continue to learn all the time.

      As for getting this book written, we thank Emre Kiciman and the anonymous reviewers for their incredibly thorough and insightful feedback, and Jimmy Lin for encouraging us to do this.

      Michael J. Paul and Mark Dredze

      August 2017

      CHAPTER 1

       A New Source of Big Data

      We can only see a short distance ahead, but we can see plenty there that needs to be done.

      Alan Turing

      Protecting Health, Saving Lives—Millions at a Time

      Mission of the Johns Hopkins Bloomberg School of Public Health

      You’ve likely seen a public health awareness campaign. Perhaps you’ve seen an advertisement from New York Health (the Department of Health and Mental Hygiene) on the subway warning about the dangers of synthetic drugs. Maybe you’ve seen a billboard in Baltimore warning that children with influenza should stay home from school. You may have seen a social media advertisement from Los Angeles’s “Break Up With Tobacco” campaign.

      These are just some of the advertisements you may come across as part of public health awareness campaigns. These programs promote breast cancer screenings, testing for HIV, and counseling for depression. Public health awareness campaigns are organized efforts to promote awareness of a health issue through the use of advertising, news and social media. There are hundreds of public health awareness campaigns organized every year, from well-known ones like “World Immunization Week,” “World AIDS Day” or “The Great American Smokeout,” to lesser-known ones like “Global Handwashing Day” or the “National Bone Health Campaign.” All share the same goal: increase awareness in the hopes of combating a public health problem. A simple question: do these campaigns work?

      For the moment, let’s consider another topic: vaccines. One of the great public health victories of the last century has been the development and dissemination of a wide range of vaccines. Thanks to vaccines, we’ve saved 5 million lives a year by eliminating smallpox. We’ve essentially eliminated many other diseases in the developed world, including diphtheria, whooping cough, measles and polio. In the United States, with the introduction of the first measles vaccine in 1963, the number of measles cases went from roughly half a million a year to only a handful by the end of the 20th century [Orenstein et al., 2004].

      Yet this great public health victory is slowly being eroded, with an uptick in cases over the past 5 years, including 667 measles infections in 2014.1 The return of measles can be attributed to the growing vaccine refusal movement, which advocates against childhood vaccination, including the MMR vaccine (measles, mumps, and rubella). While many of us have heard this movement’s arguments against vaccines, why are they so effective with a small but significant fraction of parents? What reasons for skipping childhood vaccines are most convincing to different types of parents? How can physicians best address the concerns of parents?

      One final topic. Suicide is one of the leading causes of death in the United States. The figure is staggering: over 40,000 Americans die by suicide each year.2 While our understanding of mental health disorders and factors that influence suicide has advanced tremendously, we remain especially poor at predicting who will follow through on a suicide attempt. We have been unable to identify unique predictors of suicide [Murphy, 1984]. Instead, we can identify a large at-risk population, a small percentage of which will actually attempt suicide. Treating this group is generally effective for suicide prevention, but too many cases are missed since we cannot further focus our efforts. With such a large number of deaths each year, it is natural to ask: are there other unknown predictors of suicide we are missing?

      These are just a few of the numerous questions for which we need better answers. Given the importance of these public health topics, issues that affect millions of lives, why don’t we have answers? Why can’t we do the research necessary to provide actionable information?

      Like all scientific pursuits, our ability to answer health questions depends on our access to relevant data. Without evidence from data, we can’t provide meaningful answers. What about “big data” research, the popular buzzword that encompasses all manner of new research efforts from physics to psychology, from linguistics to literature? Where might we find big data for public health?

      A patient visits a doctor, and the interaction is documented in a clinical record. This interaction happens over a billion times in the United States each year.3 Surely this is enough to qualify as big data! These clinical records taken together have the potential to answer many important questions in medicine. Among the many goals of the Affordable Care Act passed by the United States Congress in 2010 was to digitize these records by incentivizing physicians to switch to electronic health records (EHRs). While the primary goal of the initiative was to reduce costs, an additional goal was to create a vast digital resource for health research [Adler-Milstein et al., 2014]. In large part, this has worked—the number of physician offices using EHRs has grown from around 50% in 2010 [Hsiao et al., 2012] to nearly 87% in 2015.4 Millions of digital records for patients throughout the United States have created opportunities for secondary use of electronic medical records [Safran et al., 2007] that can help answer questions about adverse drug events or measure the quality of health care delivery.

      Yet even if we had full access to an EHR with a billion clinical visits each year, we may not be able to answer the questions for the three topics posed above. Increased awareness