being reported by the media in an effort to prove that this problem is newsworthy? Or does the figure come from officials, bureaucrats who routinely keep track of some social phenomenon, and who may not have much stake in what the numbers show?
2. Why was this statistic created? The identities of the people who create statistics are often clues to their motives. In general, activists seek to promote their causes, to draw attention to social problems. Therefore, we can suspect that they will favor large numbers, be more likely to produce them, and be less likely to view them critically. When reformers cry out that there are many prostitutes or homeless individuals, we need to recognize that their cause might seem less compelling if their numbers were smaller. On the other hand, other people may favor lower numbers. Remember that New York police officials produced figures showing that there were very few prostitutes in the city as evidence that they were doing a good job. We need to be aware that the people who produce statistics often care what the numbers show; they use numbers as tools of persuasion.
3. How was this statistic created? We should not discount a statistic simply because its creators have a point of view, because they view a social problem as more or less serious. Rather, we need to ask how they arrived at the statistic. All statistics are imperfect, but some are far less perfect than others. There is a big difference between a number produced by a wild guess and one generated through carefully designed research. This is the key question. Once we understand that all social statistics are created by someone, and that everyone who creates social statistics wants to prove something (even if that is only that they are careful, reliable, and unbiased), it becomes clear that the methods of creating statistics are key. The remainder of this book focuses on this third question.
PLAN OF THE BOOK
The following chapters discuss some of the most common and important problems with the creation and interpretation of social statistics. Chapter 2 examines four basic sources of bad statistics: bad guesses, deceptive definitions, confusing questions, and biased samples. Chapter 3 looks at mutant statistics, at ways even good statistics can be mangled, misused, and misunderstood. Chapter 4 discusses the logic of statistical comparison and explores some of the most common errors in comparing two or more time periods, places, groups, or social problems. Chapter 5 considers debates over statistics. Finally, Chapter 6 examines three general approaches to thinking about statistics.
_________
*I am not implying that there is anything wrong with calling attention to social problems. In fact, this book can be seen as my effort to construct “bad statistics” as a problem that ought to concern people.
2
SOFT FACTS
Sources of Bad Statistics
A child advocate tells Congress that 3,000 children per year are lured with Internet messages and then kidnapped. Tobacco opponents attribute over 400,000 deaths per year to smoking. Antihunger activists say that 31 million Americans regularly “face hunger.” Although the press tends to present such statistics as facts, someone, somehow, had to produce these numbers. But how? Is there some law enforcement agency that keeps track of which kidnappings begin with online seductions? Are there medical authorities who decide which lung cancer deaths are caused by smoking, and which have other causes, such as breathing polluted air? Who counts Americans facing hunger—and what does “facing hunger” mean, anyway?
Chapter 1 argued that people produce statistics. Of course they do. All human knowledge—including statistics—is created through people’s actions; everything we know is shaped by our language, culture, and society. Sociologists call this the social construction of knowledge. Saying that knowledge is socially constructed does not mean that all we know is somehow fanciful, arbitrary, flawed, or wrong. For example, scientific knowledge can be remarkably accurate, so accurate that we may forget the people and social processes that produced it. I’m writing this chapter on a computer that represents the accumulation of centuries of scientific knowledge. Designing and building this computer required that people come to understand principles of physics, chemistry, electrical engineering, computer science—who knows what else? The development of that knowledge was a social process, yet the fact that the computer works reliably reflects the great confidence we have in the knowledge that went into building it.
This is one way to think about facts. Knowledge is factual when evidence supports it and we have great confidence in its accuracy. What we call "hard fact" is information supported by strong, convincing evidence: evidence that, so far as we know, we cannot deny, however we examine or test it. Facts can always be questioned, but they hold up under questioning. How did people come by this information? How did they interpret it? Are other interpretations possible? The more satisfactory the answers to such questions, the "harder" the facts.
Our knowledge about society tends to be “softer” than our knowledge of the physical world. Physicists have far more confidence in their measurements of the atomic weight of mercury than sociologists have in their descriptions of public attitudes toward abortion. This is because there are well-established, generally agreed-upon procedures for measuring atomic weights and because such measurements consistently produce the same results. In contrast, there is less agreement among social scientists about how best to measure—or even how to define—public opinion.
Although we sometimes treat social statistics as straightforward, hard facts, we ought to ask how those numbers are created. Remember that people promoting social problems want to persuade others, and they use statistics to make their claims more persuasive. Often, the ways people produce statistics are flawed: their numbers may be little more than guesses, or they may be the product of poor definitions, flawed measurements, or weak sampling. These are the four basic ways to create bad social statistics.
GUESSING
Activists hoping to draw attention to a new social problem often find that there are no good statistics available.* When a troublesome social condition has been ignored, there usually are no accurate records about the condition to serve as the basis for good statistics. Therefore, when reporters ask activists for facts and figures ("Exactly how big is this problem?"), the activists cannot produce official, authoritative numbers.
What activists do have is their own sense that the problem is widespread and getting worse. After all, they believe it is an important problem, and they spend much of their time learning more about it and talking to other people who share their concerns. A hothouse atmosphere develops in which everyone agrees this is a big, important problem. People tell one another stories about the problem and, if no one has been keeping careful records, activists soon realize that many cases of the problem—maybe the vast majority—go unreported and leave no records.
Criminologists use the expression “the dark figure” to refer to the proportion of crimes that don’t appear in crime statistics.1 In theory, citizens report crimes to the police, the police keep records of those reports, and those records become the basis for calculating crime rates. But some crimes are not reported (because people are too afraid or too busy to call the police, or because they doubt the police will be able to do anything useful), and the police may not keep records of all the reports they receive, so the crime rate inevitably underestimates the actual amount of crime. The difference between the number of officially recorded crimes and the true number of crimes is the dark figure.
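To make the arithmetic concrete, here is a minimal Python sketch under a deliberately simplified, hypothetical model: some fixed percentage of incidents get reported, and some fixed percentage of those reports get recorded. The function name dark_figure and every number used with it are illustrative assumptions, not real crime data or anyone's published method.

```python
def dark_figure(true_count: int, report_pct: int, record_pct: int) -> tuple[int, int]:
    """Return (recorded, dark) under a hypothetical two-stage filter.

    Assumption for illustration only: report_pct percent of all incidents
    are reported, and record_pct percent of those reports are recorded.
    Integer percentages keep the arithmetic exact.
    """
    reported = true_count * report_pct // 100   # incidents someone reports
    recorded = reported * record_pct // 100     # reports that become records
    dark = true_count - recorded                # incidents missing from the records
    return recorded, dark


# Made-up inputs: 1,000 incidents, 60% reported, 90% of reports recorded.
recorded, dark = dark_figure(1000, 60, 90)
print(recorded, dark)  # 540 460
```

Under these invented inputs, the official count is 540 and the dark figure is 460, nearly half of all incidents. In practice, of course, the true count is exactly what nobody knows, which is why the size of the dark figure is so hard to pin down.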
Every social problem has a dark figure because some instances (of crime, child abuse, poverty, or whatever) inevitably go unrecorded. How big is the dark figure? When we first learn about a problem that has never before received attention, when no one has any idea how common the problem actually is, we might think of the dark figure as being