Example 2.10
One of the goals of the 1989 Wisconsin Behavioral Risk Factor Surveillance System (BRFS) was to estimate the distribution of adults who count calories. The distribution of male and female adults in Wisconsin who count calories is given in Table 2.3. Based on the information in Table 2.3, the percentage of females who do not count calories is 69.6% and the percentage of males who do not count calories is 84.8%. Note that there are actually two distributions given in Table 2.3, one for males and one for females.
Table 2.3 The Distribution of Adults who Count Calories Based on the 1989 Wisconsin BRFS by Sex
Sex | Calories Eaten Per Day: 1200 or Less (%) | Calories Eaten Per Day: More than 1200 (%) | Do Not Count Calories (%) |
---|---|---|---|
Male | 4.6 | 10.6 | 84.8 |
Female | 19.0 | 11.5 | 69.6 |
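As a small illustration of reading Table 2.3 as two separate distributions, the following Python sketch stores each row as its own distribution and derives the percentage of each sex that does count calories. The dictionary layout and the helper name `percent_counting` are illustrative choices, not part of the BRFS report. The female row totals 100.1% rather than 100%, presumably because the published percentages are rounded.

```python
# Percentages copied from Table 2.3 (1989 Wisconsin BRFS), one distribution per sex.
table_2_3 = {
    "Male": {"1200 or less": 4.6, "more than 1200": 10.6, "do not count": 84.8},
    "Female": {"1200 or less": 19.0, "more than 1200": 11.5, "do not count": 69.6},
}


def percent_counting(dist):
    """Percentage who count calories: 100 minus the 'do not count' share."""
    return 100.0 - dist["do not count"]


for sex, dist in table_2_3.items():
    row_total = sum(dist.values())
    print(
        f"{sex}: row total = {row_total:.1f}%, "
        f"count calories = {percent_counting(dist):.1f}%"
    )
```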
The distribution of a continuous quantitative variable is often modeled with a mathematical function called the probability density function. The probability density function explicitly describes the distribution of the values of the variable. A plot of the probability density function provides a graphical representation of the distribution of a variable, and the area under the curve between any two values of the variable corresponds to the percentage of the population falling between those two values. The height of the curve at a particular value of the variable measures the percentage per unit in the distribution at this point and is called the density of the population at this point. The values of the variable are most densely grouped where the graph of the probability density function is tallest. Examples of the most common shapes of the distribution of a continuous variable are given in Figures 2.5–2.8.
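The link between area and percentage can be checked numerically. The sketch below assumes a hypothetical mound-shaped variable modeled by a normal density with mean 100 and standard deviation 15; these values, and the function names `normal_pdf`, `example_pdf`, and `area_between`, are chosen only for illustration. It approximates the area under the curve between two values with the trapezoidal rule and also reports the height of the curve (the density) at a single value.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of a normal density curve at x (the density at x)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def example_pdf(x):
    """Hypothetical variable: normal density with mean 100 and SD 15."""
    return normal_pdf(x, mu=100.0, sigma=15.0)


def area_between(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b (trapezoidal rule)."""
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width


share = area_between(example_pdf, 85.0, 115.0)
print(f"Percentage of population between 85 and 115: {100 * share:.1f}%")
print(f"Density at 100: {example_pdf(100.0):.4f} per unit")
```

The area over the whole range of the variable is 1 (that is, 100% of the population), which is why the area between two values can be read directly as a percentage.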
Figure 2.5 An example of a mound-shaped distribution.
Figure 2.6 An example of a distribution with a long tail to the right.
Figure 2.7 An example of a distribution with a long tail to the left.
Figure 2.8 An example of a bimodal distribution.
The value of the variable under a peak of a probability density graph is called a mode. A distribution can have more than one mode, and a distribution with more than one mode is called a multimodal distribution; a distribution with exactly two modes is called bimodal. When a distribution has two or more modes, this usually indicates that there are distinct subpopulations clustering around each mode. In this case, it is often more informative to graph the probability distribution of each subpopulation separately.
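To make the idea of a mode concrete, the following sketch builds a hypothetical bimodal density as an equal mixture of two normal subpopulations (centers 10 and 25, standard deviations 2 and 3, all invented for illustration). It then locates the modes by scanning a grid for points where the curve is higher than at both neighboring points. This grid search is a simple illustrative approach, not a method taken from the text.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of a normal density curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x):
    """Hypothetical bimodal density: equal mix of two subpopulations."""
    return 0.5 * normal_pdf(x, 10.0, 2.0) + 0.5 * normal_pdf(x, 25.0, 3.0)


def find_modes(pdf, lo, hi, steps=3000):
    """Return grid points where the density exceeds both of its neighbors."""
    width = (hi - lo) / steps
    xs = [lo + i * width for i in range(steps + 1)]
    ys = [pdf(x) for x in xs]
    return [xs[i] for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1]]


modes = find_modes(mixture_pdf, 0.0, 40.0)
print("Approximate modes:", [round(m, 1) for m in modes])
```

Each mode found this way sits close to the center of one subpopulation, which is the pattern that motivates graphing the subpopulations separately.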
Example 2.11
In studying obsessive compulsive disorder (OCD), the age at onset is an important variable that is believed to be related to the neurobiological features of OCD; OCD is classified as being either Child Onset OCD or Adult Onset OCD. In the article “Is age at symptom onset associated with severity of memory impairment in adults with obsessive-compulsive disorder?” published in the American Journal of Psychiatry (Henin et al., 2001), the authors reported the distribution of the age at onset of OCD given in Figure 2.9. Because there are two modes (peaks) in Figure 2.9, the distribution suggests that there might be two different distributions for the age at onset of OCD, one for children and one for adults. Because the clinical diagnoses are Child Onset OCD and Adult Onset OCD, it is more informative to study each of these subpopulations separately. Thus, the distribution of the age at onset of OCD has been separated into distributions for the two classifications, Child Onset OCD and Adult Onset OCD, which are given in Figure 2.10.
Figure 2.9 Distribution of age at which OCD is diagnosed.
Figure 2.10 Distribution of the age at which OCD is diagnosed for Child Onset OCD and Adult Onset OCD.
The shape of the distribution of a discrete variable can also be described as long-tail right, mound shaped, long-tail left, or multimodal. For example, the 2005 National Health Interview Survey (NHIS) reports the distribution of family size, a discrete variable, and this distribution is given in Figure 2.11. Note that the distribution of family size according to the 2005 NHIS data is a long-tail right discrete distribution.
Figure 2.11 Distribution of family size according to the 2005 NHIS.
2.2.2 Describing a Population with Parameters
Because the distribution of a variable contains all of the information on how the units in the population are distributed, every question concerning the target population can be answered by studying the distribution of the variable. An alternative method of describing a population is to summarize specific characteristics of the population. That is, the target population can be summarized by determining the values of specific parameters, such as parameters that measure the typical value in the population, population percentages, the spread of the population, and the extremes of the population.
2.2.3 Proportions and Percentiles
Populations are often summarized by listing the important percentages or proportions associated with the population. The proportion of units in a population having a particular characteristic is a parameter of the population, and a population proportion will be denoted by p. The population proportion having a particular characteristic, say characteristic A, is defined to be
$$p = \frac{\text{number of units in the population having characteristic A}}{\text{number of units in the population}}$$
Note that the percentage of the population having characteristic A is p×100%. Population proportions and percentages are often associated with the categories of a qualitative variable or with