Richard J. Rossi

Applied Biostatistics for the Health Sciences



      where the X values for the N units in the population are $X_1, X_2, X_3, \ldots, X_N$.

       Example 2.16

      The distribution given below has a long tail to the right.

22, 24, 25, 27, 28, 28, 31, 32, 33, 35, 39, 41, 670

      In a previous example, µ was computed to be 79.63. The geometric mean for this population is

$$\mathrm{GM} = (22 \times 24 \times 25 \times 27 \times 28 \times 28 \times 31 \times 32 \times 33 \times 35 \times 39 \times 41 \times 670)^{1/13} \approx 38.0$$

      Thus, even though there is an extremely large and atypical value in this population, the geometric mean is far less sensitive to this value than the mean is, and it is a more reasonable parameter for representing the typical value in this population. In fact, the geometric mean is much closer to the median than to the mean for this population, with GM ≈ 38.0 and $\tilde{\mu} = 31$.
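      The geometric mean in this example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and do not come from the text. It computes the geometric mean both as the 13th root of the product and as the exponential of the average logarithm, a form that avoids very large intermediate products for longer lists. For the thirteen values listed, both forms evaluate to roughly 38.0.

```python
import math
import statistics

# The thirteen population values listed in the example.
population = [22, 24, 25, 27, 28, 28, 31, 32, 33, 35, 39, 41, 670]
n = len(population)

# Geometric mean as the N-th root of the product of the values.
gm_product = math.prod(population) ** (1 / n)

# Equivalent form: exponentiate the mean of the natural logarithms.
gm_logs = math.exp(sum(math.log(x) for x in population) / n)

mean = statistics.mean(population)
median = statistics.median(population)

print(f"mean = {mean:.2f}, median = {median}, GM = {gm_product:.1f}")
```

      Comparing the three printed values shows how strongly the single value 670 pulls the mean upward relative to the median and the geometric mean.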

      2.2.5 Measures of Dispersion

      Figure 2.16 Two different populations having the same mean, median, and mode.

      Even though the mean, median, and mode of these two populations are the same, clearly, population I is much more spread out than population II. The density of population II is greater at the mean, which means that population II is more concentrated at this point than population I.

      The more variation there is in a population, the harder it is to measure the typical value. Just as there are several ways of measuring the center of a population, there are also several ways to measure its variation. The three most commonly used parameters for measuring the spread of a population are the variance, standard deviation, and interquartile range. For a quantitative variable X,

       the variance of a population is defined to be the average of the squared deviations from the mean and will be denoted by $\sigma^2$ or Var(X). The variance of a variable X measured on a population consisting of N units is

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$$

       the standard deviation of a population is defined to be the square root of the variance and will be denoted by σ or SD(X).

       the interquartile range of a population is the distance between the 25th and 75th percentiles and will be denoted by IQR.

      Note that each of these measures of spread is a positive number except in the rare case when there is absolutely no variation in the population, in which case they will all be equal to 0. Furthermore, the larger each of these values is the more variability there is in the population. For example, for the two populations in Figure 2.16 the standard deviation of population I is larger than the standard deviation of population II.
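      The three measures of spread defined above can be sketched in Python as follows. The function names and the small data sets are made up for illustration and are not from the text. The quartile rule used here (the "exclusive" method of `statistics.quantiles`) is also an assumption, since several percentile conventions exist and they can give slightly different quartiles for small populations.

```python
import math
import statistics

def population_variance(values):
    """Average of the squared deviations from the mean (divisor N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def population_sd(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

def interquartile_range(values):
    """Distance between the 25th and 75th percentiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1

# Made-up values, chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data), population_sd(data), interquartile_range(data))

# A population with no variation at all: every measure of spread is 0.
constant = [5, 5, 5, 5]
print(population_variance(constant), population_sd(constant), interquartile_range(constant))
```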

      Because the standard deviation is the square root of the variance, both $\sigma$ and $\sigma^2$ contain equivalent information about the variation in a population. That is, if the variance is known, then so is the standard deviation, and vice versa. For example, if $\mathrm{Var}(X) = \sigma^2 = 25$, then the standard deviation is $\sigma = \sqrt{25} = 5$, and if $\mathrm{SD}(X) = \sigma = 20$, then $\mathrm{Var}(X) = \sigma^2 = 20^2 = 400$. The standard deviation is generally used for describing the variation in a population because its units are the same as the units of the variable, whereas the units of the variance are the units of the variable squared. For example, if X is a variable measured in cubic centimeters (cc), then the standard deviation is also measured in cc, but the variance is measured in cc$^2$. Also, the standard deviation is roughly the size of a typical deviation from the mean of the population.

      Figure 2.17 IQR is the distance between $X_{75}$ and $X_{25}$.

      Like the median, the interquartile range is unaffected by the extremes in a population. On the other hand, the standard deviation and variance are heavily influenced by the extremes in a population. The shape of the distribution influences the parameters of a distribution and dictates which parameters provide meaningful descriptions of the characteristics of a population. For a mound-shaped distribution, however, the standard deviation and interquartile range are closely related, with $\sigma \approx 0.75 \cdot \mathrm{IQR}$.
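      The approximate relationship between σ and the IQR can be checked numerically. The sketch below assumes the mound-shaped distribution is exactly normal, which is a stronger condition than the text states, and the values of the mean and standard deviation are hypothetical. It uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Hypothetical normal population; any mean and standard deviation would do.
dist = NormalDist(mu=100, sigma=15)

# The 25th and 75th percentiles of the distribution.
x25 = dist.inv_cdf(0.25)
x75 = dist.inv_cdf(0.75)
iqr = x75 - x25

# Ratio of the standard deviation to the interquartile range.
ratio = dist.stdev / iqr
print(f"IQR = {iqr:.2f}, sigma / IQR = {ratio:.3f}")
```

      For a normal distribution the ratio does not depend on the mean or the standard deviation, so the same value is obtained for any choice of the two parameters.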

      Consider the two populations listed below that were used in Example 2.14.

Population 1: 22, 24, 25, 27, 28, 28, 31, 32, 33, 35, 39, 41, 67
Population 2: 22, 24, 25, 27, 28, 28, 31, 32, 33, 35, 39, 41, 670

      Again, these two populations are identical except for their largest values, 67 and 670. In Example 2.17, the mean values of populations 1 and 2 were found to be $\mu_1 = 33.23$ and $\mu_2 = 79.63$. The variances of these two populations are $\sigma_1^2 = 134.7$ and $\sigma_2^2 = 31498.4$, and the standard deviations are $\sigma_1 = \sqrt{134.7} = 11.6$ and $\sigma_2 = \sqrt{31498.4} = 177.5$. By changing the maximum value in the population from 67 to 670, the standard deviation increased by a factor of about 15. In both populations, the 25th and 75th percentiles are 26 and 37, respectively, and thus the interquartile range for both populations is $\mathrm{IQR} = 37 - 26 = 11$.
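      The comparison of the two populations can be reproduced with the Python standard library. This is a sketch under stated assumptions: the helper name `summarize` is hypothetical, and the "exclusive" quartile method is assumed because it reproduces the percentiles 26 and 37 quoted above. The variance is computed with both divisors. The quoted variances 134.7 and 31498.4 correspond to the divisor N − 1, whereas dividing by N gives about 124.3 and 29075.5. Under either divisor the standard deviation grows by a factor of roughly 15 while the IQR stays at 11.

```python
import math
import statistics

pop1 = [22, 24, 25, 27, 28, 28, 31, 32, 33, 35, 39, 41, 67]
pop2 = [22, 24, 25, 27, 28, 28, 31, 32, 33, 35, 39, 41, 670]

def summarize(values):
    """Return the mean and several spread measures (illustrative helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    var_n = statistics.pvariance(values)          # divisor N
    var_n_minus_1 = statistics.variance(values)   # divisor N - 1
    return {
        "mean": statistics.mean(values),
        "var_n": var_n,
        "sd_n": math.sqrt(var_n),
        "var_n_minus_1": var_n_minus_1,
        "sd_n_minus_1": math.sqrt(var_n_minus_1),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }

s1 = summarize(pop1)
s2 = summarize(pop2)

for name, s in (("Population 1", s1), ("Population 2", s2)):
    print(name, {key: round(value, 1) for key, value in s.items()})

# How much the standard deviation grows when 67 is replaced by 670.
print(round(s2["sd_n_minus_1"] / s1["sd_n_minus_1"], 1))
```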

      Figure 2.18 The one-standard deviation empirical rule; roughly 68% of a mound-shaped distribution lies between the values μ−σ and μ+σ.

      Figure 2.19 The two-standard deviation empirical rule; roughly 95% of a mound-shaped distribution lies between the values μ−2σ and μ+2σ.
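      The percentages in the one- and two-standard deviation empirical rules can be checked for a normal distribution, which is one particular mound-shaped distribution. The sketch below rests on that assumption; the empirical rules themselves are stated only as rough guides for mound-shaped distributions in general. The function name is illustrative.

```python
from statistics import NormalDist

def proportion_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal population lying between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

within_1 = proportion_within(1)
within_2 = proportion_within(2)

print(f"within one standard deviation:  {within_1:.1%}")
print(f"within two standard deviations: {within_2:.1%}")
```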

      Figure