Daniel J. Denis

Applied Univariate, Bivariate, and Multivariate Statistics



      [Table: Bill's and Mary's rankings of the movies; actual scores on the favorability measure are in parentheses.]

      > bill <- c(5, 1, 3, 4, 2)
      > mary <- c(5, 3, 1, 4, 2)

      Because the data are already in the form of ranks, both Pearson r and Spearman rho will agree:

      > cor(bill, mary)
      [1] 0.6
      > cor(bill, mary, method = "spearman")
      [1] 0.6

      Note that by default, R returns the Pearson correlation coefficient. One has to specify method = "spearman" to get rs. Consider now what happens when we correlate, instead of rankings, the actual subjective favorability scores corresponding to the respective ranks. When we plot the favorability data, we obtain:

      > bill.sub <- c(2.1, 7.6, 8.4, 9.5, 10.0)
      > mary.sub <- c(7.6, 8.5, 9.0, 9.6, 9.7)
      > plot(mary.sub, bill.sub)

      [Figure: plot of mary.sub versus bill.sub.]

      Note that though the relationship is not perfectly linear, each increase in Bill's subjective score is nonetheless associated with an increase in Mary's subjective score. When we compute Pearson's r on this data, we obtain:

      > cor(bill.sub, mary.sub)
      [1] 0.9551578

      However, when we compute rs, we get:

      > cor(bill.sub, mary.sub, method = "spearman")
      [1] 1
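The reason rs equals 1 here while Pearson's r does not is that Spearman's rho is simply Pearson's r computed on the ranks of the data. A minimal sketch verifying this equivalence with the subjective scores above:

```r
# Spearman's rho is Pearson's r applied to the ranks of the data.
bill.sub <- c(2.1, 7.6, 8.4, 9.5, 10.0)
mary.sub <- c(7.6, 8.5, 9.0, 9.6, 9.7)

# Pearson on the ranked data reproduces Spearman on the raw data.
pearson_on_ranks <- cor(rank(bill.sub), rank(mary.sub))
spearman_direct  <- cor(bill.sub, mary.sub, method = "spearman")

pearson_on_ranks  # 1, since the ranks agree perfectly
spearman_direct   # 1
```

Because every increase in Bill's score is matched by an increase in Mary's score, the rank orderings are identical, and any perfectly monotone relationship yields rs = 1 even when the relationship is not perfectly linear.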

      The density for Student's t is given by (Shao, 2003):

      $$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}$$

      where ν is the degrees of freedom and Γ is the gamma function.

      The fact that t converges to z for large degrees of freedom but is quite distinct from z for small degrees of freedom is one reason why t distributions are often used for small sample problems. When sample size is large, and so consequently are degrees of freedom, whether one treats a random variable as t or z will make little difference in terms of computed p‐values and decisions on respective null hypotheses. This is a direct consequence of the convergence of the two distributions for large degrees of freedom. For a historical overview of how t‐distributions came to be, consult Zabell (2008).

      2.20.1 t‐Tests for One Sample

      When we perform hypothesis testing using the z distribution, we assume we have knowledge of the population variance σ2. Having direct knowledge of σ2 is the ideal circumstance. When we know σ2, we can compute the standard error of the mean directly as

      $$\sigma_M = \frac{\sigma}{\sqrt{n}}$$

      [Figure: Student's t versus normal densities for 3 (left), 10 (middle), and 50 (right) degrees of freedom. As degrees of freedom increase, the limiting form of the t distribution is the z distribution.]

      The test statistic is then

      $$z_M = \frac{\bar{x} - \mu_0}{\sigma_M} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

      where the numerator, $\bar{x} - \mu_0$, represents the distance between the sample mean and the population mean μ0 under the null hypothesis, and the denominator, σM, is the standard error of the mean.

      In most research contexts, from simple to complex, we usually do not have direct knowledge of σ2. When we do not have knowledge of it, we use the next best thing, an estimate of it. We can obtain an unbiased estimate of σ2 by computing s2 on our sample. When we do so, however, and use s2 in place of σ2, we can no longer pretend to “know” the standard error of the mean. Rather, we must concede that all we are able to do is estimate it. Our estimate of the standard error of the mean is thus given by:

      $$s_M = \frac{s}{\sqrt{n}}$$
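A sketch of this computation in R, using a small hypothetical sample (the values are chosen purely for illustration):

```r
# Hypothetical sample of n = 5 observations (illustrative values only)
x <- c(98, 102, 95, 101, 104)
n <- length(x)

s2  <- var(x)        # unbiased sample variance, divisor n - 1
s_M <- sqrt(s2 / n)  # estimated standard error of the mean, s / sqrt(n)

s_M  # 1.581139, i.e., sqrt(12.5 / 5)
```

Note that R's var() already uses the n − 1 divisor, so s2 is the unbiased estimate of σ2 discussed above.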

      When we use s2 (where $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$) in place of σ2, our resulting statistic is no longer a z statistic. That is, we say the ensuing statistic is no longer distributed as a standard normal variable (i.e., z). If it is not distributed as z, then what is it distributed as? Thanks to William Sealy Gosset, who in 1908 worked for Guinness Breweries and published under the pseudonym "Student" (Zabell, 2008), the ratio

      $$t = \frac{\bar{x} - \mu_0}{s_M} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

      was found to be distributed as a t statistic on n − 1 degrees of freedom. Again, the t distribution is most useful when sample sizes are rather small. For larger samples, as mentioned, the t distribution converges to the z distribution. If you are using rather large samples, say approximately 100 or more, whether you evaluate your null hypothesis using a z or t distribution will not matter much, because the critical values for z and t at such degrees of freedom (99 for the one‐sample case) will be so similar that, practically speaking, the two test statistics can be considered more or less equal. For even larger samples, the convergence is that much more fine‐tuned.
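The ratio above is exactly what R's t.test() computes for a one-sample test. A sketch comparing the hand computation with the built-in function, using hypothetical data and a hypothetical null value μ0 = 97:

```r
# One-sample t-test: manual computation vs. t.test()
x   <- c(98, 102, 95, 101, 104)  # illustrative data only
mu0 <- 97                        # hypothesized population mean (illustrative)

n <- length(x)
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))  # (xbar - mu0) / s_M

fit <- t.test(x, mu = mu0)

t_manual       # approximately 1.897
fit$statistic  # same t value, reported by t.test()
fit$parameter  # df = n - 1 = 4
```

The two t values agree exactly; t.test() additionally returns the p-value evaluated against the t distribution on n − 1 degrees of freedom.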

      The concept of convergence between z and t can be easily illustrated by inspecting the variance of the t distribution. Unlike the z distribution where the variance is set at 1.0 as a constant, the variance of the t distribution is defined as: