Daniel J. Denis

Applied Univariate, Bivariate, and Multivariate Statistics



      The matched-pairs design is a very important concept in statistics and design of experiments, because this simple design is the starting point to understanding more complicated designs and modeling such as mixed effects and hierarchical models.

      We analyze the hypothetical data in Table 2.8 using a paired samples t‐test in R by requesting paired = TRUE:

      > treat <- c(10, 15, 20, 22, 25)
      > control <- c(8, 12, 14, 15, 24)
      > t.test(treat, control, paired = TRUE)

              Paired t-test

      data:  treat and control
      t = 3.2827, df = 4, p-value = 0.03042
      alternative hypothesis: true difference in means is not equal to 0
      95 percent confidence interval:
       0.5860324 7.0139676
      sample estimates:
      mean of the differences
                          3.8

      The obtained p‐value of 0.03 is statistically significant at the 0.05 level of significance. We reject the null hypothesis and conclude that the population means for the two conditions differ.
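      The paired test above is equivalent to a one‐sample t‐test on the within‐pair differences. The following is a minimal sketch of that computation, reusing the treat and control vectors from the text; the names d, se, t_stat, p_val, and ci are introduced here purely for illustration.

```r
# Data from the text
treat   <- c(10, 15, 20, 22, 25)
control <- c(8, 12, 14, 15, 24)

d  <- treat - control            # within-pair differences
n  <- length(d)
se <- sd(d) / sqrt(n)            # standard error of the mean difference

t_stat <- mean(d) / se           # test statistic under H0: mean difference = 0
p_val  <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)  # two-sided p-value
ci     <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * se      # 95% confidence interval

round(c(t = t_stat, p = p_val), 4)
round(ci, 4)

# The same test expressed as a one-sample t-test on the differences
t.test(d, mu = 0)
```

      Working from the differences makes explicit that the pairing removes between‐pair variability from the error term, which is why the degrees of freedom are n − 1 = 4 rather than those of an independent‐samples test.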

      As a nonparametric alternative, the Wilcoxon rank‐sum test featured earlier can be adapted to incorporate paired observations, in which case it is known as the Wilcoxon signed‐rank test. For our data, we have:

      > wilcox.test(treat, control, paired = TRUE)

              Wilcoxon signed rank test

      data:  treat and control
      V = 15, p-value = 0.0625
      alternative hypothesis: true location shift is not equal to 0

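      The signed‐rank statistic can also be reproduced by hand, which helps explain the p‐value. The sketch below assumes the same treat and control vectors; the names r, V, p_upper, and p_two are illustrative. All five differences are positive, so V takes its largest possible value, and with only five pairs the smallest attainable two‐sided exact p‐value is 2 × (1/2)^5 = 0.0625.

```r
treat   <- c(10, 15, 20, 22, 25)
control <- c(8, 12, 14, 15, 24)

d <- treat - control
n <- length(d)

r <- rank(abs(d))        # ranks of the absolute differences (no ties or zeros here)
V <- sum(r[d > 0])       # sum of the ranks belonging to positive differences

# Exact two-sided p-value from the null distribution of V
p_upper <- psignrank(V - 1, n, lower.tail = FALSE)  # P(V >= observed value)
p_two   <- min(1, 2 * p_upper)

c(V = V, p = p_two)
```
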
            Treatment 1   Treatment 2   Treatment 3
Block 1          10             9             8
Block 2          15            13            12
Block 3          20            18            14
Block 4          22            17            15
Block 5          25            25            24

      Now, here is the trick to understanding advanced modeling, including a primary feature of mixed effects modeling. Because each block contributes an observation to every treatment, we expect the covariance between treatments to be unequal to 0. This is analogous to what we expected in the simple matched-pairs design. It seems, then, that a reasonable assumption to make for the data in Table 2.9 is that the covariances between treatments are equal or, at minimum, follow some hypothesized correlational structure. In multilevel and hierarchical models, attempts are made to account for the correlation between treatment levels instead of assuming these correlations equal 0, as is the case for classical between‐subjects designs. In Chapter 6, we elaborate on these ideas when we discuss randomized block and repeated measures models.
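      To see that the treatments in the blocked data do covary, one can inspect their sample covariance and correlation matrices. This is only a descriptive sketch: the data frame block_data and the column names trt1, trt2, and trt3 are hypothetical labels for the three treatment columns of the table, and no model is being fitted.

```r
# Rows are blocks, columns are treatments (values taken from the table in the text)
block_data <- data.frame(
  trt1 = c(10, 15, 20, 22, 25),
  trt2 = c(9, 13, 18, 17, 25),
  trt3 = c(8, 12, 14, 15, 24)
)

S <- cov(block_data)   # sample covariance matrix between treatments
R <- cor(block_data)   # corresponding correlation matrix

round(S, 2)
round(R, 3)
```

      The off‐diagonal entries are the quantities a between‐subjects analysis would treat as zero; how similar they are to one another bears on whether an equal‐covariance structure is plausible for these data.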

      In many statistical techniques, especially multivariate ones, statistical analyses take place not on individual variables, but rather on linear combinations of variables. A linear combination in linear algebra can be denoted simply as:

$$\ell_i = \mathbf{a}'\mathbf{y} = a_1 y_1 + a_2 y_2 + \cdots + a_p y_p$$

      where $\mathbf{a}' = (a_1, a_2, \ldots, a_p)$. These values are scalars, and serve to weight the respective values of $y_1$ through $y_p$, which are the variables.

      Just as we did for “ordinary” variables, we can compute a number of central tendency and dispersion statistics on linear combinations. For instance, we can compute the mean of a linear combination ℓi as

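$$\bar{\ell} = \mathbf{a}'\bar{\mathbf{y}}$$

      where $\bar{\mathbf{y}}$ is the vector of sample means of the variables.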

      We can also compute the sample variance of a linear combination:

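$$s_{\ell}^2 = \mathbf{a}'\mathbf{S}\mathbf{a}$$

      where $\mathbf{S}$ is the sample covariance matrix of the variables.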
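      The standard results that the mean of a linear combination equals a'ȳ and that its sample variance equals a'Sa can be checked numerically. The sketch below borrows the three treatment columns from the blocked data as the variables; the weight vector a is an arbitrary, hypothetical choice made only for illustration.

```r
# Variables y1, y2, y3 (columns), five observations (rows)
Y <- cbind(y1 = c(10, 15, 20, 22, 25),
           y2 = c(9, 13, 18, 17, 25),
           y3 = c(8, 12, 14, 15, 24))

a <- c(0.5, 0.3, 0.2)      # hypothetical weights

l    <- as.vector(Y %*% a) # value of the linear combination for each observation
ybar <- colMeans(Y)        # vector of sample means
S    <- cov(Y)             # sample covariance matrix

# Mean: computed directly versus a' ybar
c(direct = mean(l), formula = sum(a * ybar))

# Variance: computed directly versus a' S a
c(direct = var(l), formula = drop(t(a) %*% S %*% a))
```
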

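      For two linear combinations,

$$\ell_i = \mathbf{a}'\mathbf{y} = a_1 y_1 + a_2 y_2 + \cdots + a_p y_p$$

      and

$$z_i = \mathbf{b}'\mathbf{y} = b_1 y_1 + b_2 y_2 + \cdots + b_p y_p$$

      we can obtain the sample covariance between such linear combinations as follows:

$$s_{\ell z} = \mathbf{a}'\mathbf{S}\mathbf{b}$$

      with $\mathbf{S}$ denoting the sample covariance matrix of the variables. The correlation of these linear combinations (Rencher and Christensen, 2012, p. 76) is simply the standardized version of $s_{\ell z}$:

$$r_{\ell z} = \frac{s_{\ell z}}{\sqrt{s_{\ell}^2 \, s_{z}^2}} = \frac{\mathbf{a}'\mathbf{S}\mathbf{b}}{\sqrt{(\mathbf{a}'\mathbf{S}\mathbf{a})(\mathbf{b}'\mathbf{S}\mathbf{b})}}$$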
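      As a numerical check, the covariance and correlation between two linear combinations can be computed both directly from the combination scores and from the matrix expressions a'Sb and a'Sb / sqrt((a'Sa)(b'Sb)). The weight vectors a and b below are hypothetical, as is the reuse of the blocked data as the variables.

```r
Y <- cbind(y1 = c(10, 15, 20, 22, 25),
           y2 = c(9, 13, 18, 17, 25),
           y3 = c(8, 12, 14, 15, 24))
S <- cov(Y)                 # sample covariance matrix of the variables

a <- c(0.5, 0.3, 0.2)       # hypothetical weights for the first combination
b <- c(1, -1, 0)            # hypothetical weights for the second combination

l <- as.vector(Y %*% a)     # scores on the first linear combination
z <- as.vector(Y %*% b)     # scores on the second linear combination

# Covariance: direct versus a' S b
cov_formula <- drop(t(a) %*% S %*% b)
c(direct = cov(l, z), formula = cov_formula)

# Correlation: direct versus the standardized covariance
cor_formula <- cov_formula /
  sqrt(drop(t(a) %*% S %*% a) * drop(t(b) %*% S %*% b))
c(direct = cor(l, z), formula = cor_formula)
```
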

      As we will see later in the book, if this correlation is the maximum correlation between linear combinations on the same variables, it is called the canonical correlation, discussed in Chapter 12. The correlation between linear combinations plays a central role in multivariate analysis. Substantively, and geometrically, linear combinations can be interpreted as “projections” of one or more variables onto new dimensions. For instance, in simple linear regression, the fitting of a least‐squares line is such a projection. It is the projection of points such that it guarantees that the sum of squared deviations from the given projected line or “surface” (in the case of higher dimensions) is kept to a minimum.
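      The projection view of least squares can be made concrete with the so‐called hat matrix, H = X(X'X)⁻¹X', which maps the observed responses onto the fitted line. The sketch below is illustrative only: it regresses the treat scores on the control scores from the paired example, a choice made here for convenience rather than one taken from the text.

```r
x <- c(8, 12, 14, 15, 24)     # predictor (control scores, used for illustration)
y <- c(10, 15, 20, 22, 25)    # response (treat scores, used for illustration)

X <- cbind(1, x)                          # design matrix with an intercept column
H <- X %*% solve(t(X) %*% X) %*% t(X)     # projection ("hat") matrix
y_hat <- as.vector(H %*% y)               # projection of y onto the column space of X

fit <- lm(y ~ x)

# The projected values coincide with the least-squares fitted values
all.equal(y_hat, unname(fitted(fit)))

# A projection applied twice changes nothing, and the residuals
# are orthogonal to every column of X
all.equal(H %*% H, H)
max(abs(crossprod(X, y - y_hat)))
```
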

      If we can assume multivariate normality of a distribution, that is, $\mathbf{Y} \sim N[\boldsymbol{\mu}, \boldsymbol{\Sigma}]$, then we know that linear combinations of $\mathbf{Y}$ are also normally distributed, and that a host of other useful statistical properties hold (see Timm, 2002, pp. 86–88). In multivariate methods especially, we regularly need to make assumptions about such linear combinations,