alt="table attributes columnalign left end attributes row cell x subscript 1 equals text curb.weight end text end cell row cell x subscript 2 equals text length end text end cell row cell x subscript 3 equals text width end text end cell end table"/>
To obtain the sample covariance for the variables curb.weight and length in the data set in Table 2.1, we first calculate the sample means x̄1, x̄2, and
By (2.2), the sample covariance of the two variables can be obtained as
The s12 value of 4316.8 by itself cannot tell us whether the two variables have a strong or weak (linear) relationship. That information is provided by the correlation. To evaluate the sample correlation, we first need the sample variances of x1 and x2. By (2.1), we have
By (2.4), we have
which is close to 1 and corresponds to a strong positive linear association between the curb weight and length of cars.
Example 2.3 In R, the sample mean, variance, covariance, and correlation can be found using the functions mean(), var(), cov(), and cor(), respectively. For example, the following R code can be used to find the sample mean and sample variance of curb.weight, and the sample covariance and correlation between curb.weight and length, in the auto.spec data set.
mean(auto.spec.df$curb.weight)
var(auto.spec.df$curb.weight)
with(auto.spec.df, cov(curb.weight, length))
with(auto.spec.df, cor(curb.weight, length))

> mean(auto.spec.df$curb.weight)
[1] 2555.566
> var(auto.spec.df$curb.weight)
[1] 271107.9
> with(auto.spec.df, cov(curb.weight, length))
[1] 5638.336
> with(auto.spec.df, cor(curb.weight, length))
[1] 0.8777285
Note that the results above differ from those in Example 2.2 because this example uses the entire auto.spec data set, whereas Example 2.2 uses only a small random subset of it.
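To see how cov() and cor() relate to the hand calculation in Example 2.2, the following sketch computes the same quantities from their definitions and compares them with the built-in functions. It assumes that (2.1), (2.2), and (2.4) are the usual definitions with divisor n − 1 (the divisor R's var() and cov() use) and that the correlation is s12/(s1 s2). The vectors x1 and x2 are made-up values for illustration, not the data in Table 2.1 or auto.spec.

```r
# Hypothetical measurements, for illustration only (NOT the Table 2.1 values)
x1 <- c(2100, 2400, 2650, 3000, 3300)  # stand-in for curb.weight
x2 <- c(160, 168, 172, 180, 188)       # stand-in for length
n <- length(x1)

# Sample means
x1_bar <- sum(x1) / n
x2_bar <- sum(x2) / n

# Sample variances and covariance, assuming the n - 1 divisor
s1_sq <- sum((x1 - x1_bar)^2) / (n - 1)
s2_sq <- sum((x2 - x2_bar)^2) / (n - 1)
s12   <- sum((x1 - x1_bar) * (x2 - x2_bar)) / (n - 1)

# Sample correlation: covariance scaled by the two standard deviations
r12 <- s12 / sqrt(s1_sq * s2_sq)

# Compare the hand calculations with the built-in functions
all.equal(s1_sq, var(x1))
all.equal(s12, cov(x1, x2))
all.equal(r12, cor(x1, x2))
```

Because the correlation divides out the scale of both variables, it always lies between −1 and 1, which is why it can indicate the strength of a linear relationship while the raw covariance cannot.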
2.2.2 Sample Mean Vector and Sample Covariance Matrix
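A multivariate data set consists of n observations collected from n items or units, and each observation contains measurements on p variables, x1, x2,…, xp. The measurement vector for the ith observation is denoted by

$$\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T, \quad i = 1, 2, \ldots, n,$$

where $x_{ik}$ is the measurement of the kth variable on the ith observation.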
The sample mean vector is the vector of sample means for the p variables, which is defined as

$$\bar{\mathbf{x}} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p)^T,$$

where x̄k is the sample mean of the kth variable.
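The sample covariance matrix S is the p × p matrix of sample variances and covariances of the p variables:

$$\mathbf{S} = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix},$$

where the diagonal element $s_{kk}$ is the sample variance of the kth variable.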
The off-diagonal elements of S are the sample covariances of each pair of variables. For j ≠ k,

$$s_{jk} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k).$$
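As a rough illustration of the sample mean vector and sample covariance matrix, the sketch below computes both for a small data frame using colMeans() and cov(), and then rebuilds S from centered data as a check. The data frame X and its values are hypothetical stand-ins, not taken from auto.spec, and the n − 1 divisor is assumed to match the convention used by R's cov().

```r
# Hypothetical data: n = 5 observations on p = 3 variables (illustration only)
X <- data.frame(
  curb.weight = c(2100, 2400, 2650, 3000, 3300),
  length      = c(160, 168, 172, 180, 188),
  width       = c(63.5, 64.8, 65.2, 66.9, 68.4)
)
n <- nrow(X)

x_bar <- colMeans(X)  # sample mean vector, one entry per variable
S <- cov(X)           # p x p sample covariance matrix

# Rebuild S from its definition: center each column, then Xc' Xc / (n - 1)
Xc <- sweep(as.matrix(X), 2, x_bar)  # subtract each column's mean
S_manual <- crossprod(Xc) / (n - 1)

# The two versions are expected to agree
all.equal(S, S_manual, check.attributes = FALSE)

# Diagonal entries are the sample variances of the individual variables
all.equal(unname(diag(S)), unname(sapply(X, var)))

# S is symmetric, since the covariance of (j, k) equals that of (k, j)
isSymmetric(S)
```

Passing a data frame or matrix to cov() returns the whole matrix at once, so the pairwise calls used for two variables are not needed when p is larger.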