where σ² is the variance of X. An estimator of a parameter is called unbiased if its mean is equal to the true value of the parameter. X̄ is a commonly used estimator of µ because it is unbiased and its variance decreases as the sample size n increases.
This concept can be extended to a p-dimensional random vector X with mean vector µ. Consider a random sample X1, X2, …, Xn from the population of X. The sample mean vector X̄ is a random vector with population mean E(X̄) = µ and population covariance matrix

$$\operatorname{cov}(\bar{\mathbf{X}}) = \frac{1}{n}\boldsymbol{\Sigma},$$

where Σ is the population covariance matrix of X, defined next.
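As a quick illustration, the following NumPy sketch simulates repeated samples from a multivariate normal population with a hypothetical µ and Σ (both chosen only for this example) and checks that the sample mean vector is centered at µ with covariance matrix Σ/n.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters (assumptions for illustration only).
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

n = 50         # sample size
reps = 20000   # number of Monte Carlo replications

# Draw `reps` samples of size n; record the sample mean vector of each.
xbars = np.array([
    rng.multivariate_normal(mu, Sigma, size=n).mean(axis=0)
    for _ in range(reps)
])

print(xbars.mean(axis=0))               # ≈ mu, illustrating E(X-bar) = mu
print(np.cov(xbars, rowvar=False) * n)  # ≈ Sigma, illustrating cov(X-bar) = Sigma / n
```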
The (population) covariance matrix of a random vector X is defined as

$$\boldsymbol{\Sigma} = \operatorname{cov}(\mathbf{X}) = E\left[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^{T}\right].$$
The ith diagonal element of Σ is the population variance of Xi:

$$\sigma_{ii} = \operatorname{var}(X_i) = E\left[(X_i - \mu_i)^2\right].$$
The (j, k)th off-diagonal element of Σ is the population covariance between Xj and Xk:

$$\sigma_{jk} = \operatorname{cov}(X_j, X_k) = E\left[(X_j - \mu_j)(X_k - \mu_k)\right] =
\begin{cases}
\displaystyle\iint (x_j - \mu_j)(x_k - \mu_k)\, f_{jk}(x_j, x_k)\, dx_j\, dx_k & \text{(continuous case)}, \\[1ex]
\displaystyle\sum_{x_j}\sum_{x_k} (x_j - \mu_j)(x_k - \mu_k)\, p_{jk}(x_j, x_k) & \text{(discrete case)},
\end{cases}$$
where fjk(xj, xk) and pjk(xj, xk) are the joint density function and joint probability mass function, respectively, of Xj and Xk. The population covariance measures the linear association between the two random variables. It is clear that σjk = σkj, so the covariance matrix Σ is symmetric. Like the sample covariance matrix, the population covariance matrix Σ is always positive semidefinite.
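These two properties are easy to check numerically. The short NumPy sketch below (using an arbitrarily simulated data set, chosen only for illustration) verifies that an estimated covariance matrix is symmetric, has nonnegative eigenvalues, and yields a nonnegative quadratic form aᵀΣa, which equals var(aᵀX).

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary simulated data set (an assumption for illustration only).
X = rng.standard_normal((1000, 4)) @ rng.standard_normal((4, 4))
Sigma_hat = np.cov(X, rowvar=False)

# Symmetry: Sigma equals its transpose.
print(np.allclose(Sigma_hat, Sigma_hat.T))             # True

# Positive semidefiniteness: all eigenvalues >= 0 (up to rounding error).
print(np.linalg.eigvalsh(Sigma_hat).min() >= -1e-12)   # True

# Equivalently, the quadratic form a^T Sigma a = var(a^T X) is never negative.
a = rng.standard_normal(4)
print(a @ Sigma_hat @ a >= 0)                          # True
```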
Similar to the population mean, the population variance and covariance can be estimated by the sample variance and covariance introduced in Section 2.2. The sample variance and covariance are both random variables and are unbiased estimators of the population variance and covariance. Consequently, the sample covariance matrix S is an unbiased estimator of the population covariance matrix Σ, that is, E(S) = Σ.
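A small Monte Carlo sketch can illustrate the unbiasedness E(S) = Σ. Here the population is taken to be multivariate normal with a hypothetical Σ (an assumption made only for the demonstration), and the sample covariance is computed with NumPy's default n − 1 divisor.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population parameters (assumptions for illustration only).
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

n, reps = 10, 50000

# Average the sample covariance matrix S (computed with the n - 1 divisor,
# NumPy's default) over many replications; the average approaches Sigma.
S_avg = np.zeros((2, 2))
for _ in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    S_avg += np.cov(X, rowvar=False)
S_avg /= reps

print(S_avg)   # ≈ Sigma, illustrating E(S) = Sigma
```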
As with the sample covariance, the value of the population covariance of two random variables depends on their scale, for example on the units in which the variables are measured. A scale-independent measure of the degree of linear association between the random variables Xj and Xk is given by the population correlation:

$$\rho_{jk} = \frac{\sigma_{jk}}{\sqrt{\sigma_{jj}}\,\sqrt{\sigma_{kk}}}.$$
It is clear that ρjk = ρkj, and the population correlation matrix of a random vector X is the symmetric matrix

$$\boldsymbol{\rho} = \begin{pmatrix}
1 & \rho_{12} & \cdots & \rho_{1p} \\
\rho_{21} & 1 & \cdots & \rho_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{p1} & \rho_{p2} & \cdots & 1
\end{pmatrix}.$$
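In code, the correlation matrix can be obtained from the covariance matrix by the elementwise rescaling ρjk = σjk/(√σjj √σkk); the NumPy sketch below uses an arbitrary Σ purely for illustration.

```python
import numpy as np

# Arbitrary covariance matrix (an assumption for illustration only).
Sigma = np.array([[ 4.0, 1.2, -0.8],
                  [ 1.2, 1.0,  0.3],
                  [-0.8, 0.3,  2.5]])

d = np.sqrt(np.diag(Sigma))      # standard deviations sqrt(sigma_jj)
rho = Sigma / np.outer(d, d)     # rho_jk = sigma_jk / (sqrt(sigma_jj) sqrt(sigma_kk))

print(rho)                             # symmetric, with unit diagonal
print(np.allclose(np.diag(rho), 1.0))  # True
```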
For univariate random variables X and Y and a constant c, we have E(X + Y) = E(X) + E(Y) and E(cX) = cE(X). Similarly, for random vectors X and Y and a constant matrix C, it can be seen that

$$E(\mathbf{X} + \mathbf{Y}) = E(\mathbf{X}) + E(\mathbf{Y}), \qquad E(\mathbf{C}\mathbf{X}) = \mathbf{C}\,E(\mathbf{X}).$$
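The following NumPy sketch gives an empirical check of E(CX) = CE(X), comparing a Monte Carlo estimate of E(CX) against C applied to the population mean, for a hypothetical mean vector µ and matrix C (both assumptions for illustration only).

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical population mean and constant matrix (illustration only).
mu = np.array([1.0, -1.0, 2.0])
C = np.array([[1.0,  0.0, 2.0],
              [0.5, -1.0, 0.0]])

n = 200000
X = rng.normal(loc=mu, scale=1.0, size=(n, 3))   # sample with E(X) = mu

lhs = (X @ C.T).mean(axis=0)   # Monte Carlo estimate of E(CX)
rhs = C @ mu                   # C E(X)
print(lhs, rhs)                # the two vectors agree up to sampling error
```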