Yong Chen

Industrial Data Analytics for Diagnosis and Prognosis



$$s_{jk} = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{n - 1}. \tag{2.5}$$

$$\mathbf{S} = \frac{1}{n - 1} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T. \tag{2.6}$$
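      As an illustration of (2.6), the following minimal sketch accumulates the outer products of the centered observations and divides by n − 1. The function names and the toy data are assumptions made for this example; they are not taken from the book.

```python
def mean_vector(rows):
    """Sample mean of each of the p variables (columns)."""
    n = len(rows)
    p = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(p)]


def sample_covariance(rows):
    """S = 1/(n-1) * sum_i (x_i - xbar)(x_i - xbar)^T, following (2.6)."""
    n = len(rows)
    p = len(rows[0])
    xbar = mean_vector(rows)
    S = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [row[j] - xbar[j] for j in range(p)]  # centered observation
        for j in range(p):
            for k in range(p):
                S[j][k] += d[j] * d[k]  # add the outer product d d^T
    return [[S[j][k] / (n - 1) for k in range(p)] for j in range(p)]


# Hypothetical toy data: n = 4 observations of p = 2 variables.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 5.0], [4.0, 9.0]]
S = sample_covariance(data)
print(S)
```

      The diagonal entries of the result are the sample variances and the off-diagonal entries are the sample covariances, so the matrix is symmetric by construction.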

      Similarly, we define the sample correlation matrix as

$$\mathbf{R} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}.$$

      The $(j, k)$th element of $\mathbf{R}$ is the sample correlation of the $j$th and $k$th variables:

$$r_{jk} = \frac{s_{jk}}{s_j s_k}.$$

      The sample correlation between a variable and itself is equal to 1, so the diagonal elements of a sample correlation matrix are all equal to 1. The sample correlation matrix $\mathbf{R}$ is symmetric since $r_{jk} = r_{kj}$.
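      The two properties just stated (unit diagonal and symmetry) can be checked numerically. The sketch below is only an illustration: the function name and the toy data are assumptions, and the correlation is computed directly from its definition as the covariance divided by the product of the standard deviations.

```python
import math


def sample_correlation_matrix(rows):
    """Return R with entries r_jk = s_jk / (s_j * s_k)."""
    n, p = len(rows), len(rows[0])
    xbar = [sum(r[j] for r in rows) / n for j in range(p)]

    def cov(j, k):
        # Sample covariance of variables j and k (divisor n - 1).
        return sum((r[j] - xbar[j]) * (r[k] - xbar[k]) for r in rows) / (n - 1)

    s = [math.sqrt(cov(j, j)) for j in range(p)]  # sample standard deviations
    return [[cov(j, k) / (s[j] * s[k]) for k in range(p)] for j in range(p)]


# Hypothetical toy data: n = 4 observations of p = 3 variables.
data = [[1.0, 2.0, 9.0], [2.0, 4.0, 7.0], [3.0, 5.0, 8.0], [4.0, 9.0, 3.0]]
R = sample_correlation_matrix(data)

for row in R:
    print([round(r, 3) for r in row])
```

      Up to floating-point rounding, the printed matrix has ones on the diagonal, mirrored off-diagonal entries, and every entry between −1 and 1.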

      Example 2.4 Consider the data set in Table 2.1. In Example 2.2, we found that $\bar{x}_1 = 2479.5$ and $\bar{x}_2 = 170.35$. Similarly, we can obtain $\bar{x}_3 = 65.41$. So the mean vector of $\mathbf{x} = (x_1 \; x_2 \; x_3)^T$ is given by

$$\bar{\mathbf{x}} = (2479.5 \;\; 170.35 \;\; 65.41)^T.$$

      In Example 2.2, we calculated the sample variances, sample covariance, and sample correlation of $x_1$ and $x_2$. Similarly, we can obtain the sample variance of $x_3$ and its sample covariance and correlation with the other two variables as

$$s_3^2 = 3.71, \quad s_{13} = 820.8, \quad s_{23} = 15.56, \quad r_{13} = 0.832, \quad r_{23} = 0.881.$$

      Note that while $s_{23}$ is much smaller than $s_{13}$, $r_{23}$ is greater than $r_{13}$, which indicates that the linear association between $x_2$ and $x_3$ is stronger than that between $x_1$ and $x_3$. This shows that the magnitude of a covariance depends on the scales of the variables and so does not, by itself, indicate how strong the relationship between two variables is. Combining all the sample variance, covariance, and correlation information, the sample covariance matrix and sample correlation matrix of $\mathbf{x} = (x_1 \; x_2 \; x_3)^T$ can be written as

$$\mathbf{S} = \begin{pmatrix} 262829.2 & 4316.8 & 820.8 \\ 4316.8 & 84.07 & 15.56 \\ 820.8 & 15.56 & 3.71 \end{pmatrix}, \quad \mathbf{R} = \begin{pmatrix} 1 & 0.918 & 0.832 \\ 0.918 & 1 & 0.881 \\ 0.832 & 0.881 & 1 \end{pmatrix}.$$
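      The correlation matrix in Example 2.4 can be approximately recovered from the covariance matrix alone, since each correlation is a covariance divided by the product of the two standard deviations. The sketch below uses the entries of S as printed above; because those entries are already rounded, the recomputed correlations may differ from the printed ones by about 0.001. The function name and the change of units for the first variable are assumptions added to illustrate that rescaling a variable changes the covariances but not the correlations.

```python
import math

# Sample covariance matrix S from Example 2.4 (entries as printed, already rounded).
S = [
    [262829.2, 4316.8, 820.8],
    [4316.8, 84.07, 15.56],
    [820.8, 15.56, 3.71],
]


def cov_to_corr(S):
    """Return R with r_jk = s_jk / (s_j * s_k), where s_j = sqrt(s_jj)."""
    p = len(S)
    s = [math.sqrt(S[j][j]) for j in range(p)]
    return [[S[j][k] / (s[j] * s[k]) for k in range(p)] for j in range(p)]


R = cov_to_corr(S)

# Hypothetical change of units: divide the first variable by 1000.
scale = [0.001, 1.0, 1.0]
S_rescaled = [[scale[j] * scale[k] * S[j][k] for k in range(3)] for j in range(3)]
R_rescaled = cov_to_corr(S_rescaled)

for row in R:
    print([round(r, 3) for r in row])
```

      After the rescaling, the covariance between the first and third variables becomes smaller than that between the second and third, yet the two correlation matrices agree up to floating-point rounding.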

      We are often interested in linear combinations of the variables $x_1, x_2, \ldots, x_p$. For example, two of the variables in the auto_spec data set are city.mpg and highway.mpg. If you expect that 60% of the mileage for a car is on highways and 40% is on local roads, then the average MPG for the car can be estimated as 0.6 × highway.mpg + 0.4 × city.mpg, which is a linear combination of city.mpg and highway.mpg. In general, let $c_1, c_2, \ldots, c_p$ be constants and consider the linear combination of the variables $x_1, x_2, \ldots, x_p$ given by

$$z = c_1 x_1 + c_2 x_2 + \cdots + c_p x_p.$$

      For each observation in the data set, the corresponding value of the variable $z$ can be found by

$$z_i = c_1 x_{i1} + c_2 x_{i2} + \cdots + c_p x_{ip} = \mathbf{c}^T \mathbf{x}_i, \quad i = 1, \ldots, n,$$

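      where $\mathbf{c}^T = (c_1 \; c_2 \; \cdots \; c_p)$. It can be seen that the sample mean of $z$ is

$$\bar{z} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{c}^T \mathbf{x}_i = \mathbf{c}^T \bar{\mathbf{x}}. \tag{2.7}$$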

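      The sample variance of $z$ can be found as

$$s_z^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (z_i - \bar{z})^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \mathbf{c}^T (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T \mathbf{c} = \mathbf{c}^T \mathbf{S} \mathbf{c}. \tag{2.8}$$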

      Because the sample variance is always non-negative, for any $\mathbf{c} \in \mathbb{R}^p$ we have $\mathbf{c}^T \mathbf{S} \mathbf{c} \ge 0$ from (2.8). Therefore, the sample covariance matrix $\mathbf{S}$ is always a positive semidefinite matrix.
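      The results for a linear combination can be checked numerically: the sample mean of z should equal c^T x̄, and its sample variance should equal the non-negative quantity c^T S c. The sketch below uses the 40%/60% city/highway weighting from the MPG example; the five pairs of MPG values are hypothetical and are not taken from the auto_spec data set.

```python
# Hypothetical (city.mpg, highway.mpg) values for five cars; not from auto_spec.
data = [[21.0, 27.0], [19.0, 26.0], [24.0, 30.0], [31.0, 38.0], [17.0, 22.0]]
c = [0.4, 0.6]  # weights: 40% city, 60% highway

n, p = len(data), len(c)

# Sample mean vector and sample covariance matrix of the original variables.
xbar = [sum(row[j] for row in data) / n for j in range(p)]
S = [[sum((row[j] - xbar[j]) * (row[k] - xbar[k]) for row in data) / (n - 1)
      for k in range(p)] for j in range(p)]

# Direct route: form z_i = c^T x_i, then take its sample mean and variance.
z = [sum(c[j] * row[j] for j in range(p)) for row in data]
z_mean = sum(z) / n
z_var = sum((zi - z_mean) ** 2 for zi in z) / (n - 1)

# Matrix route: c^T xbar and c^T S c.
mean_from_formula = sum(c[j] * xbar[j] for j in range(p))
var_from_formula = sum(c[j] * S[j][k] * c[k] for j in range(p) for k in range(p))

print(z_mean, mean_from_formula)
print(z_var, var_from_formula)
```

      Each printed pair should agree up to floating-point rounding. Because the second value in the last pair is itself a sample variance, it cannot be negative, whatever weights are chosen for c.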

      In general, if we have q linear combinations of x1, x2,…, xp defined