Yong Chen

Industrial Data Analytics for Diagnosis and Prognosis



\[
\begin{aligned}
x_1 &= \text{curb.weight} \\
x_2 &= \text{length} \\
x_3 &= \text{width}
\end{aligned}
\]

      To obtain the sample covariance of the variables curb.weight and length in the data set in Table 2.1, we first calculate the sample means $\bar{x}_1$ and $\bar{x}_2$ and the sum of cross products $\sum_{i=1}^{n} x_{i1} x_{i2}$:

\[
\bar{x}_1 = 2479.5, \qquad \bar{x}_2 = 170.35,
\]

\[
\sum_{i=1}^{n} x_{i1} x_{i2} = (3515)(190.9) + (2300)(168.7) + \cdots + (1909)(158.8) = 4262679.
\]

      The sample variances are computed from the equivalent forms

\[
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}.
\]

      By (2.2), the sample covariance of the two variables can be obtained as

\[
\begin{aligned}
s_{12} &= \frac{\sum_{i=1}^{n} x_{i1} x_{i2} - n\bar{x}_1\bar{x}_2}{n-1} \\
       &= \frac{4262679 - (10)(2479.5)(170.35)}{9} = 4316.8.
\end{aligned}
\]

\[
\begin{aligned}
s_1^2 &= \frac{\sum_{i=1}^{n} x_{i1}^2 - n\bar{x}_1^2}{n-1} = \frac{63\,844\,665 - (10)(2479.5)^2}{9} = 262\,829.2, \\
s_2^2 &= \frac{\sum_{i=1}^{n} x_{i2}^2 - n\bar{x}_2^2}{n-1} = \frac{290\,947.8 - (10)(170.35)^2}{9} = 84.07.
\end{aligned}
\]

      By (2.4), we have

\[
r_{12} = \frac{s_{12}}{s_1 s_2} = \frac{4316.8}{\sqrt{262829.2}\,\sqrt{84.07}} = 0.918,
\]

      which is close to 1 and corresponds to a strong positive linear association between the curb weight and the length of cars.
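      The arithmetic of this example can be reproduced in a few lines of R. The following is a minimal sketch that starts from the summary quantities quoted in the text (n = 10, the two sample means, and the sums of squares and cross products) rather than from the raw rows of Table 2.1. The variable names are illustrative only, and the results agree with the printed values up to rounding.

```r
# Summary quantities quoted in the worked example
# (taken from the text, not recomputed from Table 2.1)
n        <- 10
xbar1    <- 2479.5     # sample mean of curb.weight
xbar2    <- 170.35     # sample mean of length
sum_x1x2 <- 4262679    # sum of x_i1 * x_i2
sum_x1sq <- 63844665   # sum of x_i1^2
sum_x2sq <- 290947.8   # sum of x_i2^2

# Shortcut forms of the sample covariance and sample variances
s12  <- (sum_x1x2 - n * xbar1 * xbar2) / (n - 1)
s1sq <- (sum_x1sq - n * xbar1^2) / (n - 1)
s2sq <- (sum_x2sq - n * xbar2^2) / (n - 1)

# Sample correlation coefficient: covariance over the product
# of the two sample standard deviations
r12 <- s12 / (sqrt(s1sq) * sqrt(s2sq))

print(round(c(s12 = s12, s1sq = s1sq, s2sq = s2sq, r12 = r12), 3))
```

      Because the inputs are themselves rounded, the last digit of a result can differ slightly from the value printed in the text.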

      Example 2.3 In R, the sample mean, variance, covariance, and correlation can be found using the functions mean(), var(), cov(), and cor(), respectively. For example, the following R code finds the sample mean and sample variance of curb.weight, and the sample covariance and correlation between curb.weight and length, in the auto.spec data set.

      mean(auto.spec.df$curb.weight)
      var(auto.spec.df$curb.weight)
      with(auto.spec.df, cov(curb.weight, length))
      with(auto.spec.df, cor(curb.weight, length))

      > mean(auto.spec.df$curb.weight)
      [1] 2555.566
      > var(auto.spec.df$curb.weight)
      [1] 271107.9
      > with(auto.spec.df, cov(curb.weight, length))
      [1] 5638.336
      > with(auto.spec.df, cor(curb.weight, length))
      [1] 0.8777285
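      To see how these functions relate to the formulas used in the hand calculation, the sketch below checks two identities numerically. It assumes R's built-in mtcars data set as a stand-in, because auto.spec.df is not bundled with R. The choice of the columns wt and disp is arbitrary.

```r
# Built-in mtcars data stands in for auto.spec.df (an assumption of this sketch)
x <- mtcars$wt     # weight, in 1000 lbs
y <- mtcars$disp   # displacement, in cubic inches
n <- length(x)

# Shortcut form of the sample variance versus var()
var_shortcut <- (sum(x^2) - n * mean(x)^2) / (n - 1)
var_builtin  <- var(x)

# Correlation from its definition (covariance over the product of
# the sample standard deviations) versus cor()
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

# Both comparisons hold up to floating-point error
print(all.equal(var_shortcut, var_builtin))
print(all.equal(r_manual, r_builtin))
```

      Both var() and cov() use the divisor n - 1, which is why they match the shortcut forms used in the worked example.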

      2.2.2 Sample Mean Vector and Sample Covariance Matrix

      A multivariate data set consists of n observations collected from n items or units, and each observation contains measurements on p variables, x1, x2, …, xp. The measurement vector for the ith observation is denoted by

\[
\mathbf{x}_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{pmatrix}.
\]

      The sample mean vector is the vector of sample means for the p variables, which is defined as

\[
\bar{\mathbf{x}} = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{pmatrix} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i,
\]

      where $\bar{x}_k$ is the sample mean of $x_k$, i.e., $\bar{x}_k = \frac{1}{n} \sum_{i=1}^{n} x_{ik}$, $k = 1, \ldots, p$.
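      The definition above can be checked numerically. The sketch below assumes three numeric columns of R's built-in mtcars data set as the p = 3 variables (an illustrative choice, not the book's auto.spec data) and compares colMeans() with a literal average of the n observation vectors.

```r
# Three columns of the built-in mtcars data serve as the p = 3 variables
# (an assumption of this sketch)
X <- as.matrix(mtcars[, c("wt", "disp", "hp")])
n <- nrow(X)

# Componentwise sample means
xbar_colmeans <- colMeans(X)

# Literal form of the definition: add the n observation vectors x_i
# and divide by n
obs_vectors <- lapply(seq_len(n), function(i) X[i, ])
xbar_sum    <- Reduce(`+`, obs_vectors) / n

print(xbar_colmeans)
print(all.equal(xbar_colmeans, xbar_sum))
```

      In practice colMeans() is the idiomatic choice. The explicit sum is shown only to mirror the formula.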

      The sample covariance matrix S is the matrix of sample variances and covariances of the p variables:

\[
\mathbf{S} = \begin{pmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{21} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots &        & \vdots \\
s_{p1} & s_{p2} & \cdots & s_{pp}
\end{pmatrix}
\]
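      In R, cov() applied to a numeric matrix or data frame returns this p × p matrix. The sketch below again assumes three columns of the built-in mtcars data set as illustrative variables. The centered cross-product used for the manual check is the standard matrix expression for S, which this section has not stated at this point, so treat it as an assumption of the example.

```r
# Three columns of the built-in mtcars data serve as the p = 3 variables
# (an assumption of this sketch)
X <- as.matrix(mtcars[, c("wt", "disp", "hp")])
n <- nrow(X)

# Sample covariance matrix from the built-in function
S <- cov(X)

# Manual check: subtract the column means, then form the
# cross-product of the centered data and divide by n - 1
Xc       <- sweep(X, 2, colMeans(X))
S_manual <- crossprod(Xc) / (n - 1)

print(S)
print(all.equal(S, S_manual, check.attributes = FALSE))

# The diagonal holds the sample variances; S is symmetric
print(all.equal(unname(diag(S)), unname(apply(X, 2, var))))
print(isSymmetric(S))
```

      The entry in row j and column k of S is the sample covariance of variables j and k, so S[1, 2] equals cov(X[, 1], X[, 2]).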

      The off-diagonal elements of S are the sample covariances of each pair of variables. For j ≠ k,