Industrial Data Analytics for Diagnosis and Prognosis. Yong Chen.

or in matrix notation,

$$
\mathbf{z} =
\begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_q \end{pmatrix}
=
\begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1p} \\
c_{21} & c_{22} & \cdots & c_{2p} \\
\vdots & \vdots &        & \vdots \\
c_{q1} & c_{q2} & \cdots & c_{qp}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix}
= \mathbf{C}\mathbf{x}.
$$

The sample mean vector and sample covariance matrix of

$$\mathbf{z}_i = \mathbf{C}\mathbf{x}_i, \quad i = 1, 2, \ldots, n$$

are given by

$$\bar{\mathbf{z}} = \mathbf{C}\bar{\mathbf{x}} \tag{2.9}$$

$$\mathbf{S}_z = \mathbf{C}\mathbf{S}\mathbf{C}^T \tag{2.10}$$

Obviously, (2.9) and (2.10) are generalizations of (2.7) and (2.8), respectively.
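The two identities can be checked numerically. The following is a minimal sketch on simulated data, not part of the original example: the sizes n, p, and q, the random seed, and the entries of C are all arbitrary assumptions chosen for illustration.

```r
# Numerical check of (2.9) and (2.10) on simulated data.
# All values below (seed, sizes, entries of C) are illustrative assumptions.
set.seed(1)
n <- 50; p <- 3; q <- 2

# Data matrix: row i holds the observation x_i (as a row vector).
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# A q x p coefficient matrix C.
C <- matrix(c(0.4,  0.6, 0.0,
              1.0, -1.0, 2.0), nrow = q, byrow = TRUE)

# Transformed observations: row i holds z_i = C x_i (as a row vector).
Z <- X %*% t(C)

x_bar <- colMeans(X)   # sample mean vector of x
S     <- cov(X)        # sample covariance matrix of x

# Left-hand sides computed directly from the transformed data.
z_bar_direct <- colMeans(Z)
S_z_direct   <- cov(Z)

# Right-hand sides of (2.9) and (2.10).
z_bar_formula <- as.vector(C %*% x_bar)
S_z_formula   <- C %*% S %*% t(C)

# Both comparisons should agree up to floating-point error.
mean_ok <- isTRUE(all.equal(z_bar_direct, z_bar_formula))
cov_ok  <- isTRUE(all.equal(S_z_direct, S_z_formula))
print(c(mean_ok = mean_ok, cov_ok = cov_ok))
```

Storing observations as rows of X follows the usual R data-frame layout, which is why the transformation appears as X %*% t(C) rather than C %*% X.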

Example 2.5 For the auto.spec data set, using the mean() function of R, the sample means of the variables city.mpg and highway.mpg are found to be 25.22 and 30.75, respectively. Suppose we are interested in the overall MPG of a car, denoted by z, defined as the following weighted average of x₁ = city.mpg and x₂ = highway.mpg:

$$z = 0.4x_1 + 0.6x_2 = \mathbf{c}^T \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},$$

where $\mathbf{c} = (0.4 \;\; 0.6)^T$. Then, by (2.7), the sample mean of the overall MPG in the data set is

$$\bar{z} = \mathbf{c}^T\bar{\mathbf{x}} = (0.4 \;\; 0.6) \begin{pmatrix} 25.22 \\ 30.75 \end{pmatrix} = 28.54.$$

To find the sample variance of z, first we obtain the sample covariance matrix for city.mpg and highway.mpg using the cov() function of R:

cov(auto.spec.df[, c("city.mpg", "highway.mpg")])   # sample covariance matrix
cor(auto.spec.df[, c("city.mpg", "highway.mpg")])   # sample correlation matrix

The function cor() calculates the sample correlation matrix. Based on the output of the above R code, we have

$$
\mathbf{S} = \begin{pmatrix} 42.8 & 43.76 \\ 43.76 & 47.42 \end{pmatrix}, \quad
\mathbf{R} = \begin{pmatrix} 1 & 0.971 \\ 0.971 & 1 \end{pmatrix}.
$$
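The two matrices are linked: each correlation is the corresponding covariance divided by the product of the two standard deviations. As a small sketch that does not need the auto.spec data, the rounded covariance matrix reported above can be typed in directly and converted with cov2cor() from base R's stats package. The object names S, R, and r12 are chosen here for illustration only.

```r
# Rounded sample covariance matrix as reported in the text.
vars <- c("city.mpg", "highway.mpg")
S <- matrix(c(42.80, 43.76,
              43.76, 47.42),
            nrow = 2, byrow = TRUE,
            dimnames = list(vars, vars))

# Convert the covariance matrix to a correlation matrix.
R <- cov2cor(S)
print(round(R, 3))

# The same off-diagonal entry computed by hand: s12 / sqrt(s11 * s22).
r12 <- S[1, 2] / sqrt(S[1, 1] * S[2, 2])
print(round(r12, 3))
```

Because S is entered with rounded values, the result may differ from cor() applied to the raw data in later decimal places.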

By (2.8), the sample variance of z is

$$
s_z^2 = \mathbf{c}^T\mathbf{S}\mathbf{c}
= (0.4 \;\; 0.6) \begin{pmatrix} 42.8 & 43.76 \\ 43.76 & 47.42 \end{pmatrix} (0.4 \;\; 0.6)^T
= 44.9.
$$
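The arithmetic in this example can be reproduced in a few lines of R. This is a sketch that uses the rounded summary statistics quoted in the text rather than the raw data. The names c_vec, x_bar, z_bar, and s2_z are illustrative choices, not from the original.

```r
# Weights and rounded summary statistics taken from the example.
c_vec <- c(0.4, 0.6)                       # c = (0.4, 0.6)^T
x_bar <- c(25.22, 30.75)                   # sample means of city.mpg, highway.mpg
S <- matrix(c(42.80, 43.76,
              43.76, 47.42), nrow = 2, byrow = TRUE)

# Sample mean of z = c^T x, by (2.7).
z_bar <- sum(c_vec * x_bar)
print(round(z_bar, 2))                     # about 28.54

# Sample variance of z, by (2.8): c^T S c.
s2_z <- as.numeric(t(c_vec) %*% S %*% c_vec)
print(round(s2_z, 1))                      # about 44.9

# Cross-check with the expanded two-variable form:
# c1^2 * s11 + c2^2 * s22 + 2 * c1 * c2 * s12.
s2_z_expanded <- c_vec[1]^2 * S[1, 1] + c_vec[2]^2 * S[2, 2] +
  2 * c_vec[1] * c_vec[2] * S[1, 2]
```

The expanded form shows why the variance of the weighted average is close to the individual variances here: the two variables are highly correlated, so the covariance term contributes almost as much as the two variance terms combined.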

Bibliographic Notes

Data visualization methods are discussed in books in the data mining area, for example, Shmueli et al. [2017] and Williams [2011]. In this chapter, we mostly use the graphics functions from base R. A popular dedicated graphics package in R is the ggplot2 package by Wickham [2016]. The ggplot2 package provides more flexible and powerful graphics capabilities and can create presentation-quality visualizations. However, it comes with a significant learning curve, as users must become familiar with the special technical language used in ggplot2. For those who use data visualizations on a regular basis, it is worth the time and effort to learn ggplot2.

Sample statistics such as sample mean vector and sample covariance matrix for multivariate observations are discussed in detail in many multivariate statistics books, for example, Johnson et al. [2002] and Rencher [2003].

Exercises

1 Consider the data in the following table with two numerical variables x1 and x2 and two categorical variables x3 and x4.

x₁	x₂	x₃	x₄
9	1	Yes	On
5	3	No	Off
1	2	Yes	Off
3	4	Yes	On
6	−1	No	On
3	3	Yes	On

(a) Manually sketch the scatter plot for x1 and x2.
(b) Manually sketch the mosaic plot for x3 and x4.

2 Consider the data set in Exercise 1. Manually calculate the sample mean vector, the sample covariance matrix, and the sample correlation matrix of $\mathbf{x} = (x_1 \;\; x_2)^T$.

3 Consider the data in the following table with two numerical variables x1 and x2 and two categorical variables x3 and x4.

x₁	x₂	x₃	x₄
1	0	Yes	Working
4	6	No	Fail
2	2	Yes	Fail
0	3	No	Fail
3	4	No	Working
5	7	Yes	Working
