Industrial Data Analytics for Diagnosis and Prognosis. Yong Chen.

The covariance matrix of Z = CX is

$\boldsymbol{\Sigma}_{\mathbf{Z}} = \operatorname{cov}(\mathbf{Z}) = \operatorname{cov}(\mathbf{CX}) = \mathbf{C} \boldsymbol{\Sigma}_{\mathbf{X}} \mathbf{C}^T.$ (3.2)

The similarity between (3.2) and (2.10) is clear. When C is a row vector c^T = (c₁, c₂, …, cₚ), CX = c^T X = c₁X₁ + … + cₚXₚ and

$E(\mathbf{c}^T \mathbf{X}) = \mathbf{c}^T \boldsymbol{\mu}$ (3.3)

$\operatorname{var}(\mathbf{c}^T \mathbf{X}) = \mathbf{c}^T \boldsymbol{\Sigma} \mathbf{c}$ (3.4)

where μ and Σ are the mean vector and covariance matrix of X.
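As an illustrative check of (3.2) to (3.4), the following minimal Python sketch computes Cμ and CΣC^T for a three-dimensional X. The numerical values of `mu`, `Sigma`, and `C`, and the helper names `transpose` and `matmul`, are assumptions chosen for this example and do not come from the text.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Assumed (hypothetical) mean vector and covariance matrix of X, p = 3.
mu = [[1.0], [2.0], [3.0]]
Sigma = [[4.0, 1.0, 0.5],
         [1.0, 9.0, 2.0],
         [0.5, 2.0, 1.0]]

# Z = CX with Z1 = X1 + X2 and Z2 = X2 - X3.
C = [[1.0, 1.0, 0.0],
     [0.0, 1.0, -1.0]]

mean_Z = matmul(C, mu)                                   # E(Z) = C mu
cov_Z = matmul(matmul(C, Sigma), transpose(C))           # equation (3.2)

# Row-vector case: c^T is taken to be the first row of C.
c = [[1.0], [1.0], [0.0]]
mean_cX = matmul(transpose(c), mu)[0][0]                 # equation (3.3)
var_cX = matmul(matmul(transpose(c), Sigma), c)[0][0]    # equation (3.4)

print(mean_Z)
print(cov_Z)
print(mean_cX, var_cX)
```

The (1, 1) entry of `cov_Z` equals `var_cX` because the first row of C is c^T. It also agrees with the direct expansion var(X₁ + X₂) = σ₁₁ + σ₂₂ + 2σ₁₂ = 4 + 9 + 2 = 15.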

Let X₁ and X₂ denote two subvectors of X, i.e., $\mathbf{X} = \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix}$. The mean vector and the covariance matrix of X can be partitioned as

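$\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}$ (3.5)

$\boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}$ (3.6)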

where Σ₁₁ = cov(X₁) and Σ₂₂ = cov(X₂). The matrix Σ₁₂ contains the covariance between each component of X₁ and each component of X₂. By the symmetry of Σ, we have Σ₂₁ = Σ₁₂^T.
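The partition can be illustrated with plain list slicing. The sketch below assumes X₁ holds the first q components of X and X₂ the rest; the function name `partition`, the split point `q`, and the numerical values are hypothetical choices for this example.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def partition(mu, Sigma, q):
    """Split mu and Sigma after the first q components of X."""
    mu1, mu2 = mu[:q], mu[q:]
    S11 = [row[:q] for row in Sigma[:q]]   # cov(X1)
    S12 = [row[q:] for row in Sigma[:q]]   # covariances between X1 and X2
    S21 = [row[:q] for row in Sigma[q:]]   # covariances between X2 and X1
    S22 = [row[q:] for row in Sigma[q:]]   # cov(X2)
    return (mu1, mu2), (S11, S12, S21, S22)


# Assumed (hypothetical) mean vector and covariance matrix, p = 3.
mu = [1.0, 2.0, 3.0]
Sigma = [[4.0, 1.0, 0.5],
         [1.0, 9.0, 2.0],
         [0.5, 2.0, 1.0]]

(mu1, mu2), (S11, S12, S21, S22) = partition(mu, Sigma, q=1)

print(S12)
print(S21)
print(S21 == transpose(S12))   # symmetry of Sigma implies S21 = S12^T
```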

3.2 Density Function and Properties of Multivariate Normal Distribution

The normal distribution is the most commonly used distribution for continuous random variables. Many statistical models and inference methods are based on the univariate or multivariate normal distribution. One advantage of the normal distribution is its mathematical tractability. More importantly, the normal distribution is a good approximation to the “true” population distribution for many sample statistics and real-world data because of the central limit theorem, which states that the sum of a large number of independent observations drawn from the same population with finite mean and variance is approximately normally distributed.

Recall that a univariate random variable X with mean μ and variance σ² is normally distributed, denoted by X ∼ N(μ, σ²), if it has the probability density function

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty.$ (3.7)

The multivariate normal distribution is an extension of the univariate normal distribution. If a p-dimensional random vector X follows a multivariate normal distribution with mean vector μ and covariance matrix Σ, the probability density function of X has the form

$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2} |\boldsymbol{\Sigma}|^{1/2}}\, e^{-(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})/2}.$ (3.8)

We denote the p-dimensional normal distribution by Nₚ(μ, Σ).
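As a rough illustration, the sketch below evaluates the density of a p-dimensional normal distribution at a point, with the quadratic form halved in the exponent so that the case p = 1 recovers (3.7). Computing the determinant and the quadratic form through a Cholesky factorization is a choice made for this example, not a method stated in the text, and the function names `cholesky`, `forward_solve`, and `mvn_pdf` are hypothetical.

```python
import math


def cholesky(S):
    """Return lower-triangular L with L L^T = S for a positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, row in enumerate(L):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


def mvn_pdf(x, mu, Sigma):
    """Density of the p-dimensional normal distribution at the point x."""
    p = len(mu)
    L = cholesky(Sigma)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = forward_solve(L, diff)
    quad = sum(v * v for v in y)              # (x - mu)^T Sigma^{-1} (x - mu)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(p))   # log |Sigma|
    return math.exp(-0.5 * (p * math.log(2.0 * math.pi) + log_det + quad))


# Bivariate example with assumed (hypothetical) parameter values.
print(mvn_pdf([0.0, 0.0], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]]))

# With p = 1 the function agrees with the univariate density (3.7).
print(mvn_pdf([1.5], [1.0], [[4.0]]))
```

Working on the log scale avoids underflow in the normalizing constant when p is large; for small p the formula can be evaluated directly.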

From (3.8), the density of a p-dimensional normal distribution depends on x only through the term (x − μ)^T Σ⁻¹ (x − μ), which is the square of the distance from x to μ standardized by the covariance matrix. It follows that the set of x values yielding a constant height for the density forms an ellipsoid. The set of points with the same density height is called a contour. The constant probability density contour of a p-dimensional normal distribution is

$\{\mathbf{x} \mid (\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) = c^2\},$

which forms the surface of an ellipsoid centered at μ with standardized distance between x and μ equal to c. A contour with a larger distance c has a lower density height. It can be shown that the axes of the constant-density ellipsoids of the p-dimensional normal distribution lie in the directions of the eigenvectors of Σ, with lengths proportional to the square roots of the corresponding eigenvalues of Σ.
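For p = 2 the eigenvalues and eigenvectors of a symmetric Σ have a closed form, so the axes of a constant-density ellipse can be computed directly. The sketch below illustrates this special case only. The function names `ellipse_axes` and `quad_form`, the covariance values, and the choice c = 2 are assumptions made for the example.

```python
import math


def ellipse_axes(Sigma, c):
    """For a 2 x 2 positive definite Sigma, return a list of
    (eigenvalue, unit direction, half-length) for the two axes of the
    contour (x - mu)^T Sigma^{-1} (x - mu) = c^2."""
    a, b, d = Sigma[0][0], Sigma[0][1], Sigma[1][1]
    half_trace = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    eigenvalues = [half_trace + radius, half_trace - radius]
    if b != 0.0:
        # (b, lam - a) solves (Sigma - lam I) v = 0 when b is nonzero.
        vectors = [(b, lam - a) for lam in eigenvalues]
    elif a >= d:
        vectors = [(1.0, 0.0), (0.0, 1.0)]
    else:
        vectors = [(0.0, 1.0), (1.0, 0.0)]
    axes = []
    for lam, (v1, v2) in zip(eigenvalues, vectors):
        norm = math.hypot(v1, v2)
        # Half-length is c times the square root of the eigenvalue.
        axes.append((lam, (v1 / norm, v2 / norm), c * math.sqrt(lam)))
    return axes


def quad_form(x, mu, Sigma):
    """Evaluate (x - mu)^T Sigma^{-1} (x - mu) for a 2 x 2 Sigma."""
    a, b, d = Sigma[0][0], Sigma[0][1], Sigma[1][1]
    det = a * d - b * b
    u1, u2 = x[0] - mu[0], x[1] - mu[1]
    return (d * u1 * u1 - 2.0 * b * u1 * u2 + a * u2 * u2) / det


# Assumed (hypothetical) parameters.
mu = [0.0, 0.0]
Sigma = [[1.0, 0.5], [0.5, 1.0]]
c = 2.0

for lam, direction, half_length in ellipse_axes(Sigma, c):
    endpoint = [m + half_length * e for m, e in zip(mu, direction)]
    # Each axis endpoint should lie on the contour, so the last value is c^2.
    print(lam, direction, half_length, quad_form(endpoint, mu, Sigma))
```

Each axis endpoint μ + c√λ e satisfies the contour equation because e is an eigenvector of Σ⁻¹ with eigenvalue 1/λ, so the quadratic form equals c²λ/λ = c².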

Example 3.1: Consider a bivariate (p = 2) normally distributed random vector X = (X₁ X₂)^T. Suppose the mean vector is μ = (0 0)^T and the covariance matrix is

$\boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$

So the variance of both variables is equal to one and the covariance matrix coincides with the correlation matrix. The inverse of the covariance matrix
