S‐wave velocity and the variable Y could represent P‐wave velocity. If a direct measurement of P‐wave velocity is available, we can compute the posterior probability distribution of S‐wave velocity conditioned on the P‐wave velocity measurement. The prior distribution is assumed to be unimodal with relatively large variance. By incorporating the information carried by the likelihood function, we reduce the uncertainty in the posterior distribution.
Figure 1.5 Bayes' theorem: the posterior probability is proportional to the product of the prior probability and the likelihood function.
We can also extend the definitions of mean and variance to multivariate random variables. For the joint distribution fX,Y(x, y) of X and Y, the mean μX,Y = [μX, μY]T is the vector of the means μX and μY of the random variables X and Y. In the multivariate case, however, the variances of the random variables do not fully describe the variability of the joint random variable. Indeed, the variability of the joint random variable also depends on how the two variables are related. We then define the covariance σX,Y of X and Y as:

$$\sigma_{X,Y} = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \left(x-\mu_X\right)\left(y-\mu_Y\right) f_{X,Y}(x,y)\, dx\, dy \quad (1.26)$$
The covariance is a measure of the linear dependence between two random variables. The covariance of a random variable with itself is equal to the variance of the variable. Therefore,
$$\boldsymbol{\Sigma}_{X,Y} = \begin{bmatrix} \sigma_X^2 & \sigma_{X,Y} \\ \sigma_{Y,X} & \sigma_Y^2 \end{bmatrix} \quad (1.27)$$
where the diagonal of the matrix includes the variances of the random variables, and the elements outside the diagonal represent the covariances. The covariance matrix is symmetric by definition, because σX,Y = σY,X based on the commutative property of the multiplication under the integral in Eq. (1.26). The covariance matrix of a multivariate probability distribution is always positive semi‐definite; and it is positive definite unless one variable is a linear transformation of another variable.
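These properties of the covariance matrix can be verified numerically. The sketch below, with purely illustrative values loosely inspired by velocity data, estimates the sample covariance matrix of two correlated variables with NumPy and checks its symmetry and positive semi‐definiteness:

```python
import numpy as np

# Illustrative example (values are hypothetical): two correlated variables,
# e.g. X could play the role of P-wave velocity and Y of S-wave velocity.
rng = np.random.default_rng(0)
x = rng.normal(3.0, 0.3, size=1000)              # samples of X
y = 0.6 * x + rng.normal(0.0, 0.1, size=1000)    # Y linearly related to X plus noise

cov = np.cov(x, y)  # 2x2 sample covariance matrix
# Diagonal entries: variances of X and Y.
# Off-diagonal entries: the covariance sigma_{X,Y} = sigma_{Y,X}.
assert np.isclose(cov[0, 1], cov[1, 0])          # symmetry
assert np.all(np.linalg.eigvalsh(cov) >= 0)      # positive semi-definite
```

Since Y here is a linear function of X plus independent noise, the matrix is strictly positive definite; dropping the noise term would make Y an exact linear transformation of X and drive the smallest eigenvalue to zero.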
We then introduce the linear correlation coefficient ρX,Y of two random variables X and Y, which is defined as the covariance normalized by the product of the standard deviations of the two random variables:
$$\rho_{X,Y} = \frac{\sigma_{X,Y}}{\sigma_X \sigma_Y} \quad (1.28)$$
The correlation coefficient is by definition bounded between −1 and 1 (i.e. −1 ≤ ρX,Y ≤ 1), dimensionless, and easy to interpret. Indeed, a correlation coefficient ρX,Y = 0 means that X and Y are linearly uncorrelated, whereas a correlation coefficient |ρX,Y| = 1 means that Y is a linear function of X. Figure 1.6 shows four examples of two random variables X and Y with different correlation coefficients. When the correlation coefficient is ρX,Y = 0.9, the samples of the two random variables form an elongated cloud of points aligned along a straight line, whereas, when the correlation coefficient is ρX,Y ≈ 0, the samples of the two random variables form a homogeneous cloud of points with no preferential alignment. A positive correlation coefficient means that if the random variable X increases, then the random variable Y increases as well, whereas a negative correlation coefficient means that if the random variable X increases, then the random variable Y decreases. For this reason, when the correlation coefficient is ρX,Y = −0.6, the cloud of samples of the two random variables approximately follows a straight line with negative slope.
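The behavior described above can be reproduced with a short simulation. In this hedged sketch (the construction Y = ρX + √(1 − ρ²)·noise is a standard device, not taken from the text), we generate samples with a target correlation of 0.9 and verify both the estimated coefficient and its definition as the covariance normalized by the standard deviations:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
rho = 0.9                                   # target correlation coefficient

x = rng.standard_normal(n)
# Y = rho*X + sqrt(1 - rho^2)*noise yields corr(X, Y) approximately rho.
y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

r = np.corrcoef(x, y)[0, 1]                 # sample correlation coefficient
# Check Eq. (1.28): correlation = covariance / (std_X * std_Y).
sx, sy = x.std(ddof=1), y.std(ddof=1)
assert np.isclose(r, np.cov(x, y)[0, 1] / (sx * sy))
```

Plotting y against x for this sample would reproduce the elongated cloud of points aligned along a straight line seen in Figure 1.6 for ρX,Y = 0.9.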
If two random variables are independent, i.e. fX,Y(x, y) = fX(x)fY(y), then X and Y are uncorrelated. However, the opposite is not necessarily true. Indeed, the correlation coefficient is a measure of linear correlation; therefore, if two random variables are uncorrelated, then there is no linear relation between the two properties, but it does not necessarily mean that the two variables are independent. For example, if Y = X2, and X is symmetrically distributed around 0 (taking both positive and negative values), then the correlation coefficient is close to 0, yet Y depends deterministically on X through the quadratic relation (Figure 1.6), and the two variables are not independent.
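This counterexample is easy to check numerically. The sketch below (an illustrative simulation, not from the text) draws X from a symmetric distribution, sets Y = X², and confirms that the sample correlation is close to zero even though Y is a deterministic function of X:

```python
import numpy as np

rng = np.random.default_rng(2)
# X symmetric around 0, so Cov(X, X^2) = E[X^3] = 0 in theory.
x = rng.standard_normal(100_000)
y = x**2                               # Y depends deterministically on X

r = np.corrcoef(x, y)[0, 1]
# Linearly uncorrelated (sample estimate near 0)...
assert abs(r) < 0.05
# ...but clearly not independent: knowing X determines Y exactly.
```

The small residual value of r is sampling noise; it shrinks toward 0 as the number of samples grows.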
Figure 1.6 Examples of different correlations of the joint distribution of two random variables X and Y. The correlation coefficient ρX,Y is 0.9 and −0.6 in the top plots and approximately 0 in the bottom plots.
1.4 Probability Distributions
Different probability mass and density functions can be used for discrete and continuous random variables, respectively. For parametric distributions, the function is completely defined by a limited number of parameters (e.g. mean and variance). In this section, we review the most common probability mass and density functions. Probability mass functions are commonly used in geoscience problems for discrete random variables such as facies or rock types, whereas PDFs are used for continuous properties such as porosity, fluid saturations, density, P‐wave and S‐wave velocity. Some applications in earth sciences include mixed discrete–continuous problems with both discrete and continuous random variables.
1.4.1 Bernoulli Distribution
The simplest probability distribution is the Bernoulli distribution, which is associated with a single experiment with only two possible outcomes. An example of this type of experiment is the toss of a coin. Let X be the random variable representing the experiment; then X takes only two values, 0 and 1, where X = 1 means that a favorable event is observed,