φ(xₙ), where λᵢ ≥ 0 and ∑λᵢ = 1. Thus, if λᵢ = p(xᵢ), we satisfy the relations for line interpolation as well as for discrete probability distributions, so we can rewrite in terms of the Expectation definition:
φ(E(X)) ≤ E(φ(X)).
Since φ(x) = −log(x) is a convex function:
−log(E(X)) ≤ E(−log(X)), that is, E(log(X)) ≤ log(E(X)).
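As a quick numerical illustration of this inequality (a minimal sketch in Python with a made-up distribution, not an example from the text), one can compare φ(E(X)) and E(φ(X)) for φ(x) = −log(x):

```python
import math

# Hypothetical discrete distribution: values x_i with probabilities p(x_i)
xs = [0.5, 1.0, 2.0, 4.0]
ps = [0.1, 0.4, 0.3, 0.2]          # lambda_i = p(x_i), summing to 1

phi = lambda x: -math.log(x)        # convex function phi(x) = -log(x)

E_X   = sum(p * x for p, x in zip(ps, xs))         # E(X)
E_phi = sum(p * phi(x) for p, x in zip(ps, xs))    # E(phi(X))

# Jensen's inequality for convex phi: phi(E(X)) <= E(phi(X))
print(f"phi(E(X)) = {phi(E_X):.4f} <= E(phi(X)) = {E_phi:.4f}")
assert phi(E_X) <= E_phi
```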
Variance:
Var(X) = E[(X − E(X))²] = E(X²) − [E(X)]².
Chebyshev's Inequality:
For k > 0, P(|X − E(X)| > k) ≤ Var(X)/k².
Proof:
Var(X) = E[(X − E(X))²] ≥ E[(X − E(X))²·1{|X − E(X)| > k}] ≥ k²·P(|X − E(X)| > k),
where 1{·} is the indicator function; dividing through by k² gives the result.
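To see the bound in action, here is a small Monte Carlo sketch (the exponential test distribution and the values of k are arbitrary choices, not from the text) that estimates both sides of Chebyshev's inequality:

```python
import random

random.seed(0)

# Sample a hypothetical distribution (exponential with mean 1, variance 1)
N = 100_000
samples = [random.expovariate(1.0) for _ in range(N)]

mean = sum(samples) / N
var = sum((x - mean) ** 2 for x in samples) / N   # Var(X) = E[(X - E(X))^2]

for k in (1.0, 2.0, 3.0):
    tail = sum(1 for x in samples if abs(x - mean) > k) / N  # P(|X-E(X)| > k)
    bound = var / k ** 2                                     # Var(X)/k^2
    print(f"k={k}: P(|X-E(X)|>k) ~ {tail:.4f} <= Var(X)/k^2 = {bound:.4f}")
```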
2.5 Statistics, Conditional Probability, and Bayes' Rule
So far we have counts and probabilities, but what of the probability of X when you know Y has occurred (where X is dependent on Y)? How do we account for a greater state of knowledge? It turns out the answer was not put on a formal mathematical footing until halfway through the twentieth century, with the Cox derivation [101].
2.5.1 The Calculus of Conditional Probabilities: The Cox Derivation
The rules of probability, including those describing conditional probabilities, can be obtained using an elegant derivation by Cox [101]. The Cox derivation uses the rules of logic (Boolean algebra) and two simple assumptions. The assumptions are stated in terms of “b|a,” where b|a ≡ the “likelihood” of proposition b when proposition a is known to be true. (The interpretation of “likelihood” as “probability” will fall out of the derivation.) The first assumption is that the likelihood c‐and‐b|a is determined by a function of the likelihoods b|a and c|b‐and‐a:
(Assumption 1) c‐and‐b|a = F(c|b‐and‐a, b|a),
for some function F. Consistency with the Boolean algebra then restricts F such that (Assumption 1) reduces to:
C·f(c‐and‐b|a) = f(c|b‐and‐a)·f(b|a),
where f is a function of one variable and C is a constant. For the trivial choice of function and constant (f the identity and C = 1) there is:
p(c,b|a) = p(c|b,a)·p(b|a),
which is the conventional rule for conditional probabilities (and c‐and‐b|a is rewritten as p(c,b|a), etc.).
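To make the product rule concrete, the following sketch checks p(c,b|a) = p(c|b,a)·p(b|a) on a small made-up joint distribution over three binary propositions (all numbers are hypothetical):

```python
from itertools import product

# Hypothetical joint probabilities p(a, b, c) over three binary propositions
probs = [0.10, 0.05, 0.15, 0.10, 0.20, 0.05, 0.25, 0.10]   # sums to 1
joint = dict(zip(product([0, 1], repeat=3), probs))

def p(**fixed):
    """Marginal probability that the named variables take the given values."""
    names = ("a", "b", "c")
    return sum(pr for key, pr in joint.items()
               if all(key[names.index(n)] == v for n, v in fixed.items()))

# Verify p(c,b|a) = p(c|b,a) * p(b|a) at a = b = c = 1 (true)
p_cb_given_a = p(a=1, b=1, c=1) / p(a=1)
p_c_given_ba = p(a=1, b=1, c=1) / p(a=1, b=1)
p_b_given_a  = p(a=1, b=1) / p(a=1)
assert abs(p_cb_given_a - p_c_given_ba * p_b_given_a) < 1e-12
print(f"p(c,b|a) = {p_cb_given_a:.4f} = p(c|b,a)*p(b|a)")
```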
The second assumption relates the likelihoods of propositions b and ~b when the proposition a is known to be true:
(Assumption 2) ~b|a = S(b|a),
for some function S. Consistency with the Boolean algebra of propositions then forces two relations on S:
S(S(x)) = x and x·S(S(y)/x) = y·S(S(x)/y),
which together can be solved to give:
S(x) = (1 − x^m)^(1/m),
where m is an arbitrary constant. For m = 1 we obtain the relation p(b|a) + p(~b|a) = 1, the ordinary rule for probabilities. In general, the conventions for Assumption 1 can be matched to those for Assumption 2, such that the likelihood relations reduce to the conventional relations on probabilities.
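A quick way to gain confidence in this solution is a numerical spot check; the sketch below (with arbitrary choices of m and of the test points, kept inside the domain where all intermediate arguments stay in [0, 1]) verifies that S(x) = (1 − x^m)^(1/m) satisfies both relations:

```python
# Spot-check S(x) = (1 - x**m)**(1/m) against the two relations on S
def make_S(m):
    return lambda x: (1.0 - x ** m) ** (1.0 / m)

# Points chosen so that S(y)/x and S(x)/y remain in [0, 1]
points = [(0.8, 0.9), (0.7, 0.95), (0.9, 0.85)]

for m in (1.0, 2.0, 3.5):               # arbitrary values of the constant m
    S = make_S(m)
    for x, y in points:
        assert abs(S(S(x)) - x) < 1e-9                 # S(S(x)) = x
        assert abs(x * S(S(y) / x) - y * S(S(x) / y)) < 1e-9
print("S(x) = (1 - x^m)^(1/m) satisfies both relations at the test points")
```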
Note: conditional probability relationships can be grouped:
p(a|b)·p(b) = p(a,b) = p(b|a)·p(a),
to obtain the classic Bayes' theorem: p(a|b) = p(b|a)·p(a)/p(b).
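As a worked numeric example of this grouping (a sketch with made-up numbers), the posterior p(a|b) computed via Bayes' theorem matches the value read directly off the joint distribution:

```python
# Hypothetical joint distribution over two binary propositions a and b
joint = {(1, 1): 0.12, (1, 0): 0.28, (0, 1): 0.18, (0, 0): 0.42}

p_a = joint[(1, 1)] + joint[(1, 0)]        # p(a)   = 0.40
p_b = joint[(1, 1)] + joint[(0, 1)]        # p(b)   = 0.30
p_b_given_a = joint[(1, 1)] / p_a          # p(b|a) = 0.30
p_a_given_b_direct = joint[(1, 1)] / p_b   # p(a|b) directly from the joint

# Bayes' theorem: p(a|b) = p(b|a) * p(a) / p(b)
p_a_given_b_bayes = p_b_given_a * p_a / p_b

assert abs(p_a_given_b_direct - p_a_given_b_bayes) < 1e-12
print(f"p(a|b) direct = {p_a_given_b_direct:.4f}, via Bayes = {p_a_given_b_bayes:.4f}")
```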
2.5.2 Bayes' Rule
Bayes' rule is derived from the defining property of conditional probability: