φ(xₙ), where λᵢ ≥ 0 and ∑λᵢ = 1. Thus, if λᵢ = p(xᵢ), we satisfy the relations for line interpolation as well as for discrete probability distributions, so we can rewrite in terms of the Expectation definition:
φ(E(X)) ≤ E(φ(X)).
Since φ(x) = −log(x) is a convex function:
−log(E(X)) ≤ E(−log(X)), that is, E(log(X)) ≤ log(E(X)).
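As a quick numerical illustration of this inequality (a minimal sketch in Python with a made-up distribution, not an example from the text), one can compare φ(E(X)) and E(φ(X)) for φ(x) = −log(x):

```python
import math

# Hypothetical discrete distribution: values x_i with probabilities p(x_i)
xs = [0.5, 1.0, 2.0, 4.0]
ps = [0.1, 0.4, 0.3, 0.2]          # lambda_i = p(x_i), summing to 1

phi = lambda x: -math.log(x)        # convex function phi(x) = -log(x)

E_X   = sum(p * x for p, x in zip(ps, xs))         # E(X)
E_phi = sum(p * phi(x) for p, x in zip(ps, xs))    # E(phi(X))

# Jensen's inequality for convex phi: phi(E(X)) <= E(phi(X))
print(f"phi(E(X)) = {phi(E_X):.4f} <= E(phi(X)) = {E_phi:.4f}")
assert phi(E_X) <= E_phi
```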
Variance:
Var(X) = E[(X − E(X))²] = E(X²) − [E(X)]².
Chebyshev's Inequality:
For k > 0, P(|X − E(X)| > k) ≤ Var(X)/k².
Proof:
Var(X) = E[(X − E(X))²] ≥ E[(X − E(X))²·1{|X − E(X)| > k}] ≥ k²·P(|X − E(X)| > k),
where 1{·} is the indicator function; dividing through by k² gives the result.
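To see the bound in action, here is a small Monte Carlo sketch (the exponential test distribution and the values of k are arbitrary choices, not from the text) that estimates both sides of Chebyshev's inequality:

```python
import random

random.seed(0)

# Sample a hypothetical distribution (exponential with mean 1, variance 1)
N = 100_000
samples = [random.expovariate(1.0) for _ in range(N)]

mean = sum(samples) / N
var = sum((x - mean) ** 2 for x in samples) / N   # Var(X) = E[(X - E(X))^2]

for k in (1.0, 2.0, 3.0):
    tail = sum(1 for x in samples if abs(x - mean) > k) / N  # P(|X-E(X)| > k)
    bound = var / k ** 2                                     # Var(X)/k^2
    print(f"k={k}: P(|X-E(X)|>k) ~ {tail:.4f} <= Var(X)/k^2 = {bound:.4f}")
```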
2.5 Statistics, Conditional Probability, and Bayes' Rule
So far we have counts and probabilities, but what of the probability of X when you know Y has occurred (where X is dependent on Y)? How do we account for a greater state of knowledge? It turns out the answer was not put on a formal mathematical footing until halfway through the twentieth century, with the Cox derivation [101].
2.5.1 The Calculus of Conditional Probabilities: The Cox Derivation
The rules of probability, including those describing conditional probabilities, can be obtained using an elegant derivation by Cox [101]. The Cox derivation uses the rules of logic (Boolean algebra) and two simple assumptions. The assumptions are stated in terms of “b|a,” where b|a ≡ the “likelihood” of proposition b when proposition a is known to be true. (The interpretation of “likelihood” as “probability” will fall out of the derivation.) The first assumption is that the likelihood c‐and‐b|a is determined by a function of the likelihoods b|a and c|b‐and‐a:
(Assumption 1) c‐and‐b|a = F(c|b‐and‐a, b|a),
for some function F. Consistency with the Boolean algebra then restricts F such that (Assumption 1) reduces to:
C·f(c‐and‐b|a) = f(c|b‐and‐a)·f(b|a),
where f is a function of one variable and C is a constant. For the trivial choice of function and constant (f the identity and C = 1) there is:
p(c,b|a) = p(c|b,a)·p(b|a),
which is the conventional rule for conditional probabilities (and c‐and‐b|a is rewritten as p(c,b|a), etc.).
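To make the product rule concrete, the following sketch checks p(c,b|a) = p(c|b,a)·p(b|a) on a small made-up joint distribution over three binary propositions (all numbers are hypothetical):

```python
from itertools import product

# Hypothetical joint probabilities p(a, b, c) over three binary propositions
probs = [0.10, 0.05, 0.15, 0.10, 0.20, 0.05, 0.25, 0.10]   # sums to 1
joint = dict(zip(product([0, 1], repeat=3), probs))

def p(**fixed):
    """Marginal probability that the named variables take the given values."""
    names = ("a", "b", "c")
    return sum(pr for key, pr in joint.items()
               if all(key[names.index(n)] == v for n, v in fixed.items()))

# Verify p(c,b|a) = p(c|b,a) * p(b|a) at a = b = c = 1 (true)
p_cb_given_a = p(a=1, b=1, c=1) / p(a=1)
p_c_given_ba = p(a=1, b=1, c=1) / p(a=1, b=1)
p_b_given_a  = p(a=1, b=1) / p(a=1)
assert abs(p_cb_given_a - p_c_given_ba * p_b_given_a) < 1e-12
print(f"p(c,b|a) = {p_cb_given_a:.4f} = p(c|b,a)*p(b|a)")
```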
The second assumption relates the likelihoods of propositions b and ~b when the proposition a is known to be true:
(Assumption 2) ~b|a = S(b|a),
for some function S. Consistency with the Boolean algebra of propositions then forces two relations on S:
S(S(x)) = x and x·S(S(y)/x) = y·S(S(x)/y),
which together can be solved to give:
S(x) = (1 − x^m)^(1/m),
where m is an arbitrary constant. For m = 1 we obtain the relation p(b|a) + p(~b|a) = 1, the ordinary rule for probabilities. In general, the conventions for Assumption 1 can be matched to those for Assumption 2, such that the likelihood relations reduce to the conventional relations on probabilities.
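A quick way to gain confidence in this solution is a numerical spot check; the sketch below (with arbitrary choices of m and of the test points, kept inside the domain where all intermediate arguments stay in [0, 1]) verifies that S(x) = (1 − x^m)^(1/m) satisfies both relations:

```python
# Spot-check S(x) = (1 - x**m)**(1/m) against the two relations on S
def make_S(m):
    return lambda x: (1.0 - x ** m) ** (1.0 / m)

# Points chosen so that S(y)/x and S(x)/y remain in [0, 1]
points = [(0.8, 0.9), (0.7, 0.95), (0.9, 0.85)]

for m in (1.0, 2.0, 3.5):               # arbitrary values of the constant m
    S = make_S(m)
    for x, y in points:
        assert abs(S(S(x)) - x) < 1e-9                 # S(S(x)) = x
        assert abs(x * S(S(y) / x) - y * S(S(x) / y)) < 1e-9
print("S(x) = (1 - x^m)^(1/m) satisfies both relations at the test points")
```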
Note: conditional probability relationships can be grouped:
p(a|b)·p(b) = p(a,b) = p(b|a)·p(a),
to obtain the classic Bayes' theorem: p(a|b) = p(b|a)·p(a)/p(b).
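As a worked numeric example of this grouping (a sketch with made-up numbers), the posterior p(a|b) computed via Bayes' theorem matches the value read directly off the joint distribution:

```python
# Hypothetical joint distribution over two binary propositions a and b
joint = {(1, 1): 0.12, (1, 0): 0.28, (0, 1): 0.18, (0, 0): 0.42}

p_a = joint[(1, 1)] + joint[(1, 0)]        # p(a)   = 0.40
p_b = joint[(1, 1)] + joint[(0, 1)]        # p(b)   = 0.30
p_b_given_a = joint[(1, 1)] / p_a          # p(b|a) = 0.30
p_a_given_b_direct = joint[(1, 1)] / p_b   # p(a|b) directly from the joint

# Bayes' theorem: p(a|b) = p(b|a) * p(a) / p(b)
p_a_given_b_bayes = p_b_given_a * p_a / p_b

assert abs(p_a_given_b_direct - p_a_given_b_bayes) < 1e-12
print(f"p(a|b) direct = {p_a_given_b_direct:.4f}, via Bayes = {p_a_given_b_bayes:.4f}")
```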
2.5.2 Bayes' Rule
Bayes' rule is derived from the defining property of conditional probability: