$\ldots, x_N)$. We specify our model for the data with a likelihood function $\pi(\mathbf{X} \mid \boldsymbol{\theta})$ and use a prior distribution with density function $\pi(\boldsymbol{\theta})$ to characterize our belief about the value of the $P$-dimensional parameter vector $\boldsymbol{\theta}$ a priori. The target of Bayesian inference is the posterior distribution of $\boldsymbol{\theta}$ conditioned on $\mathbf{X}$:
$$
\pi(\boldsymbol{\theta} \mid \mathbf{X}) \;=\; \frac{\pi(\mathbf{X} \mid \boldsymbol{\theta})\,\pi(\boldsymbol{\theta})}{\int \pi(\mathbf{X} \mid \boldsymbol{\theta})\,\pi(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta}}\,. \tag{1}
$$
The denominator's multidimensional integral quickly becomes impractical as $P$ grows large, so we choose to use the Metropolis–Hastings (M–H) algorithm to generate a Markov chain with stationary distribution $\pi(\boldsymbol{\theta} \mid \mathbf{X})$ [19, 20]. We begin at an arbitrary position $\boldsymbol{\theta}^{(0)}$ and, for each iteration $s = 1, \ldots, S$, randomly generate the proposal state $\boldsymbol{\theta}^{*}$ from the transition distribution with density $q(\boldsymbol{\theta}^{*} \mid \boldsymbol{\theta}^{(s-1)})$. We then accept proposal state $\boldsymbol{\theta}^{*}$ with probability
$$
a \;=\; \min\!\left(1,\; \frac{\pi(\boldsymbol{\theta}^{*} \mid \mathbf{X})\, q(\boldsymbol{\theta}^{(s-1)} \mid \boldsymbol{\theta}^{*})}{\pi(\boldsymbol{\theta}^{(s-1)} \mid \mathbf{X})\, q(\boldsymbol{\theta}^{*} \mid \boldsymbol{\theta}^{(s-1)})}\right) \;=\; \min\!\left(1,\; \frac{\pi(\mathbf{X} \mid \boldsymbol{\theta}^{*})\,\pi(\boldsymbol{\theta}^{*})\, q(\boldsymbol{\theta}^{(s-1)} \mid \boldsymbol{\theta}^{*})}{\pi(\mathbf{X} \mid \boldsymbol{\theta}^{(s-1)})\,\pi(\boldsymbol{\theta}^{(s-1)})\, q(\boldsymbol{\theta}^{*} \mid \boldsymbol{\theta}^{(s-1)})}\right). \tag{2}
$$
The ratio on the right no longer depends on the denominator in Equation (1), but one must still compute the likelihood $\pi(\mathbf{X} \mid \boldsymbol{\theta}^{*})$ and its $N$ constituent terms $\pi(x_n \mid \boldsymbol{\theta}^{*})$.
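The accept–reject step above can be sketched as a minimal random-walk Metropolis–Hastings sampler in Python. The one-dimensional Gaussian model, standard normal prior, step size, and simulated data below are illustrative assumptions, not from the original text; with a symmetric proposal, the $q$-ratio in Equation (2) cancels, leaving only likelihood and prior evaluations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N = 1000 observations from a Gaussian with unknown mean
# (an illustrative model choice, not the paper's application).
x = rng.normal(loc=2.0, scale=1.0, size=1000)

def log_likelihood(theta):
    # O(N): one term per observation, the bottleneck discussed in the text.
    return -0.5 * np.sum((x - theta) ** 2)

def log_prior(theta):
    # Standard normal prior (illustrative assumption).
    return -0.5 * theta ** 2

def metropolis_hastings(n_iter=5000, step=0.1):
    theta = 0.0  # arbitrary starting position theta^(0)
    samples = []
    for _ in range(n_iter):
        proposal = theta + step * rng.normal()  # symmetric random-walk proposal
        # For symmetric q, the q-ratio in Equation (2) cancels; work in
        # log space to avoid underflow of the likelihood product.
        log_a = (log_likelihood(proposal) + log_prior(proposal)
                 - log_likelihood(theta) - log_prior(theta))
        if np.log(rng.uniform()) < log_a:
            theta = proposal  # accept
        samples.append(theta)  # on rejection, the chain repeats its state
    return np.array(samples)

samples = metropolis_hastings()
print(samples[2500:].mean())  # should sit near the true mean 2.0
```

Note that each of the 5000 iterations pays the full $\mathcal{O}(N)$ likelihood cost twice (once per state), which is why likelihood evaluation dominates the runtime of such samplers.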
It is for this reason that likelihood evaluations are often the computational bottleneck for Bayesian inference. In the best case, these evaluations are $\mathcal{O}(N)$, but there are many situations in which they scale $\mathcal{O}(N^2)$ [21, 22] or worse. Indeed, when $P$ is large, it is often advantageous to use more advanced MCMC algorithms that use the gradient of the log-posterior to generate better proposals. In this situation, the log-likelihood gradient may also become a computational bottleneck [21].
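To make the gradient bottleneck concrete, here is a hedged sketch of a Langevin-style (MALA) proposal for the same kind of toy Gaussian model; the model, step size $\varepsilon$, and data are illustrative assumptions, not the paper's method. The gradient is a sum over all $N$ observations, so each proposal already costs $\mathcal{O}(N)$, and a full sampler would still apply the acceptance step of Equation (2) with the now-asymmetric proposal density $q$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: Gaussian model and sizes are illustrative assumptions.
x = rng.normal(loc=2.0, scale=1.0, size=1000)

def grad_log_posterior(theta):
    # The log-likelihood gradient is a sum over all N observations, so
    # each evaluation costs O(N); the trailing -theta comes from an
    # assumed standard normal prior.
    return np.sum(x - theta) - theta

def mala_proposal(theta, eps=1e-3):
    # Langevin proposal: drift toward higher posterior density along the
    # gradient, then add Gaussian noise of matching scale sqrt(eps).
    drift = theta + 0.5 * eps * grad_log_posterior(theta)
    return drift + np.sqrt(eps) * rng.normal()

print(mala_proposal(0.0))  # pulled from 0 toward the high-density region
```

Because the proposal itself consumes a full pass over the data, gradient-informed samplers trade fewer, better-targeted proposals for a higher per-iteration cost.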
2.2 Big P
One of the simplest models for big $P$ problems is ridge regression [23], but computing $\hat{\boldsymbol{\theta}}$ can become expensive even in this classical setting. Ridge regression estimates the coefficient vector $\boldsymbol{\theta}$ by minimizing the distance between the observed and predicted values $\mathbf{y}$ and $\mathbf{X}\boldsymbol{\theta}$ along with a weighted square norm of $\boldsymbol{\theta}$:
$$
\hat{\boldsymbol{\theta}} \;=\; \operatorname*{arg\,min}_{\boldsymbol{\theta}} \left( \|\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\|_2^2 + \lambda \|\boldsymbol{\theta}\|_2^2 \right) \;=\; \left(\mathbf{X}^{\mathsf{T}}\mathbf{X} + \lambda \mathbf{I}\right)^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{y}\,.
$$
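For concreteness, a minimal NumPy sketch of computing the ridge estimator $\hat{\boldsymbol{\theta}} = (\mathbf{X}^{\mathsf{T}}\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y}$ follows; the dimensions, penalty weight $\lambda = 1$, and simulated data are illustrative assumptions, not from the original text.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative dimensions: N observations, P predictors (assumed values).
N, P = 200, 50
X = rng.normal(size=(N, P))                  # design matrix
theta_true = rng.normal(size=P)
y = X @ theta_true + 0.1 * rng.normal(size=N)
lam = 1.0                                    # ridge penalty weight (assumed)

# Direct method: form X^T X at O(N P^2) cost, then solve the resulting
# P x P linear system at O(P^3) cost.  np.linalg.solve avoids computing
# an explicit inverse while keeping the same cubic complexity.
A = X.T @ X + lam * np.eye(P)
theta_hat = np.linalg.solve(A, X.T @ y)

print(theta_hat.shape)  # (50,)
```

Solving the linear system rather than inverting $\mathbf{X}^{\mathsf{T}}\mathbf{X} + \lambda\mathbf{I}$ outright is the standard numerically stable choice, but it does not change the asymptotic costs the text goes on to analyze.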
For illustrative purposes, we consider the following direct method for computing $\hat{\boldsymbol{\theta}}$. We can first multiply the design matrix $\mathbf{X}$ by its transpose at the cost of $\mathcal{O}(NP^2)$ and subsequently invert the resulting $P \times P$ matrix at the cost of $\mathcal{O}(P^3)$. The total complexity $\mathcal{O}(NP^2 + P^3)$ shows that (i) a large number