Group of authors

Computational Statistics in Data Science



$r_p$ of branch $p$ on a phylogeny as the product of a global treewise mean parameter $\mu$ and a branch‐specific random effect $\epsilon_p$. They model the random effects $\epsilon_p$ as independent and identically distributed draws from a lognormal distribution with mean 1 and variance $\psi^2$, under a hierarchical model in which $\psi$ is the scale parameter. Because variability differs in scale across the parameter space, the authors precondition the HMC sampler with an adaptive mass matrix informed by the diagonal entries of the Hessian matrix. More precisely, the nonzero (diagonal) elements of the mass matrix are truncated values of the running estimate computed from the first $s$ HMC iterations,

$$
H_{pp}^{(s)} = \frac{1}{\lfloor s/k \rfloor} \sum_{s' \le s:\; s'/k \in \mathbb{Z}^{+}} \left[ -\frac{\partial^2}{\partial \theta_p^2} \log \pi(\boldsymbol{\theta}) \Big|_{\boldsymbol{\theta} = \boldsymbol{\theta}^{(s')}} \right] \approx \mathbb{E}_{\pi(\boldsymbol{\theta})}\left[ -\frac{\partial^2}{\partial \theta_p^2} \log \pi(\boldsymbol{\theta}) \right],
$$

where only every $k$th iteration contributes to the average. The truncation keeps the matrix positive‐definite and numerically stable.
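The adaptation scheme can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the original study: the class name, the clipping bounds `lower`/`upper`, and the fallback to the identity before the first contribution are all hypothetical choices, and the caller is assumed to supply the diagonal of the negative Hessian of $\log \pi(\boldsymbol{\theta})$ at the current state. The helper `lognormal_params` shows one standard way to parameterize a lognormal random effect with mean 1 and variance $\psi^2$.

```python
import math
from typing import List, Sequence, Tuple


def lognormal_params(psi: float) -> Tuple[float, float]:
    """Return (m, sigma) so that exp(Normal(m, sigma^2)) has mean 1 and variance psi^2."""
    # With mean 1 the lognormal variance is exp(sigma^2) - 1, so sigma^2 = log(1 + psi^2).
    sigma2 = math.log1p(psi ** 2)
    # Mean exp(m + sigma^2 / 2) equals 1 exactly when m = -sigma^2 / 2.
    return -sigma2 / 2.0, math.sqrt(sigma2)


class DiagonalHessianMassMatrix:
    """Hypothetical adaptive diagonal mass matrix built from negative Hessian diagonals.

    Every k-th iteration contributes -d^2/dtheta_p^2 log pi(theta) at the current
    state; the running mean H_pp^(s) is clipped to [lower, upper] so every entry
    stays strictly positive (hence the matrix stays positive-definite).
    """

    def __init__(self, dim: int, k: int, lower: float = 1e-3, upper: float = 1e3):
        if k < 1 or not 0.0 < lower <= upper:
            raise ValueError("need k >= 1 and 0 < lower <= upper")
        self.k = k
        self.lower = lower
        self.upper = upper
        self.sums = [0.0] * dim
        self.count = 0  # equals floor(s / k) after iterations 1..s

    def update(self, s: int, neg_hess_diag: Sequence[float]) -> None:
        """Record 1-based iteration s; only iterations with s / k a positive integer count."""
        if s > 0 and s % self.k == 0:
            self.count += 1
            for p, h in enumerate(neg_hess_diag):
                self.sums[p] += h

    def diagonal(self) -> List[float]:
        """Truncated running mean of the Hessian diagonal (identity before any update)."""
        if self.count == 0:
            return [1.0] * len(self.sums)
        return [min(max(total / self.count, self.lower), self.upper) for total in self.sums]


if __name__ == "__main__":
    # A Gaussian target with variances 0.01 and 100 has a constant negative
    # Hessian diagonal of 100.0 and 0.01, so the adapted diagonal should match it.
    mass = DiagonalHessianMassMatrix(dim=2, k=5)
    for s in range(1, 21):
        mass.update(s, [100.0, 0.01])
    print(mass.count, mass.diagonal())
    print(lognormal_params(1.26))
```

Clipping rather than, say, adding a ridge term is a design choice made here for simplicity; it guarantees positivity even when the log‐density is locally non‐concave and a raw Hessian entry is negative.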
They estimate the treewise (fixed‐effect) mean rate $\mu$ to have posterior mean $4.75 \times 10^{-4}$ substitutions per site per year (95% Bayesian credible interval: $[4.05, 5.33] \times 10^{-4}$), with rate variability characterized by a scale parameter with posterior mean $\psi = 1.26$ (credible interval: $[1.06, 1.45]$), for serotype 3 of Dengue virus with a sample size of 352 [69]. Figure 1 illustrates the estimated maximum clade credibility tree of the Dengue virus dataset.


      Section 4.1 presents Core Challenge 4, achieving “algo‐ware” (a neologism suggesting an equal emphasis on the statistical algorithm and its implementation) that is sufficiently efficient, broad, and user‐friendly to empower everyday statisticians and data scientists. Core Challenge 5 (Section 4.2) explores the mapping of these algorithms to computational hardware for optimal performance. Hardware‐optimized implementations often exploit model‐specific structures, but good, general‐purpose software should also optimize common routines.

      4.1 Fast, Flexible, and Friendly Statistical Algo‐Ware

      To accommodate the greatest range of models while remaining simple enough to encourage easy implementation, inference methods should rely solely on quantities that can be computed algorithmically for any given model. The log‐likelihood (or log‐density in the Bayesian setting) is one such quantity: the computational graph framework [77, 78] can evaluate conditional log‐likelihoods for any subset of model parameters, as well as their gradients via backpropagation [79]. Beyond being efficient with respect to the first three Core Challenges, an algorithm should perform robustly across a reasonably wide range of problems without extensive tuning if it is to lend itself to successful software deployment.
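To make the computational‐graph idea concrete, here is a deliberately tiny reverse‐mode automatic differentiation sketch in plain Python. It is illustrative only: `Node`, `exp`, `backward`, and `normal_loglik` are hypothetical names, not the API of any framework cited in [77–79], and the Gaussian model with mean `mu` and log standard deviation `eta` is an assumed toy example. Each node records its value, its parents, and the local partial derivatives; `backward` then applies the chain rule in reverse topological order, which is exactly backpropagation.

```python
import math
from typing import List, Sequence, Tuple


class Node:
    """A scalar value in a computational graph, with a gradient slot for backprop."""

    def __init__(self, value: float, parents: Tuple[Tuple["Node", float], ...] = ()):
        self.value = float(value)
        self.parents = parents  # pairs of (parent node, d self / d parent)
        self.grad = 0.0

    @staticmethod
    def lift(x) -> "Node":
        return x if isinstance(x, Node) else Node(x)

    def __add__(self, other):
        other = Node.lift(other)
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __sub__(self, other):
        other = Node.lift(other)
        return Node(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __rsub__(self, other):
        return Node.lift(other) - self

    def __mul__(self, other):
        other = Node.lift(other)
        return Node(self.value * other.value, ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def exp(x: Node) -> Node:
    v = math.exp(x.value)
    return Node(v, ((x, v),))  # d exp(x) / dx = exp(x)


def backward(output: Node) -> None:
    """Accumulate d output / d node into every node's .grad (reverse-mode sweep)."""
    order: List[Node] = []
    seen = set()
    stack = [(output, False)]
    while stack:  # iterative DFS producing a topological order (parents first)
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
        elif id(node) not in seen:
            seen.add(id(node))
            stack.append((node, True))
            for parent, _ in node.parents:
                stack.append((parent, False))
    output.grad = 1.0
    for node in reversed(order):  # children before parents: chain rule
        for parent, local in node.parents:
            parent.grad += local * node.grad


def normal_loglik(xs: Sequence[float], mu: Node, eta: Node) -> Node:
    """Gaussian log-likelihood of xs with mean mu and log standard deviation eta."""
    inv_var = exp(-2.0 * eta)  # 1 / sigma^2 with sigma = exp(eta)
    total = Node(0.0)
    for x in xs:
        r = x - mu
        total = total - eta - 0.5 * (r * r) * inv_var
    return total - 0.5 * len(xs) * math.log(2.0 * math.pi)


if __name__ == "__main__":
    mu, eta = Node(1.5), Node(0.2)
    ll = normal_loglik([1.0, 2.0, 4.0], mu, eta)
    backward(ll)
    print(ll.value, mu.grad, eta.grad)
```

A single backward sweep returns the gradient with respect to every parameter at a cost that is a small constant multiple of one log‐likelihood evaluation, which is why gradient‐based samplers such as HMC pair naturally with this framework. Established autodiff libraries add vectorized operations, memory management, and higher‐order derivatives that this sketch omits.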

      HMC (Section