$\beta_j$'s toward zero, while the local scales $\lambda_j$, with their heavy-tailed prior, allow a small number of $\lambda_j$'s and hence $\beta_j$'s to be estimated away from zero. While motivated by two different conceptual frameworks, the spike-and-slab can be viewed as a subset of global–local priors in which the prior on $\lambda_j$ is chosen as a mixture of delta masses placed at $\lambda_j = 0$ and $\lambda_j = 1$. Continuous shrinkage mitigates the multimodality of spike-and-slab posteriors by smoothly bridging small and large values of $\lambda_j$.
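To make this correspondence concrete, the display below writes both prior families in one common hierarchy; the symbols used here ($\tau$ for the global scale, $\lambda_j$ for the local scales, $\gamma$ for the slab probability) are illustrative notation rather than quoted from a specific reference.
$$
\beta_j \mid \tau, \lambda_j \sim \mathcal{N}\left(0, \tau^2 \lambda_j^2\right), \qquad
\lambda_j \sim
\begin{cases}
(1 - \gamma)\, \delta_0 + \gamma\, \delta_1, & \text{spike-and-slab as a two-point mixture,} \\
\pi_{\mathrm{local}}(\lambda_j) \ \text{continuous and heavy-tailed}, & \text{continuous shrinkage (e.g. half-Cauchy).}
\end{cases}
$$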
On the other hand, the use of continuous shrinkage priors does not address the growing computational burden as $n$ and $p$ increase in modern applications. Sparse regression posteriors under global–local priors are amenable to an effective Gibbs sampler, an instance of a popular class of MCMC algorithms we describe further in Section 4.1. Under the linear and logistic models, the computational bottleneck of this Gibbs sampler stems from the need for repeated updates of $\beta$ from its conditional distribution
$$
\beta \mid \tau, \lambda, \Omega, y \;\sim\; \mathcal{N}\left( \Phi^{-1} X^\top \Omega y, \; \Phi^{-1} \right), \qquad \Phi = X^\top \Omega X + \tau^{-2} \Lambda^{-2}, \tag{4}
$$
where $\Omega$ is an additional diagonal matrix parameter and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$. Sampling from this high-dimensional Gaussian distribution requires $O(np^2 + p^3)$ operations with the standard approach [58]: $O(np^2)$ for computing the term $X^\top \Omega X$ and $O(p^3)$ for the Cholesky factorization of $\Phi$. While an alternative approach by Bhattacharya et al. [48] provides the complexity of $O(n^2 p)$, the computational cost remains problematic in the big $n$ and big $p$ regime at $O(\min\{n^2 p, np^2\})$ after choosing the faster of the two.
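To make the cost comparison concrete, the following sketch implements both direct strategies in NumPy under the notation of (4); the function names, argument conventions, and use of dense linear algebra are expository assumptions rather than the reference implementations of [48, 58].

```python
import numpy as np

def sample_beta_cholesky(X, omega, y, tau, lam, rng):
    """Draw beta from (4) by forming Phi and Cholesky-factorizing it:
    O(n p^2) to form X^T Omega X plus O(p^3) for the factorization."""
    p = X.shape[1]
    XtOmega = X.T * omega                               # X^T Omega with Omega = diag(omega)
    Phi = XtOmega @ X + np.diag(1.0 / (tau * lam) ** 2)
    L = np.linalg.cholesky(Phi)                         # Phi = L L^T
    mean = np.linalg.solve(Phi, XtOmega @ y)            # could reuse L via two triangular solves
    z = rng.standard_normal(p)
    return mean + np.linalg.solve(L.T, z)               # L^{-T} z has covariance Phi^{-1}

def sample_beta_nbyn(X, omega, y, tau, lam, rng):
    """Draw beta from (4) by solving an n-by-n system instead, in the spirit of
    Bhattacharya et al. [48]: roughly O(n^2 p) cost, preferable when p >> n."""
    n, p = X.shape
    d = (tau * lam) ** 2                                  # prior variances tau^2 lambda_j^2
    u = np.sqrt(d) * rng.standard_normal(p)               # u ~ N(0, tau^2 Lambda^2)
    v = X @ u + rng.standard_normal(n) / np.sqrt(omega)   # v ~ N(X u, Omega^{-1})
    M = (X * d) @ X.T + np.diag(1.0 / omega)              # X D X^T + Omega^{-1}
    w = np.linalg.solve(M, y - v)
    return u + d * (X.T @ w)
```

Selecting whichever of the two routines is cheaper for the given $n$ and $p$ yields the $O(\min\{n^2 p, np^2\})$ cost quoted above.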
3.1.2 Conjugate gradient sampler for structured high‐dimensional Gaussians
The conjugate gradient (CG) sampler of Nishimura and Suchard [57], combined with their prior-preconditioning technique, overcomes this seemingly inevitable growth of the computational cost. Their algorithm is based on a novel application of the CG method [59, 60], which belongs to a family of iterative methods in numerical linear algebra. Despite its first appearance in 1952, CG received little attention for the next few decades, only making its way into major software packages such as MATLAB in the 1990s [61]. With its ability to solve a large and structured linear system $\Phi x = b$ via a small number of matrix–vector multiplications $v \mapsto \Phi v$, without ever explicitly inverting $\Phi$, however, CG has since emerged as an essential and prototypical algorithm for modern scientific computing [62, 63].
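As an illustration of this matrix-free style of computation, the sketch below solves a system of the form $\Phi x = b$, with $\Phi$ as in (4), using SciPy's CG routine and a LinearOperator that exposes only the product $v \mapsto \Phi v$; the function and variable names are assumptions made for exposition.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def cg_solve_phi(X, omega, prior_precision, b):
    """Solve Phi x = b with Phi = X^T Omega X + diag(prior_precision),
    accessing Phi only through matrix-vector products of O(np) cost each."""
    p = X.shape[1]

    def phi_matvec(v):
        # Computes X^T (Omega (X v)) + diag(prior_precision) v without forming Phi
        return X.T @ (omega * (X @ v)) + prior_precision * v

    Phi = LinearOperator((p, p), matvec=phi_matvec)
    x, info = cg(Phi, b)   # convergence tolerance can be tightened via the rtol/tol argument
    assert info == 0, "CG did not converge within the default iteration limit"
    return x
```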
Despite its earlier rise to prominence in other fields, CG did not find practical applications in Bayesian computation until rather recently [57, 64]. We can offer at least two explanations for this. First, being an algorithm for solving a deterministic linear system, it is not obvious how CG would be relevant to Monte Carlo simulation, such as sampling from $\mathcal{N}(\Phi^{-1} X^\top \Omega y, \Phi^{-1})$; ostensibly, such a task requires computing a “square root” $L$ of the precision matrix, $L L^\top = \Phi$, so that $\beta = \Phi^{-1} X^\top \Omega y + L^{-\top} z$ is a draw from the target for $z \sim \mathcal{N}(0, I_p)$. Secondly, unlike direct linear algebra methods, iterative methods such as CG have a variable computational cost that depends critically on the user's choice of a preconditioner and thus cannot be used as a “black-box” algorithm. More broadly, this novel application of CG to Bayesian computation is a reminder that powerful ideas from other computationally intensive fields may remain untapped by the statistical computing community; knowledge transfer will likely be facilitated by having more researchers working at the intersections of different fields.
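The following sketch indicates, under the notation of (4), how a CG solve can nonetheless produce an exact Gaussian draw: one first generates a random right-hand side whose mean and covariance are $X^\top \Omega y$ and $\Phi$, and then solves $\Phi \beta = b$ with CG, here preconditioned by the diagonal prior precision $\tau^{-2}\Lambda^{-2}$. This is only a schematic rendering of the general perturb-then-solve idea; the function names and the SciPy-based implementation are assumptions, and the reader should consult [57] for the actual prior-preconditioned algorithm and its convergence analysis.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def cg_gaussian_draw(X, omega, y, tau, lam, rng):
    """Draw beta ~ N(Phi^{-1} X^T Omega y, Phi^{-1}), Phi = X^T Omega X + tau^{-2} Lambda^{-2},
    using only matrix-vector products with Phi and a diagonal (prior-based) preconditioner."""
    n, p = X.shape
    prior_sd = tau * lam                    # prior standard deviations tau * lambda_j
    prior_precision = prior_sd ** (-2)      # diagonal of tau^{-2} Lambda^{-2}

    # Right-hand side b with E[b] = X^T Omega y and Cov(b) = Phi, generated in O(np) time
    b = (X.T @ (omega * y)
         + X.T @ (np.sqrt(omega) * rng.standard_normal(n))
         + rng.standard_normal(p) / prior_sd)

    def phi_matvec(v):
        # v -> Phi v without ever forming Phi
        return X.T @ (omega * (X @ v)) + prior_precision * v

    Phi = LinearOperator((p, p), matvec=phi_matvec)
    # Preconditioner in SciPy's convention approximates Phi^{-1}; here the prior variance tau^2 Lambda^2
    M = LinearOperator((p, p), matvec=lambda r: (prior_sd ** 2) * r)

    beta, info = cg(Phi, b, M=M)
    assert info == 0, "CG did not converge within the default iteration limit"
    return beta    # beta = Phi^{-1} b is a draw from the target, up to the CG convergence tolerance
```

Each matrix-vector product costs $O(np)$, so when the preconditioned system is well conditioned and only a modest number of CG iterations is required, the overall cost can fall well below that of the direct methods sketched earlier.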
Nishimura and Suchard [57] turn CG into a viable algorithm for Bayesian sparse regression problems