Historically, Bayesians have paid much less attention to convexity than have optimization researchers. This is most likely because the basic theory [13] of MCMC does not require such restrictions: even if a target distribution has one million modes, a well-constructed Markov chain explores them all in the limit. Despite these theoretical guarantees, a small literature has developed to tackle multimodal Bayesian inference [39–42] because multimodal target distributions do present a challenge in practice. In analogy with Equation (3), Bayesians seek to induce sparsity by specifying priors such as the spike-and-slab [43–45], for example,
\[ \beta_m \mid \gamma_m \sim \gamma_m \, \mathcal{N}(0, \sigma_\beta^2) + (1 - \gamma_m)\, \delta_0, \qquad \gamma_m \sim \operatorname{Bernoulli}(\pi), \qquad m = 1, \dots, P. \]
As with the best subset selection objective function, the spike-and-slab target distribution becomes heavily multimodal as P grows: each of the 2^P configurations of the binary indicators γ = (γ_1, …, γ_P) indexes its own candidate model.
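To make this combinatorial structure concrete, the following minimal sketch draws coefficients from a spike-and-slab prior of the above form; the hyperparameter values (inclusion probability, slab standard deviation) are illustrative assumptions, not values taken from the references.

```python
import numpy as np

rng = np.random.default_rng(1)

P = 20            # number of candidate features (illustrative)
pi_incl = 0.1     # prior inclusion probability (illustrative)
sigma_beta = 2.0  # slab standard deviation (illustrative)

# gamma_m ~ Bernoulli(pi): binary inclusion indicators
gamma = rng.random(P) < pi_incl

# beta_m | gamma_m ~ gamma_m * N(0, sigma_beta^2) + (1 - gamma_m) * delta_0
beta = np.where(gamma, rng.normal(0.0, sigma_beta, size=P), 0.0)

print("included features:", np.flatnonzero(gamma))
print("nonzero betas:", beta[gamma])

# Each configuration of gamma indexes its own candidate model, so the
# posterior can place modes on up to 2**P subsets of the features.
print("candidate models: 2**P =", 2**P)
```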
In the following section, we present an alternative Bayesian sparse regression approach that mitigates the combinatorial problem, along with a state-of-the-art computational technique that scales well in both N and P.
3 Model‐Specific Advances
These challenges will remain throughout the twenty-first century, but it is possible to make significant advances for specific statistical tasks or classes of models. Section 3.1 considers Bayesian sparse regression based on continuous shrinkage priors, designed to alleviate the heavy multimodality (big M) that the discrete spike-and-slab formulation induces.
Because of the rise of data science, there are also increasing opportunities for computational statistics to grow by enabling and extending statistical inference for scientific applications previously outside of mainstream statistics. Here, the science may dictate the development of structured models with complexity possibly growing in N and P.
3.1 Bayesian Sparse Regression in the Age of Big N and Big P
With the goal of identifying a small subset of relevant features among a large number of potential candidates, sparse regression techniques have long featured in a range of statistical and data science applications [46]. Traditionally, such techniques were commonly applied in the “large P, small N” setting, in which the number of candidate features far exceeds the number of observations.
Due to a growing number of large-scale data collection initiatives and new types of scientific inquiry made possible by emerging technologies, however, datasets that are “big N and big P” at the same time are increasingly common.
3.1.1 Continuous shrinkage: alleviating big M
Bayesian sparse regression, despite its desirable theoretical properties and flexibility to serve as a building block for richer statistical models, has always been relatively computationally intensive, even before the advent of “big N and big P” data. A major source of this burden is the multimodality (big M) induced by the discrete inclusion indicators γ_m of the spike-and-slab prior. Continuous shrinkage priors of the global-local family replace the binary γ_m with continuous local scale parameters λ_m > 0 that differentially shrink individual coefficients, combined with a global scale parameter τ > 0 that shrinks all of them at once, for example,
\[ \beta_m \mid \lambda_m, \tau \sim \mathcal{N}(0, \tau^2 \lambda_m^2), \qquad \lambda_m \sim \pi_{\mathrm{local}}(\cdot), \qquad \tau \sim \pi_{\mathrm{global}}(\cdot). \]
The idea is that the global scale parameter τ adapts to the overall level of sparsity, shrinking the bulk of the β_m toward zero, while heavy-tailed priors on the local scales λ_m allow the few truly relevant coefficients to escape this shrinkage.
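As a minimal illustration of this global-local mechanism, the sketch below draws coefficients from one widely used member of the class, a horseshoe-type prior with half-Cauchy local scales; the choice of prior family and the fixed small value of τ (standing in for a posterior that favors sparsity) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

P = 10_000  # number of coefficients (illustrative)
tau = 0.01  # condition on a small global scale, as when the data favor sparsity

# lambda_m ~ Half-Cauchy(0, 1): heavy-tailed local scale parameters
lam = np.abs(rng.standard_cauchy(P))

# beta_m | lambda_m, tau ~ N(0, tau^2 * lambda_m^2)
beta = rng.normal(0.0, tau * lam)

# The small global scale pulls the bulk of the coefficients toward zero,
# while the Cauchy tails of lambda_m let a handful escape the shrinkage.
print("fraction with |beta| < 0.01:", np.mean(np.abs(beta) < 0.01))
print("five largest |beta|:", np.round(np.sort(np.abs(beta))[-5:], 3))
```

Because every parameter in such a specification is continuous, the 2^P discrete model space of the spike-and-slab disappears, and gradient-based samplers become applicable.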