Nola du Toit
NORC at the University of Chicago
Chicago, IL
USA
Dootika Vats
Indian Institute of Technology Kanpur
Kanpur
India
Matti Vihola
University of Jyväskylä
Jyväskylä
Finland
Justin Wang
University of California at Davis
Davis, CA
USA
Will Wei Sun
Purdue University
West Lafayette, IN
USA
Leland Wilkinson
H2O.ai, Mountain View
California
USA
and
University of Illinois at Chicago
Chicago, IL
USA
Joong‐Ho Won
Seoul National University
Seoul
South Korea
Yichao Wu
University of Illinois at Chicago
Chicago, IL
USA
Min‐ge Xie
Rutgers University
Piscataway, NJ
USA
Ming Yan
Michigan State University
East Lansing, MI
USA
Yuling Yao
Columbia University
New York, NY
USA
and
Center for Computational Mathematics
Flatiron Institute
New York, NY
USA
Chun Yip Yau
Chinese University of Hong Kong
Shatin
Hong Kong
Hao H. Zhang
University of Arizona
Tucson, AZ
USA
Hua Zhou
University of California
Los Angeles, CA
USA
Preface
Computational statistics is a core area of modern statistical science, and its connections to data science represent an ever‐growing area of study. One of its important features is that the underlying technology changes rapidly, riding on the back of advances in computer hardware and statistical software. In this compendium we present a series of expositions that explore the intermediate and advanced concepts, theories, techniques, and practices that act to expand this rapidly evolving field. We hope that scholars and investigators will use the presentations to inform themselves on how modern computational and statistical technologies are applied, and also as springboards for their further research. Readers will require knowledge of fundamental statistical methods and, depending on the topic they peruse, any advanced statistical aspects necessary to understand and conduct the technical computing procedures.
The presentation begins with a thoughtful introduction on how we should view Computational Statistics & Data Science in the 21st Century (Holbrook, et al.), followed by a careful tour of contemporary Statistical Software (Schissler, et al.). Topics that follow address a variety of issues, collected into broad topic areas such as Simulation‐based Methods, Statistical Learning, Quantitative Visualization, High‐performance Computing, High‐dimensional Data Analysis, and Numerical Approximations & Optimization.
Internet access to all of the articles presented here is available via the online collection Wiley StatsRef: Statistics Reference Online (Davidian, et al., 2014–2021); see https://onlinelibrary.wiley.com/doi/book/10.1002/9781118445112.
From Deep Learning (Li, et al.) to Asynchronous Parallel Computing (Yan), this collection provides a glimpse into how computational statistics may progress in this age of big data and transdisciplinary data science. It is our fervent hope that readers will benefit from it.
We wish to thank the Wiley editorial staff, including Kimberly Monroe‐Hill, Paul Sayer, Michael New, Vignesh Lakshmikanthan, Aruna Pragasam, Viktoria Hartl‐Vida, Alison Oliver, and Layla Harden, for their fine efforts in helping bring this project to fruition.
Walter W. Piegorsch, Tucson, Arizona
Richard A. Levine, San Diego, California
Hao Helen Zhang, Tucson, Arizona
Thomas C. M. Lee, Davis, California
Reference
1 Davidian, M., Kenett, R.S., Longford, N.T., Molenberghs, G., Piegorsch, W.W., and Ruggeri, F., eds. (2014–2021). Wiley StatsRef: Statistics Reference Online. Chichester: John Wiley & Sons. doi:10.1002/9781118445112.
1 Computational Statistics and Data Science in the Twenty‐First Century
Andrew J. Holbrook1, Akihiko Nishimura2, Xiang Ji3, and Marc A. Suchard1
1University of California, Los Angeles, CA, USA
2Johns Hopkins University, Baltimore, MD, USA
3Tulane University, New Orleans, LA, USA
1 Introduction
We are in the midst of the data science revolution. In October 2012, the Harvard Business Review famously declared data scientist the sexiest job of the twenty‐first century [1]. By September 2019, Google searches for the term “data science” had multiplied over sevenfold [2], one multiplicative increase for each intervening year. In the United States between the years 2000 and 2018, the number of bachelor's degrees awarded in either statistics or biostatistics increased over 10‐fold (from 382 to 3964), and the number of doctoral degrees almost tripled (from 249 to 688) [3]. In 2020, seemingly every major university has established or is establishing its own data science institute, center, or initiative.
Data science [4, 5] combines multiple preexisting disciplines (e.g., statistics, machine learning, and computer science) with a redirected focus on creating,