Daniel J. Denis

Applied Univariate, Bivariate, and Multivariate Statistics Using Python


Computing a t-test, for instance, is not mathematics. It is arithmetic. Understanding what occurs in the t-test as the mean difference in the numerator goes toward zero (for example) is not “conceptual understanding.” Rather, it is mathematics, and the fact that the concepts of mathematics can be unpacked into a more verbal or descriptive discussion only serves to delineate the concept that already exists underneath the description. Many instructors of applied statistics are not aware of this and continually foster in students the idea that mathematics is somehow separate from the conceptual development they are trying to impart to them. Instructors who teach statistics as a series of recipes and formulas without any conceptual development at all do a serious (almost “malpractice”) disservice to their students. Once students begin to appreciate that mathematics and statistics are, in a strong sense, a branch of philosophy “rigorized,” replete with premises, justifications, proofs, and other analytical arguments, they begin to see it less as “mathematics” and adopt a deeper understanding of what they are engaging in. The student should always be critical of the a priori associations they have made to any subject or discipline. The student who “dislikes” mathematics is quite arrogant to think they understand the object enough to know they dislike it. It is a form of discrimination. Critical reflection and rebuilding of knowledge (or at least of what one assumes to already be true) is always a productive endeavor. It’s all “concepts,” and mathematics and statistics have done a great job at rigorizing and symbolizing tools for the purpose of communication. Otherwise, “probability,” for instance, remains an elusive concept and the phrase “the result is probably not due to chance” is not measurable. Mathematics and statistics give us a way to measure those ideas, those concepts.
As Fisher again once told us, you may not be able to avoid chance and uncertainty, but if you can measure and quantify it, you are on to something. However, measuring uncertainty in a scientific (as opposed to an abstract) context can be exceedingly difficult.
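The point about the t-test numerator can be made concrete with a small sketch. The function name `t_statistic` and all summary values below are assumptions chosen only for illustration; the sketch uses the unpooled standard error for two independent samples and holds the standard deviations and sample sizes fixed while the mean difference shrinks.

```python
import math


def t_statistic(mean_diff, sd1, sd2, n1, n2):
    """Two-sample t statistic: mean difference divided by its standard error.

    Uses the unpooled standard error sqrt(sd1^2/n1 + sd2^2/n2).
    """
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return mean_diff / standard_error


# Hypothetical summary values, invented for illustration only.
sd1, sd2, n1, n2 = 10.0, 10.0, 50, 50

t_values = []
for mean_diff in (5.0, 2.0, 1.0, 0.5, 0.0):
    t = t_statistic(mean_diff, sd1, sd2, n1, n2)
    t_values.append(t)
    print(f"mean difference = {mean_diff:4.1f}  ->  t = {t:.3f}")
```

With the denominator held constant, t falls in direct proportion to the mean difference and equals zero when the two sample means coincide. The arithmetic is trivial; seeing why the ratio behaves this way is the mathematics.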

      Advice for Instructors

      The book can be used at either the advanced undergraduate or graduate levels, or for self-study. The book is ideal for a 16-week course, for instance one in a Fall or Spring semester, and may prove especially useful for programs that only have space or desire to feature a single data-analytic course for students. Instructors can use the book as a primary text or as a supplement to a more theoretical book that unpacks the concepts featured in this book. Exercises at the end of each chapter can be assigned weekly and can be discussed in class or reviewed by a teaching assistant in lab. The goal of the exercises should be to get students thinking critically and creatively, not simply getting the “right answer.”

      It is hoped that you enjoy this book as a gentle introduction to the world of applied statistics using Python. Please feel free to contact me at [email protected] or [email protected] should you have any comments or corrections. For data files and errata, please visit www.datapsyc.com.

      Daniel J. Denis

      March, 2021

      CHAPTER OBJECTIVES

       How probability is the basis of statistical and scientific thinking.

       Examples of statistical inference and thinking in the COVID-19 pandemic.

       Overview of how null hypothesis significance testing (NHST) works.

       The relationship between statistical inference and decision-making.

       Error rates in statistical thinking and how to minimize them.

       The difference between a point estimator and an interval estimator.

       The difference between a continuous and a discrete variable.

       Appreciating a few of the more salient philosophical underpinnings of applied statistics and science.

       Understanding scales of measurement: nominal, ordinal, interval, and ratio.

       Data analysis, data science, and “big data” distinctions.

      The goal of this first chapter is to provide a global overview of the logic behind statistical inference and how it is the basis for analyzing data and addressing scientific problems. Statistical inference, in one form or another, has existed at least since the ancient Greeks, even if it was only relatively recently formalized into a complete system. What unifies virtually all of statistical inference is probability. Without probability, statistical inference could not exist, and thus much of modern-day statistics would not exist either (Stigler, 1986).

      When you think about it for a moment, virtually all things in the world are probabilistic. As a recent example, consider the COVID-19 pandemic of 2020. Since the start of the outbreak, questions involving probability were front and center in virtually all media discussions. That is, the undertones of probability, science, and statistical inference were present virtually wherever the pandemic was discussed. Concepts of probability could not be avoided. The following are just a few of the questions asked during the pandemic:

       What is the probability of contracting the virus, and does this probability vary as a function of factors such as pre-existing conditions or age? In this latter case, we might be interested in the conditional probability of contracting COVID-19 given a pre-existing condition or advanced age. For example, if someone suffers from heart disease, is that person at greater risk of acquiring the infection? That is, what is the probability of COVID-19 infection conditional on someone already suffering from heart disease or other ailments?
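The idea of a conditional probability in the question above can be sketched in a few lines of Python. The counts are entirely hypothetical and are not real COVID-19 data, and the helper name `conditional_probability` is an assumption introduced only for this illustration. The sketch estimates P(infection | heart disease) as the number of people with both characteristics divided by the number with heart disease, and compares it with the unconditional P(infection).

```python
def conditional_probability(joint_count, condition_count):
    """Estimate P(A | B) as count(A and B) / count(B)."""
    if condition_count <= 0:
        raise ValueError("condition_count must be positive")
    return joint_count / condition_count


# Hypothetical counts, invented for illustration only (not real data).
n_total = 10_000                      # people in the imagined population
n_heart_disease = 1_000               # of whom have heart disease
n_infected = 500                      # of whom were infected
n_infected_and_heart_disease = 80     # infected AND have heart disease

# Unconditional probability of infection.
p_infected = n_infected / n_total

# Probability of infection given heart disease.
p_infected_given_hd = conditional_probability(
    n_infected_and_heart_disease, n_heart_disease
)

print(f"P(infected)                 = {p_infected:.3f}")
print(f"P(infected | heart disease) = {p_infected_given_hd:.3f}")
```

In this made-up population the conditional probability exceeds the unconditional one, which is what "greater risk given heart disease" would mean. Whether that holds in reality is an empirical question the sketch cannot answer.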