Daniel J. Denis

Applied Univariate, Bivariate, and Multivariate Statistics Using Python



scientific claims presumably supported by bloated statistical analyses. Just look at the methodological debates that surrounded COVID-19, which concerned an object of study that is relatively “easy” philosophically! Step away from concrete science, throw in advanced statistical technology and complexity, and you enter a world where establishing evidence is philosophical quicksand. Many students who use statistical methods fall into these pits without even knowing it, and it is the instructor’s responsibility to keep them grounded in what the statistical method can vs. cannot do. I have told students countless times, “No, the statistical method cannot tell you that; it can only tell you this.”

      Hence, students of the empirical sciences need to be acutely aware and appreciative of the deeper issues involved in conducting their own science. This implies a heavier emphasis not on how to conduct a billion different statistical analyses, but on understanding the issues with conducting the “basic” analyses they are performing. It is a matter of fact that many students who fill their theses or dissertations with applied statistics may nonetheless fail to appreciate that very little of scientific usefulness has been achieved. What has too often been achieved is a blatant abuse of statistics masquerading as scientific advancement. The student “bootstrapped standard errors” (Wow! Impressive!), but in the midst of a dissertation that is scientifically unsound or, at a minimum, very weak on a methodological level.

      Returning to our mediation example, if the context of the research problem lends itself to a physical or substantive definition of mediation or any other physical process, such that there is good reason to believe Z is truly, substantively, “mediating,” then the statistical model can be used to establish support for this already-presumed relation, in the same way a statistical model can be used in regression to quantify the generational transmission of physical qualities from parent to child. The process itself, however, is not due to the fitting of a statistical model. Never in the history of science or statistics has a statistical model generated a process. It has merely, and only potentially, described one. Many students, however, excited to have bootstrapped those standard errors in their model and all the rest of it, are apt to draw substantive conclusions from a statistical model that simply do not hold water. In such cases, one is better off not running a statistical model at all than using it to draw inane, philosophically egregious conclusions of the kind that are usually easily corrected in any introductory philosophy of science or research methodology course. Abusing and overusing statistics does little to advance science. It simply provides a cloak of complexity.
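      To make the mediation point concrete, here is a minimal sketch in Python. It is an illustration only, under assumptions of my own choosing: the data are simulated so that Z is a common cause of X and Y (not a mediator), the "indirect effect" is the usual product of two least-squares coefficients, and the function names `indirect_effect` and `bootstrap_se`, the sample size, and the coefficients of 0.8 are all hypothetical. The model still returns a sizable indirect effect with a small bootstrapped standard error, even though no mediating process exists in the data.

```python
import numpy as np


def indirect_effect(x, z, y):
    """Product-of-coefficients estimate a*b from two least-squares fits."""
    # a: slope from regressing z on x (the "X -> Z" path)
    a = np.polyfit(x, z, 1)[0]
    # b: coefficient of z when regressing y on x and z (the "Z -> Y" path)
    design = np.column_stack([np.ones_like(x), x, z])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    b = coefs[2]
    return a * b


def bootstrap_se(x, z, y, n_boot=500, seed=0):
    """Bootstrap standard error of the indirect effect (resampling cases)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        estimates[i] = indirect_effect(x[idx], z[idx], y[idx])
    return estimates.std(ddof=1)


# Simulated data (an assumption for illustration): Z causes both X and Y.
# X has no effect on Y at all, so Z cannot be "mediating" anything.
rng = np.random.default_rng(42)
n = 500
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 0.8 * z + rng.normal(size=n)

est = indirect_effect(x, z, y)
se = bootstrap_se(x, z, y)
print(f"indirect effect = {est:.3f}, bootstrap SE = {se:.3f}")
```

      The arithmetic is the same whether Z is a mediator or a common cause. Nothing in the output distinguishes the two; only substantive knowledge of the process can do that.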

      Statistical knowledge is not equivalent to software knowledge. One can become an expert at Python, for instance, yet still not possess the scientific expertise or experience to successfully interpret output from data analyses. The difficult part is not in generating analyses (that can always be looked up). The most important thing is to interpret analyses correctly in relation to the empirical objects under investigation, and in most cases, this involves recognizing the limitations of what can vs. cannot be concluded from the data analysis.

      Mathematical vs. “Conceptual” Understanding

      One important aspect of learning and understanding any craft is to know where and why making distinctions is important and, on the opposite end of the spectrum, where divisions simply blur what is really there. One area where this is especially true is in learning, or at least “using,” a technical discipline such as mathematics or statistics to better understand another subject. Many instructors of applied statistics strive to teach statistics at a “conceptual” level, which, to them at least, means making the discipline less “mathematical.” This is done presumably to attract students who may otherwise be fearful of mathematics with all of its formulas and symbolism. However, this distinction, I argue, does more harm than good, and completely misses the point. The truth of the matter is that mathematics are concepts. Statistics are likewise concepts. Attempting to draw a distinction between two things that are the same does little good and only creates more confusion for the student.