
deleted, see Section for a discussion). If you are completely unfamiliar with boxplots, see Denis (2020) for an overview.

      Stem‐and‐leaf plots are also easily produced. These visual displays are a kind of “naked histogram,” because they reveal the actual observations in the data while also providing information about their frequency of occurrence. In 1710, John Arbuthnot analyzed data on the ratio of male to female births in London from 1629 to 1710 and, in so doing, made an argument for these births being a function of a “divine being” (Arbuthnot, 1710; Shoesmith, 1987). One of his variables was the number of male christenings (i.e., baptisms) over the period 1629–1710. We generate a stem‐and‐leaf plot in R of these male christenings using package aplpack (Wolf and Bielefeld, 2014), for which the “leaves” correspond to hundreds. For example, in the resulting plot, the first value of 2|8 would appear to represent a value of 2800 but is rounded down from the actual value in the data (which is also the minimum) of 2890. The maximum in the data is actually equal to 8426, but is represented by 8400 (i.e., 8|0012334).
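      The plot output itself is not reproduced here, but the following is a minimal sketch of R code that would produce such a display. The data source is an assumption on our part: we draw the christening counts from the Males column of the Arbuthnot data in package HistData; substitute your own vector if the data live elsewhere.

          # Sketch, not the book's exact code: stem-and-leaf plot of the
          # male christenings in London, 1629-1710
          library(aplpack)     # provides stem.leaf()
          library(HistData)    # assumed source of the Arbuthnot data

          males <- Arbuthnot$Males       # counts of male christenings
          stem.leaf(males, unit = 100)   # unit = 100: each leaf digit represents hundreds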

      The workhorse for establishing statistical evidence in the social and natural sciences is the method of null hypothesis significance testing (or “NHST” for short). However, since its inception with R.A. Fisher in the early 1900s, the significance test has been the topic of much debate, both statistical and philosophical. Throughout much of this book, NHST is regularly used to evaluate null hypotheses in methods such as the analysis of variance, regression, and various multivariate procedures. Indeed, the procedure underlies virtually every statistical method in common use.

      It behooves us, then, before embarking on all of these methodologies, to discuss the nature of the null hypothesis significance test and to demonstrate clearly what it actually means, not only in a statistical context but also in terms of how it should be interpreted in a research or substantive context.

      The purpose of this final section of the present chapter is to provide a clear and concise demonstration and summary of the factors that influence the size of a computed p‐value in virtually every statistical significance test. Understanding why statements such as “p < 0.05” can be reflective of even the smallest and most trivial of effects is critical for the practitioner or researcher to appreciate if he or she is to assess and appraise statistical evidence in an intelligent and thoughtful manner. It is not an exaggeration to say that if one does not understand the make‐up of a p‐value and the factors that directly influence its size, one cannot properly evaluate statistical evidence, nor should one even make the attempt to do so. Though these arguments are not new and have been put forth by even the very best of methodologists (e.g., see Cohen, 1990; Meehl, 1978), there is evidence to suggest that many practitioners and researchers do not understand the factors that determine the size of a p‐value (Gigerenzer, 2004). To emphasize once again: understanding the determinants of a p‐value and what makes p‐values distinct from effect sizes is not simply “fashionable.” Rather, it is absolutely mandatory for any attempt to properly evaluate statistical evidence in a research report. Does the paper you're reading provide evidence of a successful treatment for cancer? If you do not understand the distinctions between p‐values and effect sizes, you will be unable to properly assess the evidence. It is that important. As we will see, stating a result as “statistically significant” does not in itself tell you whether the treatment works or does not work, and in some cases tells you very little at all from a scientific vantage point.

      2.28.1 Null Hypothesis Significance Testing (NHST): A Legacy of Criticism

      Criticisms targeted against null hypothesis significance testing have inundated the literature since at least 1938, when Berkson brought to light how statistical significance can easily be achieved through simple manipulations of sample size:

      I believe that an observant statistician who has had any considerable experience with applying the chi‐square test repeatedly will agree with my statement that, as a matter of observation, when the numbers in the data are quite large, the P's tend to come out small. (p. 526)

      Since Berkson, the best and most renowned of methodologists have remarked that the significance test is subject to gross misunderstanding and misinterpretation (e.g., see Bakan, 1966; Carver, 1993; Cohen, 1990; Estes, 1997; Loftus, 1991; Meehl, 1978; Oakes, 1986; Shrout, 1997; Wilson, Miller, and Lower, 1967). And though it can be difficult to assess whether the situation has improved, there is evidence to suggest that it has not. Few describe the problem better than Gigerenzer in his article Mindless statistics (Gigerenzer, 2004), in which he discusses both the roots and truths of hypothesis testing, as well as how its “statistical rituals” and practices have become far more a sociological phenomenon than anything related to good science and statistics.

      Recall the familiar one‐sample z‐test for a mean discussed earlier:

$$z_M = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

      where the purpose of the test was to compare an obtained sample mean x̄ to a population mean μ0 under the null hypothesis that μ = μ0. Recall that σ is the standard deviation of the population from which the sample was presumably drawn, and that in practice this value is rarely if ever known for certain, which is why in most cases an estimate of it is obtained in the form of a sample standard deviation s. What determines the size of zM, and therefore the smallness of p? There are three inputs that determine the size of p, which we have already featured in our earlier discussion of statistical power. These three factors are the distance x̄ − μ0, σ, and n. We consider each of these once more, then provide simple arithmetic demonstrations to emphasize how changing any one of these necessarily results in an arithmetical change in zM and, consequently, a change in the observed p‐value.

      As a first case, consider the distance x̄ − μ0. Given constant values of σ and n, the greater the distance between x̄ and μ0, the larger zM will be. That is, as the numerator x̄ − μ0 grows larger, the resulting zM also gets larger, which, as a consequence, decreases p in size. As a simple example, assume for a given research problem that σ is equal to 20 and n is equal to 100. This means that the standard error is equal to 20/√100 = 2.
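      A short R sketch makes the arithmetic concrete; the sample means used below are illustrative values of our own choosing, not figures from the book:

          sigma <- 20; n <- 100
          se <- sigma / sqrt(n)                 # standard error = 20/sqrt(100) = 2

          z_M <- function(xbar, mu0) (xbar - mu0) / se
          p_two_tailed <- function(z) 2 * pnorm(-abs(z))

          # Holding sigma and n constant, a wider distance xbar - mu0
          # inflates zM and shrinks p:
          p_two_tailed(z_M(102, 100))   # zM = 1.0, p ~ 0.32
          p_two_tailed(z_M(106, 100))   # zM = 3.0, p ~ 0.003

      The same two helper functions can be reused to verify the analogous effects of decreasing σ or increasing n in the demonstrations that follow.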