Daniel J. Denis

Applied Univariate, Bivariate, and Multivariate Statistics



regression. In all such cases, we are seeking to detect a deviation from the null hypothesis.

      2 The significance level, or type I error rate (α) at which you set your test. All else equal, a more liberal setting such as 0.05 or 0.10 affords more statistical power than a more conservative setting such as 0.01 or 0.001, for instance. It is easier to detect a false null if you allow yourself more of a risk of committing a type I error. Since we usually want to minimize type I error, we typically want to regard α as fixed at a nominal level (e.g., 0.05 or 0.01) and consider it not amenable to adjustment for the purpose of increasing power. Hence, when it comes to boosting power, researchers usually do not want to “mess with” the type I error rate.

      3 Population variability, σ², often unknown but estimated by s². All else equal, the greater the variance of the objects studied in the population, the less sensitive the statistical test, and the less power you will have. Why is this so? As an analogy, consider a rock thrown into the water. The rock will make a definite “splash” in that it will displace a certain amount of water when it hits the surface. This can be considered the “effect size” of the splash. If the water is noisy with wind and waves (i.e., high population variability), it will be difficult to detect the splash. If, on the other hand, the water is calm and serene (i.e., low population variability), you will more easily detect the splash. Either way, the rock made a splash of a given size. The magnitude of the splash is the same regardless of whether the waters are calm or turbulent. Whether we can detect the splash or not is in part a function of the variance in the population.

      Applying this concept to research settings, if you are sampling from “noisy” populations, it is harder to see the effect of your independent variable than if you are sampling from less noisy, and thus less variable, populations. This is why research using lab rats or other equally controllable objects can usually detect effects with relatively few animals in a sample, whereas research studying humans on variables such as intelligence, anxiety, or attitudes usually requires many more subjects in order to detect effects. A good way to boost power is to study populations that have relatively low variability before your treatment is administered. If your treatment works, you will be able to detect its efficacy with fewer subjects than if dealing with a highly variable population. Another approach is to covary out one or two factors that are thought to be related to the dependent variable through a technique such as the analysis of covariance (Keppel and Wickens, 2004), discussed and demonstrated later in the book.

      4 Sample size, n. All else equal, the greater the sample size, the greater the statistical power. Boosting sample size is a common strategy for increasing power. Indeed, as will be discussed at the conclusion of this chapter, for any significance test in which there is at least some effect (i.e., some distance between the null and alternative), statistical significance is assured for a large‐enough sample size. Obtaining large samples is a good thing (after all, the ideal would be to have the entire population), but as sample size increases, the p‐value becomes an increasingly poor indicator or measure of experimental effect. Effect sizes should always be reported alongside any significance test.
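
      The influence of the significance level, population variability, and sample size on power can be illustrated numerically. What follows is only a minimal sketch in Python, assuming a two‐sided one‐sample z‐test with known σ. The function names z_test_power and two_sided_p and all of the numeric values are hypothetical choices made for illustration and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (sigma assumed known).

    delta: true distance between the null and alternative means
    sigma: population standard deviation
    n:     sample size
    alpha: type I error rate
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma  # standardized distance from the null
    # Probability that the test statistic lands in either rejection region.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def two_sided_p(observed_diff, sigma, n):
    """Two-sided p-value for an observed mean difference from the null."""
    z = abs(observed_diff) * sqrt(n) / sigma
    return 2 * (1 - STD_NORMAL.cdf(z))


# Hypothetical baseline: effect of 2 units, sigma = 10, n = 50, alpha = 0.05.
baseline = z_test_power(delta=2.0, sigma=10.0, n=50, alpha=0.05)
# Change one factor at a time, holding everything else equal.
liberal_alpha = z_test_power(delta=2.0, sigma=10.0, n=50, alpha=0.10)
noisier_population = z_test_power(delta=2.0, sigma=20.0, n=50, alpha=0.05)
larger_sample = z_test_power(delta=2.0, sigma=10.0, n=200, alpha=0.05)

print(f"baseline power:            {baseline:.3f}")
print(f"alpha raised to 0.10:      {liberal_alpha:.3f}")
print(f"sigma doubled to 20:       {noisier_population:.3f}")
print(f"n quadrupled to 200:       {larger_sample:.3f}")

# A fixed, small observed difference becomes "significant" once n is large.
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: p = {two_sided_p(0.5, 10.0, n):.4g}")
```

      Under these assumptions, power rises with a more liberal α and with a larger n, and falls as σ grows, while the effect itself (delta) never changes. The final loop shows the same small observed difference yielding ever smaller p‐values as n increases, which is why an effect size should accompany the test.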

      2.21.1 Visualizing Power

      Statistical power matters so long as we have the inferential goal of rejecting null hypotheses. A study that is underpowered risks not being able to reject null hypotheses even if such null hypotheses are in reality false. A failure to reject a null hypothesis under the condition of minimal power could either mean a lack of inferential support for the obtained finding, or it could simply suggest an underpowered (and consequently poorly designed) experiment or study. Ensuring adequate statistical power before one engages in a research study or experiment is mandatory (Cohen, 1988).

      To demonstrate the estimation of power using software, we first use pwr.r.test (Champely, 2014) in R to estimate the required sample size for a Pearson r correlation coefficient. As an example, we estimate the required sample size for a population correlation coefficient of ρ = 0.10 at a significance level of 0.05, with desired power equal to 0.90. Note that in the call to pwr.r.test, we specify r = 0.10, sig.level = 0.05, and power = 0.90 but purposely leave n empty so that R can estimate this figure for us.


      We see that to detect a correlation coefficient of 0.10 at a desired power of 0.90, a sample size of 1046 is required. We could round up to 1047 for a slightly more conservative estimate. It is more conservative because 1047 is a slightly more “generous” sample than the 1046 that R reports as necessary. In this case the difference is extremely slight, but in general, when you provide your analysis with more subjects than may be necessary for a given level of power, you guard against the possibility of obtaining smaller effects than those you believe are “out there” in your population. If in doubt, larger samples are preferable to smaller ones, and thus rounding “up” on sample size requirements is usually a good idea.
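
      As a rough cross‐check on a figure of this size, the required n can be approximated by hand with the Fisher z transformation. The sketch below is written in Python and is not the algorithm used by pwr.r.test, so its result may differ from R's by about a subject. The function name n_for_correlation is a hypothetical choice for illustration, and the formula ignores the negligible probability of rejecting in the wrong tail.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(rho, alpha=0.05, power=0.90):
    """Approximate n needed to detect a population correlation rho.

    Uses the Fisher z approximation for a two-sided test:
        n = ((z_{1 - alpha/2} + z_{power}) / atanh(rho))**2 + 3
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for desired power
    return ((z_alpha + z_power) / atanh(rho)) ** 2 + 3


n_approx = n_for_correlation(0.10, alpha=0.05, power=0.90)
n_rounded_up = ceil(n_approx)  # round "up" for the more conservative choice

print(f"approximate n: {n_approx:.2f}")
print(f"rounded up:    {n_rounded_up}")
```

      Rounding with ceil rather than round follows the advice above: when the estimate is not a whole number, the larger sample is the safer choice.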

[Figure: G*Power output for estimating required sample size for r = 0.10.]
[Figure: Power curves generated by G*Power for detecting correlation coefficients of ρ = 0.10 to 0.50.]
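
      Power curves of this kind can be approximated without G*Power. The Python sketch below again relies on the Fisher z approximation rather than on G*Power's own routines, so its values will not match that software exactly. The function name correlation_power and the grid of sample sizes are hypothetical choices for illustration.

```python
from math import atanh, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def correlation_power(rho, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0 (Fisher z)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = atanh(rho) * sqrt(n - 3)  # expected value of the z statistic
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


rhos = [0.10, 0.20, 0.30, 0.40, 0.50]
sample_sizes = list(range(20, 1201, 20))  # hypothetical grid of n values

# One power curve (a list of power values over sample_sizes) per rho.
curves = {
    rho: [correlation_power(rho, n) for n in sample_sizes] for rho in rhos
}

for rho in rhos:
    # First n on the grid at which power reaches 0.90, if any.
    first_n = next(
        (n for n, p in zip(sample_sizes, curves[rho]) if p >= 0.90), None
    )
    print(f"rho = {rho:.2f}: power >= 0.90 first reached at n = {first_n}")
```

      Each list in curves can be passed to any plotting library to draw power against sample size. Under this approximation the curve for the largest ρ rises fastest, so larger correlations need far fewer subjects to reach a given level of power.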