Daniel J. Denis

Applied Univariate, Bivariate, and Multivariate Statistics



= F (correction = false) negated what is known as Yates' correction for continuity, which involves subtracting 0.5 from positive differences in O − E and adding 0.5 to negative differences in O − E, so that the chi‐square distribution better approximates the multinomial distribution (i.e., in a crude sense, to help make discrete probabilities more continuous). To apply Yates' correction, we can either specify correct = T or simply call chisq.test(diag.table), which applies the correction by default for a 2 × 2 table. With the correction implemented, our p‐value increases from 0.003 to 0.009 (not shown). This adjustment parallels the continuity correction reported by SPSS. When expected counts per cell are relatively small (a working rule for the chi‐square approximation is that they should be at least five in each cell), one can also request Fisher's exact test (see Fisher, 1922a), which also mirrors the output generated by SPSS:

      > fisher.test(diag.table)

              Fisher's Exact Test for Count Data

      data:  diag.table
      p-value = 0.008579
      alternative hypothesis: true odds ratio is not equal to 1
      95 percent confidence interval:
        1.466377 26.597383
      sample estimates:
      odds ratio
        5.764989
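The effect of the continuity correction can be checked directly in base R. The sketch below assumes that diag.table holds the 2 × 2 counts implied by the sensitivity and specificity figures quoted later in this section (20 of 30 diseased individuals test positive; 15 of 20 non‐diseased individuals test negative). The cell layout and dimension labels are our assumption, not the author's code.

```r
# Assumed 2 x 2 table (rows: disease status, columns: test result).
# The counts are reconstructed from the text, not taken from the author's code.
diag.table <- matrix(c(20, 10,
                        5, 15),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(disease = c("present", "absent"),
                                     test    = c("positive", "negative")))

# Pearson chi-square without and with Yates' continuity correction
uncorrected <- chisq.test(diag.table, correct = FALSE)
corrected   <- chisq.test(diag.table, correct = TRUE)  # default for 2 x 2 tables

# Yates' statistic by hand: shrink each |O - E| by 0.5 before squaring
O <- diag.table
E <- outer(rowSums(O), colSums(O)) / sum(O)
yates_by_hand <- sum((abs(O - E) - 0.5)^2 / E)

# Compare the three statistics and the two p-values
round(c(uncorrected = unname(uncorrected$statistic),
        corrected   = unname(corrected$statistic),
        by_hand     = yates_by_hand), 3)
round(c(p_uncorrected = uncorrected$p.value,
        p_corrected   = corrected$p.value), 4)

# Fisher's exact test on the same assumed table
fisher.test(diag.table)$p.value
```

Under these assumed counts, the hand computation agrees with the corrected statistic from chisq.test(), which illustrates that the correction acts on each |O − E| before squaring.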

$$\phi = \sqrt{\frac{\chi^2}{n}}$$

      where χ² is the chi‐square statistic calculated on the 2 × 2 table, and n is the total sample size. The maximum ϕ can attain is 1.0, indicating maximal association. ϕ can be requested in SPSS with /statistics = phi and is available in R in the psych package (Revelle, 2015). Cramér's ϕc extends ϕ to contingency tables larger than 2 × 2. It is included in the /statistics = phi command and is also available in R's psych package. It is given by:

$$\phi_c = \sqrt{\frac{\chi^2}{n(k - 1)}}$$

      where k is the smaller of the number of rows and the number of columns. The relationship between ϕc and ϕ is easily shown for k = 2:

$$\phi_c = \sqrt{\frac{\chi^2}{n(2 - 1)}} = \sqrt{\frac{\chi^2}{n}} = \phi$$
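Both coefficients can be computed by hand from the uncorrected chi‐square statistic. The following is a minimal base R sketch: cramers_phi() is a hypothetical helper written for illustration (it is not the psych package function), diag.table uses counts assumed from the diagnostic example in this section, and the 3 × 3 table is made up.

```r
# Hypothetical helper: Cramer's phi_c from a matrix of counts.
# Uses the uncorrected Pearson chi-square; k is the smaller table dimension.
cramers_phi <- function(tab) {
  chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n    <- sum(tab)
  k    <- min(dim(tab))
  sqrt(chi2 / (n * (k - 1)))
}

# Assumed 2 x 2 diagnostic table (counts reconstructed from the text)
diag.table <- matrix(c(20, 10,
                        5, 15), nrow = 2, byrow = TRUE)

# phi for the 2 x 2 case: sqrt(chi-square / n)
chi2 <- unname(chisq.test(diag.table, correct = FALSE)$statistic)
phi  <- sqrt(chi2 / sum(diag.table))

# With k = 2 the divisor n * (k - 1) equals n, so phi_c reduces to phi
phi_c <- cramers_phi(diag.table)
c(phi = phi, phi_c = phi_c)

# Made-up 3 x 3 table to show the general case (k = 3)
tab_3x3 <- matrix(c(20, 10, 10,
                    10, 20, 10,
                    10, 10, 20), nrow = 3, byrow = TRUE)
cramers_phi(tab_3x3)
```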

      2.2.1 Power for Chi‐Square Test of Independence

      > library(pwr)
      > pwr.chisq.test(w =, N =, df =, sig.level =, power = )

      where w is the anticipated or required effect size, estimated as:

$$w = \sqrt{\sum_{i} \frac{(p_{0i} - p_{1i})^2}{p_{0i}}}$$

      and p0i and p1i are the probabilities in a given cell i under the null and alternative hypotheses, respectively. We demonstrate by estimating the sample size required to achieve power of 0.90 for w = 0.2:

      > pwr.chisq.test(w = 0.2, N =, df = 5, sig.level = .05, power = 0.90)

           Chi squared power calculation

                    w = 0.2
                    N = 411.7366
                   df = 5
            sig.level = 0.05
                power = 0.9

      NOTE: N is the number of observations
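To see where such a number comes from, the calculation can be sketched in base R without the pwr package. The sketch assumes the usual noncentral chi‐square formulation of power, with noncentrality parameter N × w². The helper names cohen_w() and chisq_power() and the cell probabilities are hypothetical, chosen only so that w = 0.2.

```r
# Effect size w from null (p0) and alternative (p1) cell probabilities.
# These probabilities are made up for illustration.
cohen_w <- function(p0, p1) sqrt(sum((p0 - p1)^2 / p0))
p0 <- c(0.25, 0.25, 0.25, 0.25)
p1 <- c(0.30, 0.20, 0.30, 0.20)
w  <- cohen_w(p0, p1)
w

# Power: probability that a noncentral chi-square variate
# (noncentrality N * w^2) exceeds the central critical value.
chisq_power <- function(N, w, df, sig.level = 0.05) {
  crit <- qchisq(sig.level, df = df, lower.tail = FALSE)
  pchisq(crit, df = df, ncp = N * w^2, lower.tail = FALSE)
}

# Solve for the N that gives power 0.90 with w = 0.2 and df = 5
N_needed <- uniroot(function(N) chisq_power(N, w = 0.2, df = 5) - 0.90,
                    interval = c(10, 5000), tol = 1e-8)$root
N_needed
ceiling(N_needed)  # round up to whole observations
```

Because N must be a whole number of observations, the fractional solution is rounded up in practice.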

          Exposure   Condition Absent (0)   Condition Present (1)   Total
Males     Yes        10                     20                      30
          No         15                     5                       20
Females   Yes        13                     17                      30
          No         12                     8                       20
Total                50                     50                      100

      For data such as that in Table 2.2 featuring higher‐dimensional frequency data, log‐linear models are a possibility (Agresti, 2002). Log‐linear models are an option in the wider class of generalized linear models, to be discussed further in Chapter 10, where we discuss in some detail a special case of the generalized linear model called the logistic regression model.
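As a rough illustration of the idea, a log‐linear model can be fit in base R as a Poisson regression on the cell counts. The sketch below enters the counts from the three‐way table above; the variable names (sex, exposure, condition) and the two models compared are our choices for illustration, not models proposed in the text.

```r
# Cell counts from the three-way table; variable names are hypothetical labels.
tab <- data.frame(
  sex       = rep(c("male", "female"), each = 4),
  exposure  = rep(rep(c("yes", "no"), each = 2), times = 2),
  condition = rep(c("absent", "present"), times = 4),
  count     = c(10, 20, 15, 5,   # males:   exposed, then unexposed
                13, 17, 12, 8)   # females: exposed, then unexposed
)

# Mutual independence: no association among the three factors
fit_indep <- glm(count ~ sex + exposure + condition,
                 family = poisson, data = tab)

# Add an exposure-by-condition association
fit_assoc <- glm(count ~ sex + exposure * condition,
                 family = poisson, data = tab)

# Likelihood-ratio comparison of the two nested models
anova(fit_indep, fit_assoc, test = "Chisq")
```

A drop in residual deviance that is large relative to the single degree of freedom spent would suggest that exposure and condition are associated.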

      The sensitivity of the diagnostic instrument is the probability that the test is positive given that the individual has the disease. In the margins, we see that 30 people have the disease, of whom 20 were diagnosed with it. Thus, the sensitivity of the test is 20/30 = 0.67. The specificity of the diagnostic instrument is the probability that the test is negative, given that the individual does not have the disease. In the margins, we see that 20 people do not have the disease, of whom 15 were diagnosed as not having the disease. Hence, the specificity of the test is 15/20 = 0.75. The overall prevalence of the disease is equal to 30/50 = 0.60 (i.e., 30 people have the disease out of 50). One can also compute what are known as positive and negative predictive values from such tables. For these and other measures useful for diagnostic situations, see Dawson and Trapp (2004).
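These quantities are simple ratios of cell counts and can be reproduced in a few lines of R. The sketch below uses the counts stated in the paragraph above. The variable names are ours, and the predictive values follow their standard definitions (the proportion of positive tests that are true positives, and the proportion of negative tests that are true negatives), which the text mentions but does not compute.

```r
# Counts stated in the text: 30 diseased (20 test positive),
# 20 non-diseased (15 test negative).
tp <- 20            # true positives
fn <- 30 - tp       # false negatives
tn <- 15            # true negatives
fp <- 20 - tn       # false positives
n  <- tp + fn + tn + fp

sensitivity <- tp / (tp + fn)   # P(test positive | disease)
specificity <- tn / (tn + fp)   # P(test negative | no disease)
prevalence  <- (tp + fn) / n    # P(disease)

# Predictive values (standard definitions, not computed in the text)
ppv <- tp / (tp + fp)           # P(disease | test positive)
npv <- tn / (tn + fn)           # P(no disease | test negative)

round(c(sensitivity = sensitivity, specificity = specificity,
        prevalence = prevalence, ppv = ppv, npv = npv), 2)
```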

      Recall that in our discussion of the so‐called “soft” versus “hard” sciences in Chapter 1, we concluded that a key principal difference between the two is not necessarily one of different statistical or