in matrix form is required in statistical analyses of higher dimensions than 1 (e.g., multiple regression, multivariate analysis of variance, principal components analysis, etc.). The fundamental general linear model can be given by Y = XB + E.
Understanding what makes a p‐value small or large is essential if a researcher is to intelligently interpret statistical evidence in his or her field. The history of null hypothesis significance testing (NHST) is plagued with controversy, and a solid understanding of the difference between statistical significance and effect size (e.g., Cohen's d) is necessary before one attempts to interpret any research findings.
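The distinction between statistical significance and effect size can be sketched with a short R computation. The example below assumes a one‐sample z‐test with a known population standard deviation; the values of mu0, ybar, sigma, and the sample sizes in n are hypothetical and chosen only for illustration. The observed mean difference is held fixed while the sample size grows.

```r
# Hypothetical values, chosen only for illustration
mu0   <- 100   # mean under the null hypothesis
ybar  <- 101   # observed sample mean (held fixed across sample sizes)
sigma <- 15    # population standard deviation, assumed known

n <- c(25, 100, 2500, 10000)          # increasing sample sizes

z <- (ybar - mu0) / (sigma / sqrt(n)) # z statistic for the sample mean
p <- 2 * pnorm(-abs(z))               # two-sided p-value
d <- (ybar - mu0) / sigma             # Cohen's d, which does not involve n

data.frame(n = n, z = round(z, 2), p = signif(p, 3), d = round(d, 3))
```

Under these assumptions, the p‐value shrinks as n increases even though Cohen's d stays the same, because n enters only through the standard error. Had the observed difference been exactly zero, no sample size would have produced a small p‐value.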
Review Exercises
2.1. Distinguish between a density and an empirical distribution. How are they different? How are they similar?
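2.2. Consider the univariate normal density: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$. Show that for a standard normal distribution, the above becomes $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$.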
2.3. Explain the nature of a z‐score. Why is it also called a standardized score?
2.4. Using R, compute the probability of observing a standardized score of 1.0 or greater. What is then the probability of observing a score less than 1.0 from such a distribution?
2.5. Think up a research example in which the binomial distribution would be useful in evaluating a null hypothesis.
2.6. Rafael Nadal, a professional tennis player, as of 2020 had won the French Open tennis championship a total of 13 times in the past 16 tournaments. If we set the probability of him winning each time at 0.5, determine the probability of winning 13 times out of 16. Make a statistical argument that Nadal is an exceptional tennis player at the French Open. What if we set the probability of a win at 0.1? Does this make Nadal's achievements less or more impressive? Why? Explain.
2.7. Give an example using the binomial distribution in which the null hypothesis would not be rejected even if observing 9 out of 10 heads on flips of a coin.
2.8. On a fair coin, what is the probability of observing 0 heads or 5 heads? How did you arrive at this probability, and which rules of probability did you use in your computation?
2.9. Discuss what a limiting form of a distribution means, and how the limiting form of the binomial distribution is that of the normal distribution.
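2.10. Consider the multivariate density: $f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]$. All else constant, what effect does an increasing value of the determinant ($|\boldsymbol{\Sigma}|$) have on the density, and how does this translate when using real variables?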
2.11. What is meant by the expectation of a random variable?
2.12. Compare these two products, and explain how and why they are different from one another when taking expectations: $y_i\,p(y_i)$ versus $y_i\,p(y_i)\,dy$
2.13. Why is it reasonable that the arithmetic mean is the center of gravity of a distribution?
2.14. What is an unbiased estimator of a population mean vector?
2.15. Discuss what it means to say that $E(S^2) \neq \sigma^2$, and the implications of this. What is $E(S^2)$ equal to?
2.16. Even though $E(S^2) \neq \sigma^2$, how can it be true nonetheless that $\lim_{n \to \infty} E(S^2) = \sigma^2$? Explain.
2.17. Explain why the following form of the sample variance is considered to be an unbiased estimator of the population variance: $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$
2.18. Draw a distribution that is positively skewed. Now draw one that is negatively skewed.
2.19. Compare and contrast the covariance of random variables, $\operatorname{cov}(x_i, y_i) = \sigma_{xy} = E[(x_i - \mu_x)(y_i - \mu_y)]$, with the sample covariance. How are they similar? How are they different? What in their definitions makes them different from one another?
2.20. What effect (if any) does increasing sample size n have on the magnitude of the covariance? If it does not have any effect, explain why it does not.
2.21. Explain or show how the variance of a variable can be conceptualized as the covariance of a variable with itself.
2.22. Cite three reasons why the covariance is not a pure or dimensionless measure of relationship between two variables.
2.23. Why is Pearson r not suitable for measuring relationships that are nonlinear? What is an alternative coefficient (one of many) that may be computed that is more appropriate for relationships that are nonlinear?
2.24. What does it mean to say the relationship between two variables is monotonically increasing?
2.25. What does a correlation matrix have along its main diagonal that a covariance matrix does not? What is along the main diagonal of a covariance matrix?
2.26. Define, in general, what it means to measure something.
2.27. Explain why it is that something measurable at the ratio level of measurement is also measurable at the interval, ordinal, and nominal levels as well.
2.28. Is something such as intelligence measurable on a ratio scale? Why or why not?
2.29. Distinguish between a mathematical variable and a random variable.
2.30. Distinguish between an estimator and an estimate.
2.31. Define what is meant by an interval estimator.
2.32. Define what is meant by the consistency of an estimator and what means in this context.
2.33. Compare the concepts of efficiency versus sufficiency with regard to estimators. How are they different?
2.34. The sampling distribution of the mean is an idealized distribution. However, discuss how one would generate the sampling distribution of the mean empirically.
2.35. Discuss why for a higher level of confidence, all else equal, a confidence interval widens rather than narrows.
2.36. Define what is meant by a maximum‐likelihood estimator.
2.37. Discuss the behavior of the t distribution for increasing degrees of freedom. What is the limiting form of the t distribution?
2.38. In a research setting, under what condition(s) is a t‐test usually preferred over a z‐test?
2.39. Verbally interpret the nature of pooling in the independent‐samples t‐test. Under what condition(s) do we pool variances? Under what condition(s) should we not pool?
2.40. Discuss why an estimate of effect size is required for estimating power.
2.41. Using R, estimate required sample size for detecting a population correlation coefficient of 0.30 at a significance level of 0.01, with power equal to 0.80.
2.42. Repeat exercise 2.41, this time using G*Power.
2.43. Using R, estimate power for an independent‐samples t‐test for a sample size of 100 per group and Cohen's d equal to 0.20.
2.44. For a value of $r^2 = 0.70$, compute the corresponding value for d.
2.45. Discuss how the paired‐samples t‐test can be considered a special case of the wider and more general blocking design.
2.46. Define what is meant by a linear combination.
2.47. Define and describe each term in the multivariate general linear model Y = XB + E.
2.48. Discuss the key determinants of the p‐value in a significance test.
2.49. A researcher collects a sample of n = 10,000 observations and tells you that with such a large sample size, he is guaranteed to reject the null hypothesis. Explain why the researcher's claim is false.
2.50. A researcher collects a sample size of n = 5, computes $z_M$ and rejects the null hypothesis. Argue on the one hand for why this might be impressive scientifically, then argue why it may not be.
2.51. Consider once more Galton's data on heights (only the first 10 observations are shown):

> library(HistData)
> attach(Galton)
> Galton
  parent child
1   70.5  61.7
2   68.5  61.7
3   65.5  61.7
4   64.5  61.7
5   64.0  61.7
6   67.5  62.2
7