Daniel J. Denis

Applied Univariate, Bivariate, and Multivariate Statistics



      Illustration of a Venn diagram depicting hacking skills, math and statistics knowledge, and substantive expertise.

      Source: From Drew Conway, THE DATA SCIENCE VENN DIAGRAM, Sep 30, 2010. Reproduced with permission from Drew Conway.

      Statistical thinking is all about relativity. Statistics are not about numbers; they are about distributions of numbers (Green, 2000, personal communication). Rarely in statistics, or in science for that matter, do we evaluate things in a vacuum.

      We can come up with many other examples to illustrate the absolute versus relative distinction. If someone asked you whether you are intelligent, ego aside, as a statistician you might respond “relative to whom?” Indeed, with a construct like IQ, relativity is all we really have. What does absolute intelligence look like? Should our species discover aliens on another planet one day, we may need to revise our definition of intelligence if such beings are much more (or much less) advanced than we are. Of course, this would assume we have the intelligence to comprehend that their capacities exceed ours, something not guaranteed, and hence another example of the trap of relativity.

      Relativity is a benchmark used to evaluate many phenomena, from intelligence to scholastic achievement to the prevalence of depression, and indeed much of human and nonhuman behavior. Understanding that witnessed events can be theorized to have come from known distributions (like the talent distribution of pilots) is a first step toward thinking statistically. Most phenomena have distributions, either known or unknown. Statistics, in large part, is the study of such distributions.

Schematic illustration of the pilot criterion that must be met for any pilot to be permitted to fly a plane.

      Perhaps most pervasive in the social science literature is the implicit belief held by many that methods such as regression and analysis of covariance allow one to “control” variables that would otherwise not be controllable in a nonexperimental design. As is emphasized throughout this book, statistical methods, whatever the kind, do not provide a means of controlling variables, or of “holding variables constant,” in any real sense. To obtain that kind of control, you usually need a strong, rigorous experimental design.

      It is true, however, that statistical methods do afford a method, in some sense, for presuming (or guessing) what might have been had controls been put into place. For instance, if we analyze the correlation between weight and height, it may make sense to hold a factor such as age “constant.” That is, we may wish to partial out age. However, partialling out the variability due to age in the bivariate correlation is not equivalent to actually controlling for age. The truth of the matter is that our statistical control tells us nothing about what would actually be the case had we been able to truly control age, or any other factor. As will be elaborated in Chapter 8 on multiple regression, statistical control is in no way a sufficient “proxy” for experimental control. Students and researchers must keep this distinction in mind before they throw variables into a statistical model and employ words like “control” (or other power and action words) when interpreting effects. If you want to truly control variables, to actually hold them constant, you usually have to run experiments. Estimating parameters in a statistical model, confident that you have “controlled” for covariates, is simply not enough.
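      To make the distinction concrete, the short sketch below (Python, with simulated data; the variable names and numeric values are hypothetical, not drawn from the text) shows what “partialling out” age actually does: it removes, by arithmetic alone, the variability in weight and height that a linear fit on age can account for, and then correlates what remains. Nothing about anyone’s age is physically held constant.

```python
# A minimal sketch of "partialling out" age from a weight-height
# correlation, using simulated data (all values here are hypothetical).
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(10, 50, n)                    # third variable to partial out
height = 100 + 1.5 * age + rng.normal(0, 8, n)  # both depend on age
weight = 20 + 1.2 * age + rng.normal(0, 6, n)

def residuals(y, x):
    """Residuals of y after a least-squares fit on x (plus an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Zero-order correlation is inflated by the shared dependence on age.
r_zero = np.corrcoef(weight, height)[0, 1]

# Partial correlation: correlate only the parts of weight and height
# that a linear function of age cannot account for.
r_partial = np.corrcoef(residuals(weight, age), residuals(height, age))[0, 1]

print(f"zero-order r = {r_zero:.2f}, partial r (age removed) = {r_partial:.2f}")
```

      The “control” here is purely computational, which is exactly the point of the passage above: the arithmetic says nothing about what would have happened had age truly been held fixed by design.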

      In the establishment of evidence, whether experimental or nonexperimental, it is helpful to consider the distinction between statistical and physical effects. To illustrate, consider a medical scientist who wishes to test the hypothesis that the more medication applied to a wound, the faster the wound heals. The statistical question of interest is: does the amount of medication predict the rate at which a wound heals? A useful statistical model might be a linear regression in which amount of medication is the predictor and rate of healing is the response. Of course, one does not “need” a regression analysis to “know” whether something is occurring. The investigator can simply observe whether the wound heals or not, and whether applying more or less medication speeds up or slows down the healing process. The statistical tool in this case is used simply to model the relationship, not to determine whether or not it exists. The phenomenon in question is a physical, biological, “real” one. It exists independent of the statistical model, simply because we can see it. The estimation of a statistical model is not necessarily the same thing as the underlying physical process it seeks to represent.
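      The regression just described is easy to express in code. The sketch below (Python; the data are simulated and every number is hypothetical) fits rate of healing on amount of medication. The point of the passage bears repeating in this context: the fit quantifies the relationship; it does not by itself establish that the physical healing process exists.

```python
# A minimal sketch of the wound-healing regression described above,
# fit to simulated data (variable names and numbers are hypothetical).
import numpy as np

rng = np.random.default_rng(1)
n = 50
medication = rng.uniform(0, 10, n)                         # dose applied
healing_rate = 2.0 + 0.8 * medication + rng.normal(0, 1, n)

# Least-squares fit: healing_rate = b0 + b1 * medication + error.
# np.polyfit returns coefficients highest degree first: (slope, intercept).
b1, b0 = np.polyfit(medication, healing_rate, deg=1)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
# The slope models the dose-healing relationship; observing the wound
# heal is what tells us the physical process is real.
```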

      In some areas of social science, however, the very observation of an effect cannot be realized without recourse to the statistics used to model the relationship. For instance, if I correlate self‐esteem with intelligence, am I modeling a relationship that I know exists separately from the statistical model, or is the statistical model the only recourse I have for saying that the relationship exists in the first place? Because of mediating and moderating relationships in social statistics, an additional variable or two can drastically modify existing coefficients in a model, to the point where predictors that had an effect before such inclusion no longer do after (the simulation sketch following the passage below makes this concrete). As we will emphasize in our chapters on regression:

       When you change the model, you change parameter estimates, you change effects. You are never, ever, testing individual effects in the model. You are always testing the model, and hence the interpretation of parameter estimates must be within the context of the model.
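
      The simulation sketch below (Python; simulated data, with hypothetical names and values) illustrates the principle in the passage above: a predictor x1 that appears to carry a sizeable effect when modeled alone can lose that effect entirely once a correlated covariate x2 enters the model, because a parameter estimate is a property of the model as a whole, not of the predictor in isolation.

```python
# A minimal sketch of how adding one covariate can change an existing
# coefficient, using simulated data (all names and values hypothetical).
import numpy as np

rng = np.random.default_rng(2)
n = 500
x2 = rng.normal(size=n)
x1 = 0.9 * x2 + rng.normal(scale=0.5, size=n)  # x1 correlated with x2
y = 2.0 * x2 + rng.normal(size=n)              # y driven by x2, not x1

def ols(y, *xs):
    """Least-squares coefficients for y on an intercept plus predictors."""
    X = np.column_stack([np.ones(len(y)), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

print(ols(y, x1))      # x1 alone appears to have a sizeable "effect"
print(ols(y, x1, x2))  # its coefficient collapses once x2 enters the model
```

      Neither model is “wrong” as arithmetic; the two simply answer different questions, which is why interpretation of any coefficient must stay within the context of the model that produced it.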