all the predictor variables are combined, and then the linear combination of Xs is correlated with Y. It is easy to visualize multiple regression with two predictors: the regression surface is a plane in three-dimensional space. Imagining more than two predictors is difficult and, fortunately, not necessary. Multiple regression produces a multiple correlation, R, that reflects how well the linear combination of Xs predicts Y. Some predictor variables are likely to be better predictors of Y than others, and the analysis produces weights that can be used in a regression equation to predict Y. Simply multiply the value of each predictor variable by its respective weight, sum the products, and add the constant, and you have your predicted value.
Y(predicted) = B₁X₁ + B₂X₂ + B₃X₃ + … + constant
In addition to the weights used to predict criterion values, multiple regression analysis also provides standardized weights called beta (β) weights. These values tell us something about each individual predictor in the analysis. They can be interpreted much like an r value, with the sign indicating the direction of the relationship between the predictor and the criterion and the magnitude indicating the relative importance of that predictor in predicting the criterion. Thus, in a multiple regression analysis, we can examine the relative contribution of each predictor variable to the overall analysis.
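To make these ideas concrete, here is a minimal sketch in Python, assuming NumPy and scikit-learn are available; the data and the "true" weights are invented for illustration, not taken from any real study. It fits the B weights and constant, refits on standardized scores to obtain the beta weights, and computes the multiple R as the correlation between the predicted and observed Y values.

```python
# A minimal sketch of multiple regression with invented data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                               # three predictors: X1, X2, X3
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)    # X3 is pure noise

model = LinearRegression().fit(X, y)
print("B weights:", model.coef_.round(2), " constant:", round(model.intercept_, 2))

# Predicted Y = B1(X1) + B2(X2) + B3(X3) + constant
y_pred = model.predict(X)

# Beta weights: the B weights obtained after standardizing Y and each X
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
yz = (y - y.mean()) / y.std()
betas = LinearRegression().fit(Xz, yz).coef_
print("beta weights:", betas.round(2))

# Multiple R: the correlation between predicted and observed Y
R = np.corrcoef(y_pred, y)[0, 1]
print("multiple R:", round(R, 2))
```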
As you just learned, multiple regression is used to determine the influence of several predictor variables on a single criterion variable. Let’s look briefly at two useful concepts in multiple regression: (1) partial and (2) semipartial (also called part) correlation.
Partial Correlation.
Sometimes we would like to measure the relationship between two variables when a third has an influence on them both. We can partial out the effects of that third variable by computing a partial correlation. Suppose there is a correlation between age and income. It seems reasonable that older people might make more money than younger people. Is there another variable that you think might be related to age and income? How about years of education? Older people are more likely to be better educated, having had more years to go to school, and it seems likely that better-educated people earn more. So what is the true relationship between age and income if the variable years of education is taken out of the equation? One solution would be to group people by years of education and then conduct a number of separate correlations between age and income for each education group. Partial correlation, however, provides a better solution by telling us what the true relationship is between age and income when years of education has been partialled out.
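A partial correlation can be computed by hand as the correlation between two sets of residuals. The sketch below uses invented values for age, income, and years of education (the numbers and scales are ours): it removes the linear effect of education from both age and income and then correlates what is left.

```python
# A sketch of partial correlation with invented data: correlate the parts
# of age and income that remain after the linear effect of years of
# education has been removed from each.
import numpy as np

rng = np.random.default_rng(1)
n = 500
education = rng.normal(14, 2, n)                       # years of education
age = 25 + 1.5 * education + rng.normal(0, 5, n)       # age related to education
income = 2000 * education + 300 * age + rng.normal(0, 8000, n)

def residuals(y, x):
    """Residuals of y after removing the linear effect of x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

r_simple = np.corrcoef(age, income)[0, 1]
r_partial = np.corrcoef(residuals(age, education),
                        residuals(income, education))[0, 1]
print("simple r(age, income):      ", round(r_simple, 2))
print("partial r, education removed:", round(r_partial, 2))
```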
Semipartial Correlation.
As we just discussed, in partial correlation, we remove the influence of a third variable from both variables of interest and then calculate the correlation between what remains. But what if we want to remove the influence of a variable from only one of the other variables? This is called a semipartial correlation. For example, at our school, we accept senior students into our applied psychology program based on their grades in the first and second years. We have found a strong positive correlation between previous grades and performance in our program. Suppose we could also administer an entrance exam to use as another predictor, but the exam is expensive. We can use semipartial correlation to determine how much the entrance test would increase our predictive power over and above using previous grades.
How do we do this? We correlate entrance test scores with performance in the program after first removing the influence of previous grades on program performance. This correlation tells us what relationship remains between entrance test scores and program performance when the correlation with previous grades has been partialled out of program performance but not out of entrance test scores. In our example, we could decide, based on this correlation, whether an expensive entrance test improves our predictive ability enough for us to go ahead and use it.
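The same residual trick, applied to one variable only, gives the semipartial correlation. Here is a sketch with invented scores (the variable names and scales are hypothetical), following the procedure described above: previous grades are partialled out of program performance only, and the remaining correlation with entrance test scores is the semipartial correlation.

```python
# A sketch of semipartial (part) correlation with invented data: previous
# grades are removed from program performance only, not from the entrance
# test scores.
import numpy as np

rng = np.random.default_rng(2)
n = 300
grades = rng.normal(75, 8, n)                          # first- and second-year grades
test = 0.5 * grades + rng.normal(0, 6, n)              # entrance test overlaps grades
performance = 0.6 * grades + 0.3 * test + rng.normal(0, 5, n)

# Remove the influence of grades from performance (the criterion) only
slope, intercept = np.polyfit(grades, performance, 1)
performance_resid = performance - (slope * grades + intercept)

r_semipartial = np.corrcoef(test, performance_resid)[0, 1]
print("semipartial r(test, performance):", round(r_semipartial, 2))
```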
Logistic Regression.
Suppose you were interested in predicting whether a young offender would reoffend. You measure a number of possible predictor variables, such as degree of social support, integration in the community, job history, and so on, and then follow your participants for 5 years and record whether they reoffend. The predictor variables may be continuous, but the criterion variable is discrete: they reoffend or they do not. When we have a discrete criterion variable, we use logistic regression. Just as we used a combination of the predictor variables to predict the criterion variable in multiple regression, we do the same thing in logistic regression. The difference is that instead of predicting a value for the criterion variable, we predict the likelihood of the occurrence of the criterion variable. We express this as an odds ratio, that is, the odds of reoffending divided by the odds of not reoffending. If the probability of reoffending is .75 (i.e., there is a 75% chance of reoffending), then the probability of not reoffending is .25 (1 − .75). The odds of reoffending are .75/.25, or 3:1, and the odds of not reoffending are .25/.75, or .33. We calculate the odds ratio of reoffending versus not reoffending as 3/.33, or 9. In other words, the odds of reoffending are nine times the odds of not reoffending.
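The odds arithmetic is easy to check, and fitting a logistic regression takes only a few lines of Python. The sketch below uses invented predictor scores and scikit-learn (the predictors, their scales, and the "true" model are assumptions for illustration): it verifies the odds ratio, fits the model, and converts a predicted probability into odds.

```python
# A sketch of logistic regression with invented reoffending data. The model
# predicts the probability of the discrete outcome, which converts to odds.
import numpy as np
from sklearn.linear_model import LogisticRegression

# First, the odds arithmetic from the text:
odds_reoffend = 0.75 / 0.25          # 3.0, i.e., 3:1
odds_not = 0.25 / 0.75               # about 0.33
print("odds ratio:", odds_reoffend / odds_not)   # 9.0

rng = np.random.default_rng(3)
n = 400
support = rng.normal(0, 1, n)        # degree of social support (invented scale)
job = rng.normal(0, 1, n)            # job history score (invented scale)
# Invented truth: less support and a weaker job history raise the risk
p_true = 1 / (1 + np.exp(1.0 * support + 0.8 * job))
reoffend = rng.random(n) < p_true    # discrete criterion: reoffend or not

X = np.column_stack([support, job])
model = LogisticRegression().fit(X, reoffend)

new_case = np.array([[-1.0, -0.5]])              # low support, weak job history
p = model.predict_proba(new_case)[0, 1]          # probability of reoffending
print(f"probability = {p:.2f}, odds = {p / (1 - p):.2f}")
```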
Factor Analysis.
Factor analysis is a correlational technique we use to find simpler patterns of relationships among many variables. Factor analysis can tell us if a large number of variables can be explained by a much smaller number of uncorrelated constructs or factors.
In psychology, one of the best examples of factor analysis is in the area of personality theory. What psychology student hasn’t heard of OCEAN? These five personality factors of openness, conscientiousness, extraversion, agreeableness, and neuroticism were described by McCrae and Costa (1987) as the only factors needed to describe someone’s personality. Though we may use hundreds of traits and characteristics to describe someone’s personality, these all factor down to just five unique dimensions. This factoring down was accomplished by factor analysis.
In factor analysis, the researcher is looking for underlying, independent factors that have not been directly measured to explain a large set of variables that have been measured. The procedure identifies groups of variables that are interrelated; each group defines a factor. Once the factors have been identified, it is up to the researcher to decide what construct each group of variables is measuring. These underlying factors are hypothetical; that is, they are inferred by the researcher. The researcher attempts to find the smallest number of factors that can adequately explain the observed variables and to determine the fundamental nature of those factors.
When you read a research report where factor analysis has been used, you will probably see a complicated-looking matrix called a correlation matrix. Don’t be discouraged. Keep in mind that although the mathematics are complex and beyond the scope of this book, the concept is reasonably simple. Can an underlying variable such as general intelligence explain a whole lot of variation in measures of mental abilities?
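As a sketch of the idea (not of any published analysis), the following Python code invents four mental-ability measures driven by a single hidden factor and asks scikit-learn's FactorAnalysis to recover the loadings. The test names and loadings are made up; naming the recovered factor "general intelligence" would be the researcher's inference, not the program's.

```python
# A sketch of factor analysis with invented data: four mental-ability
# measures are generated from one hidden factor, and FactorAnalysis is
# asked to recover the loadings.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(4)
n = 500
g = rng.normal(size=n)                               # hidden "general" factor
tests = ["vocabulary", "arithmetic", "matrices", "memory"]
true_loadings = np.array([0.9, 0.8, 0.7, 0.6])
scores = np.outer(g, true_loadings) + rng.normal(0, 0.5, size=(n, 4))

fa = FactorAnalysis(n_components=1).fit(scores)
for test, loading in zip(tests, fa.components_[0]):
    print(f"{test:>10s}: loading = {loading: .2f}")
# Deciding what the recovered factor means is the researcher's job.
```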
Cluster Analysis.
Cluster analysis includes a range of algorithms and methods used to group similar objects into categories, or clusters. The members of each cluster are thus more similar to each other than they are to members of other clusters. Unlike factor analysis, where the goal is to group similar variables together, in cluster analysis, the idea is to group similar members. Organizing data into meaningful structures or taxonomies is a task many researchers face. Cluster analysis is a method that can discover structure in data, but it does not in and of itself have any explanatory function. In other words, the analysis can find structure but does not explain it.
Imagine a hospital where patients are assigned to wards based on similar symptoms or perhaps similar treatments. Each ward could be considered a cluster. A cluster analysis might discover the similarities among the patients in each ward, and the researcher then has the job of determining why the cluster or ward is similar (i.e., symptoms, treatment, age, etc.).
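A minimal sketch of the hospital example follows, with invented patient scores on two symptom scales and k-means (one of many clustering algorithms) standing in for cluster analysis generally. The algorithm recovers the groups; explaining why each group hangs together is still left to the researcher.

```python
# A sketch of cluster analysis using k-means on invented patient data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
# Three invented patient groups with different two-symptom profiles
ward_a = rng.normal([2.0, 8.0], 0.7, size=(30, 2))
ward_b = rng.normal([7.0, 3.0], 0.7, size=(30, 2))
ward_c = rng.normal([8.0, 8.0], 0.7, size=(30, 2))
patients = np.vstack([ward_a, ward_b, ward_c])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(patients)
print("cluster sizes:", np.bincount(km.labels_))
print("cluster centers (symptom profiles):")
print(km.cluster_centers_.round(1))
```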
Cluster analysis is often used when researchers have no a priori hypotheses and are in the