Factor analysis is key to understanding much published research in the fields of psychology, education, sociology, political science, anthropology, and the health sciences. The purpose of this book is to provide you with a solid foundation in exploratory factor analysis, which, along with confirmatory factor analysis, represents one of the two major strands within this broad field. Indeed, a portion of this first chapter will be devoted to comparing and contrasting these two ways of conceptualizing factor analysis. However, before getting to that point, we first need to describe what, exactly, factors are and the differences between latent and observed variables. We will then turn our attention to the importance of having strong theory to underpin the successful use of factor analysis, and how this theory should serve as the basis upon which we understand the latent variables that this method is designed to describe. Finally, we will conclude the chapter with a brief discussion of the software available for conducting factor analysis and an outline of the book itself. My hope in writing this book is to provide you, the reader, with a sufficient level of background in the area of exploratory factor analysis so that you can conduct analyses of your own, delve more deeply into topics that might interest you, and confidently read research that has used factor analysis. If this book achieves these goals, then I will count it as a success.
Latent and Observed Variables
Much research in fields such as psychology is focused on variables that cannot be directly measured. These variables are often referred to as being latent, and include such constructs as intelligence, personality, mood, affect, and aptitude. These latent variables are frequently featured in social science research and are also the focus for clinicians who want to gain insights into the psychological functioning of their clients. For example, a researcher might be interested in determining whether there is a relationship between extraversion and job satisfaction, whereas a clinician may want to know whether her client is suffering from depression. In both cases, the variables of interest (extraversion, job satisfaction, and depression) are conceived of as tangible, real constructs, though they cannot be directly measured or observed. We talk about an individual as being an extravert or we conclude that a person is suffering from depression, yet we have no direct way of observing either of those traits. However, as we will see in this book, these latent variables can be represented in the statistical model that underlies factor analysis.
If latent variables are, by their very nature, not observable, then how can we hope to measure them? We make inferences about them by using variables that we can measure and that we believe are directly affected by the latent variables themselves. These observed variables can take the form of items on a questionnaire or a test, or some other score that we can obtain directly, such as a researcher's ratings of a child's behavior on the playground. We generally conceptualize the relationship between the latent and observed variables as causal, such that one's level on the latent variable directly affects the scores that we obtain on the observed variables. This relationship can be depicted in a path diagram, as in Figure 1.1.
Figure 1.1 Example Latent Model Structure
We can see that each observed variable, represented by the squares, is linked to the latent variable, denoted as F1, with unidirectional arrows. These arrows come from the latent to the observed variables, indicating that the former has a causal impact on the latter. Note also that each observed variable has an additional unique source of variation, known as error and represented by the circles at the far right of the diagram. Error represents everything that might influence scores on the observed variable, other than the latent variable that is our focus. Thus, if the latent variable is mathematics aptitude, and the observed variables are responses to five items on a math test, then the errors are all of the other things that might influence those math test responses, such as an insect buzzing past, distracting noises occurring during the test administration, and so on. Finally, latent variables (i.e., the factor and error terms) in this model are represented by circles, whereas observed variables are represented by squares. This is a standard way in which such models are diagrammed, and we will use it throughout the book.
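The structure in Figure 1.1 can also be written as one equation per observed variable: the score on item j equals a weight (often called a loading) times the factor, plus that item's own error, that is, x_j = λ_j·F1 + e_j. The short Python sketch below simulates this for five math items. It is an illustration under stated assumptions, not a method taken from the figure: the loading values, the choice to scale the factor and every item to a variance of 1, and the function names are all hypothetical. It shows how each item's variance splits into a part due to F1 and a part due to error.

```python
import random
import statistics

# Hypothetical standardized loadings for five math-test items on a single
# factor, F1 ("mathematics aptitude"). Illustrative values only.
LOADINGS = [0.8, 0.7, 0.6, 0.75, 0.5]


def simulate_one_factor(loadings, n, seed=0):
    """Generate n scores per item from x_j = loading_j * F1 + e_j.

    F1 ~ N(0, 1) is the latent factor and e_j ~ N(0, 1 - loading_j**2) is
    item j's unique error, drawn independently of F1, so every item has a
    population variance of 1. Returns (factor_scores, items).
    """
    rng = random.Random(seed)
    factor = [rng.gauss(0.0, 1.0) for _ in range(n)]
    items = []
    for lam in loadings:
        error_sd = (1.0 - lam ** 2) ** 0.5
        items.append([lam * f + rng.gauss(0.0, error_sd) for f in factor])
    return factor, items


def variance_shares(loadings):
    """Split each item's unit variance into the share due to the factor
    (loading squared) and the share due to error (the remainder)."""
    return [(lam ** 2, 1.0 - lam ** 2) for lam in loadings]


if __name__ == "__main__":
    _, items = simulate_one_factor(LOADINGS, n=20_000)
    shares = variance_shares(LOADINGS)
    for j, (x, (common, unique)) in enumerate(zip(items, shares), start=1):
        print(f"item {j}: sample variance = {statistics.variance(x):.2f}, "
              f"due to F1 = {common:.2f}, due to error = {unique:.2f}")
```

Because each error term is drawn independently of F1, the two shares add up to the item's total variance. An item with a larger loading is driven more strongly by the latent variable and less by everything else that the error term lumps together.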
In summary, we conceptualize many constructs of interest in the social sciences as latent, or unobserved. These latent variables, such as intelligence or aptitude, are very important, both for understanding individual human beings and for understanding the broader world around us. However, these constructs are frequently not directly measurable, meaning that we must use some proxy, or set of proxies, to gain insights about them. These proxy measures, such as items on psychological scales, are linked to the latent variable through a causal model, whereby the latent variable directly causes the outcomes we observe on the indicator variables. All other forces that might influence scores on these observed variables are lumped together in a latent variable that we call error, which is unique to each individual indicator variable. Next, we will describe the importance of theory in both constructing and attempting to measure these latent variables.
The Importance of Theory in Doing Factor Analysis
As we discussed in the previous section, latent variables are not directly observable, and we learn about them only indirectly, through their impact on observed indicator variables. This is a very important concept to keep in mind as we move forward in this book, and in factor analysis more generally. How can we know that performance or scores on the observed variables are in fact caused by the latent variable of interest? The short answer is that we cannot know for sure. Indeed, we cannot even know that the latent variable exists. Is depression a concrete, real disease? Is extraversion an actual personality trait? Is there such a thing as reading aptitude? The answer to these questions is that we don't know for sure. How, then, can we say that an individual is suffering from depression, that Juan is a good reader, or that Yi is an extravert? We can make such statements because we have developed a theoretical model that explains how our observed scores should be linked to these latent variables. For example, psychologists have drawn on prior empirical research and existing theories about mood to construct a theoretical explanation for a set of behaviors that indicate the presence (or absence) of depression. These symptoms might include sleep disturbance (trouble sleeping or sleeping too much), a lack of interest in formerly pleasurable activities, and contemplation of suicide. Taken individually, each of these behaviors could arise from a variety of causes unrelated to the others. Perhaps an individual has trouble sleeping because he is excited about a coming job change. However, if there is a theoretical basis for linking all of these behaviors together through some common cause (depression), then we can use observed responses to a questionnaire asking about them to make inferences about the latent variable. Similarly, political scientists have developed conceptual models of political outlook to characterize how people view the world.
Some people hold views characterized as conservative, others hold liberal views, and still others fall somewhere between the two. This notion of political viewpoint is based on a theoretical model and is believed to drive the attitudes that individuals express regarding particular societal and economic issues, which in turn are manifested in responses to items on surveys. However, as with depression, it is not possible to say with absolute certainty that political viewpoint is a true entity. Rather, we can only develop a model and then assess the extent to which observations taken from nature (i.e., responses to survey questions) match what our theory predicts.
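To make the idea of checking observations against theory a little more concrete, the following Python sketch assumes a hypothetical theory in which four survey items are all driven by a single political-viewpoint factor, with assumed loadings. If the factor and items are scaled to a variance of 1 and the errors are independent, such a one-factor theory predicts that the correlation between any two items equals the product of their loadings. The sketch compares those predicted correlations with correlations computed from simulated responses; the data, loadings, and function names are invented for illustration. Residuals near zero mean the observations match what the theory predicts, and large residuals mean they do not. This is only a rough check, not a formal assessment of model fit.

```python
import itertools
import random

# Hypothetical theory: four survey items are all driven by one latent
# "political viewpoint" factor, with these assumed standardized loadings.
THEORY_LOADINGS = [0.7, 0.6, 0.8, 0.5]


def simulate(loadings, factor_of_item, n_factors, n, seed=0):
    """Draw n responses where item k = loadings[k] * F[factor_of_item[k]] + e_k.

    Factors are independent N(0, 1) draws; each error is N(0, 1 - loading**2),
    so every item has a population variance of 1.
    """
    rng = random.Random(seed)
    items = [[] for _ in loadings]
    for _ in range(n):
        factors = [rng.gauss(0.0, 1.0) for _ in range(n_factors)]
        for k, lam in enumerate(loadings):
            error = rng.gauss(0.0, (1.0 - lam ** 2) ** 0.5)
            items[k].append(lam * factors[factor_of_item[k]] + error)
    return items


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def implied_correlation(loadings, i, j):
    """Correlation a one-factor model predicts between items i and j."""
    return loadings[i] * loadings[j]


def residuals(loadings, items):
    """Observed minus theory-implied correlation for every pair of items."""
    return {
        (i, j): pearson(items[i], items[j]) - implied_correlation(loadings, i, j)
        for i, j in itertools.combinations(range(len(items)), 2)
    }


if __name__ == "__main__":
    # Data consistent with the theory: one factor drives all four items.
    consistent = simulate(THEORY_LOADINGS, [0, 0, 0, 0], n_factors=1, n=5000)
    # Data at odds with the theory: the first two and last two items
    # depend on two unrelated factors.
    inconsistent = simulate(THEORY_LOADINGS, [0, 0, 1, 1], n_factors=2,
                            n=5000, seed=1)
    for label, data in [("consistent", consistent),
                        ("inconsistent", inconsistent)]:
        worst = max(abs(r) for r in residuals(THEORY_LOADINGS, data).values())
        print(f"{label} data: largest |observed - implied| = {worst:.2f}")
```

In the second simulated data set, the first two items and the last two items actually depend on unrelated factors, so the one-factor theory predicts sizable correlations between the two pairs that the data do not show, and the residuals for those pairs are large.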
Because we need to provide a rationale for any relationships we see among observed variables that we believe result from some unobserved variable, strong theory is crucial. In short, if we are to claim that an unobserved variable (or variables) causes observed behaviors, then we need some conceptual basis for doing so; otherwise, claims about such latent relationships carry no weight. Given that factor analysis is the formal statistical modeling of these latent variable structures, theory should play an essential role in its use. This means that prior to conducting factor analysis, we should have a theoretical basis for what we expect to find in terms of the number of latent variables (factors), and