Karen Robson

Multilevel Modeling in Plain Language



errors

      a Reference category is Australian Capital Territory

      * p < 0.05, ** p < 0.01, *** p < 0.001

      Now is a good time to review the interpretation of coefficients as this will be important for understanding multilevel model outputs as well. The unstandardized coefficients (all in their own units of measurement) in Table 1.4 would be interpreted as:

       Compared to being in Australian Capital Territory, being in New South Wales is associated with a decrease in standardized reading scores by 0.075, controlling for the other variables in the model.

       Compared to being in Australian Capital Territory, being in Victoria is associated with a decrease in reading scores by 0.747, controlling for the other variables in the model.

       Compared to being in Australian Capital Territory, being in Queensland is associated with a 0.168 decrease in reading scores, controlling for the other variables in the model.

       Compared to being in Australian Capital Territory, being in South Australia is associated with a 0.089 decrease in reading scores, controlling for the other variables in the model.

       Compared to being in Australian Capital Territory, being in Western Australia is associated with a 0.222 decrease in reading scores, controlling for the other variables in the model.

       Compared to being in Australian Capital Territory, being in Tasmania is associated with a 0.189 decrease in reading scores, controlling for the other variables in the model.

       The coefficient for the Northern Territory is not statistically significant.

       Compared to being a female, being a male is associated with a 0.377 decrease in reading scores, controlling for the other variables in the model.

       Each unit increase in the parental occupational status scale is associated with a 0.019 increase in reading scores, controlling for all the other variables in the model.

       The constant is the reading score of –0.687, which is the value when all the independent variables have a value of zero. In this case it would be a female living in Australian Capital Territory whose parents have an occupational status score of zero – which isn’t possible with this particular occupational status measure.
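
      The interpretations above can be combined into a single predicted score. The following is a minimal sketch, assuming only the coefficients quoted in the text; the function name `predicted_reading` and the example student are hypothetical, and the Northern Territory is left out because its coefficient is not reported in the text.

```python
# Unstandardized coefficients as quoted in the text for Table 1.4.
# The Northern Territory coefficient is not reported there, so it is omitted.
CONSTANT = -0.687
REGION = {
    "ACT": 0.0,     # reference category
    "NSW": -0.075,
    "VIC": -0.747,
    "QLD": -0.168,
    "SA": -0.089,
    "WA": -0.222,
    "TAS": -0.189,
}
MALE = -0.377   # male compared to female
POS = 0.019     # per unit of parental occupational status


def predicted_reading(region, male, pos):
    """Predicted reading score from the single-level OLS coefficients."""
    return CONSTANT + REGION[region] + MALE * int(male) + POS * pos


# A female in the ACT with POS = 0 reproduces the constant.
print(round(predicted_reading("ACT", male=False, pos=0), 3))

# Hypothetical student: a male in Queensland whose parents score 50 on POS.
print(round(predicted_reading("QLD", male=True, pos=50), 3))
```

      Each region coefficient simply shifts the prediction up or down relative to the ACT, which is why every interpretation is phrased as a comparison with the reference category.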

      Clearly there are ‘region’ effects here – the Australian Capital Territory (ACT) seems to be the best place for reading scores. The problem with this type of analysis is that children are nested within the regions. Furthermore, they are nested within schools, and even classrooms. While our analysis is at the individual level, the observations aren’t completely independent in the sense that there are eight regions into which pupils are divided.

      An OLS regression assumes that the coefficients presented in the table are ‘independent’ of the effects of the other variables in the model, but, in this type of model, this assumption is false. If we believe that regions – an overarching structure – affect students differentially, the effect that regions have on reading scores is not independent of the effects of the other variables in the model. The structures themselves share similarities that we cannot observe but that nevertheless influence our results. It is probably the case that parents’ occupational status and the gender of the student are also differentially associated with reading scores, depending on the region in which the student attends school. The OLS assumption of independent residuals is probably violated as well. Reading scores within each region are unlikely to be independent of one another, and this could lead to residuals that are not independent within regions. Thus, the residuals are correlated with our variables that define structure. We need statistical techniques that can handle this kind of data structure. OLS is not designed to do this.
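
      The point about non-independent residuals can be illustrated with a small simulation. This is a sketch on synthetic data, not the data analysed in the text: the region effects, the predictor `x`, and all numeric values are assumptions chosen for illustration. Students in the same region share an unobserved region effect, a single-level OLS ignores region, and the residuals then cluster by region.

```python
import numpy as np

rng = np.random.default_rng(42)

n_regions, n_per_region = 8, 500
region = np.repeat(np.arange(n_regions), n_per_region)

# Hypothetical unobserved region effects (fixed values, for reproducibility).
region_effect = np.linspace(-0.7, 0.7, n_regions)

# Synthetic individual-level predictor and outcome.
x = rng.normal(size=region.size)
y = 0.3 * x + region_effect[region] + rng.normal(size=region.size)

# Single-level OLS with an intercept and x, ignoring region entirely.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

# If residuals were independent of region, every region's mean residual
# would be close to zero. Here they differ systematically by region.
resid_means = np.array([resid[region == r].mean() for r in range(n_regions)])

# Share of residual variance that lies between regions (equal group sizes).
between_share = resid_means.var() / resid.var()

print(np.round(resid_means, 2))
print(round(float(between_share), 2))
```

      A clearly non-zero between-region share means that two students from the same region tend to have similar residuals, which is the dependence that single-level OLS assumes away.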

      Group estimates

      Maybe at this point you are wondering, ‘If groups are so important, maybe I should just focus on group-level effects.’ You may think that a possible solution to these problems is just to conduct analyses at the group level. In other words, to avoid the problem of giving group characteristics to individuals, just aggregate the data set so that we focus on groups, rather than individuals. In addition to producing far less detailed models and violating some theoretical arguments (i.e. supposing your hypotheses are actually about individual- and group-level processes), this approach usually runs into a problem with sample size. If we have data on 13,000 students in eight different regions, aggregating the data to the group level would leave us with just eight cases (i.e. one row of data representing one region). To conduct any sort of meaningful multivariate analysis, a much larger sample size is required. As we mentioned earlier in this chapter, focusing on group estimates may lead to an error in logic known as the ecological fallacy, where group characteristics are used to generalize to individuals. Thus, there are several reasons not to rely on group estimates.
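
      The sample size problem is easy to see by carrying out the aggregation. The sketch below uses synthetic values in place of the real student data; the variable names and distributions are assumptions. Only the counts, 13,000 students in eight regions, come from the text.

```python
import numpy as np

n_students, n_regions = 13_000, 8
rng = np.random.default_rng(1)

# Synthetic stand-ins for the individual-level variables.
region = np.arange(n_students) % n_regions          # region code per student
reading = rng.normal(size=n_students)               # reading score
pos = rng.uniform(16, 90, size=n_students)          # parental occupational status
male = rng.integers(0, 2, size=n_students)          # 1 = male, 0 = female

# Aggregate to the group level: one row of means per region.
aggregated = np.array([
    [reading[region == r].mean(), pos[region == r].mean(), male[region == r].mean()]
    for r in range(n_regions)
])

n_cases = aggregated.shape[0]
n_params = 3                      # intercept + mean POS + proportion male
residual_df = n_cases - n_params  # degrees of freedom left for the error term

print(n_cases, residual_df)
```

      With eight cases, a regression of mean reading score on just two group-level predictors leaves five residual degrees of freedom, far too few for a meaningful multivariate analysis.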

      Varying effects across contexts

      In OLS, there is an assumption that the effects of independent variables are the same across contexts. For example, the effect of gender on school achievement is the same for everyone regardless of the region in which they go to school. We have many reasons to suspect that the effects of individual characteristics vary across contexts – that their impacts are not the same for everyone, regardless of group membership. We may find that the effects of gender are more pronounced in particular regions, for example. The context may influence the impact of gender on school achievement – for example, some regions may have an official policy around raising the science achievement of girls or the reading achievement of boys (as they are typical problems in the school achievement literature). Multilevel models allow regression effects (coefficients) to vary across different contexts (in this case, region) while OLS does not.
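
      The contrast can be written compactly. The notation below is an assumption introduced here for illustration, not taken from the text: student i is nested in region j, the gammas are average effects across regions, and the u terms are region-specific deviations from those averages.

```latex
% Single-level OLS: one gender coefficient shared by every region
y_{i} = \beta_{0} + \beta_{1}\,\mathrm{male}_{i} + e_{i}

% Multilevel sketch: intercept and gender coefficient may differ by region j
y_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{male}_{ij} + e_{ij}
\beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + u_{1j}
```

      When every u_{1j} is zero, the gender coefficient is the same in every region and the model has the single gender slope that OLS assumes.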

      Table 1.5

      Unstandardized regression coefficients. Standard errors in parentheses. POS – parental occupational status; ACT – Australian Capital Territory; NSW – New South Wales; VIC – Victoria; QLD – Queensland; SA – South Australia; WA – Western Australia; TAS – Tasmania; NT – Northern Territory.

      * p < 0.05, ** p < 0.01, *** p < 0.001

      We might now think that one possible solution would be to run separate individual-level OLS regressions for each group. Table 1.5 displays the results of such an exercise.

      This may seem to solve the problem of examining how group differences affect the impact of independent variables on the outcome of interest. You can see, for example, that the effect of being male ranges from –0.446 in New South Wales (NSW) to –0.246 in the Australian Capital Territory (ACT). Likewise, the coefficients for parental occupational status (POS) range from 0.017 in South Australia (SA) to 0.024 in the ACT and Western Australia (WA). Without further calculations, these results do not tell us whether the values are statistically significantly different from each other, and they also do not tell us anything about group properties which may influence or interact with individual-level outcomes. In addition to being a poor specification, this technique can get unwieldy if you have a large number of groups. Here we have only eight and the presentation of results is already rather difficult.
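
      The separate-regressions approach can be sketched as follows. The data are synthetic: the region-specific gender effects in `true_male`, the POS effect of 0.02, and the sample size per region are all assumptions for illustration, not the estimates in Table 1.5. One OLS model is fitted per region, which yields eight unrelated sets of coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
regions = ["ACT", "NSW", "VIC", "QLD", "SA", "WA", "TAS", "NT"]

# Hypothetical 'true' effect of being male in each region (synthetic).
true_male = np.linspace(-0.45, -0.25, len(regions))


def fit_ols(X, y):
    """Least-squares coefficients for the design matrix X and outcome y."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


estimates = {}
for name, b_male in zip(regions, true_male):
    n = 1500
    male = rng.integers(0, 2, n)      # 1 = male, 0 = female
    pos = rng.uniform(16, 90, n)      # parental occupational status
    y = -0.7 + b_male * male + 0.02 * pos + rng.normal(0, 1, n)

    # Separate regression for this region only: intercept, male, POS.
    X = np.column_stack([np.ones(n), male, pos])
    estimates[name] = fit_ols(X, y)

for name, (const, b_m, b_pos) in estimates.items():
    print(f"{name}: constant={const:.3f} male={b_m:.3f} POS={b_pos:.3f}")
```

      Each fit uses only its own region's students, so nothing in the output says whether the differences between regions are larger than sampling error, and no region-level characteristic can enter any of the models.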

      Another possible solution in OLS to effects varying across contexts might be to run interaction terms. You probably learned in your statistics training about interaction effects or moderating effects. If we thought that an independent variable affects an outcome differentially based upon the value of another independent variable, we could test this by using interaction terms. Based on the criticism above of running separate regressions for each group, a reasonable solution may seem to be to create interaction terms between the region and the other independent variables. We were making a similar argument earlier when we suggested that gender might impact on student achievement depending on region. We create the interaction terms by multiplying gender by region and parental occupational status by region (gender * regions; parental occupational status * regions) and we add them to the OLS regression as a set of new independent variables. If the interaction terms are statistically