Samprit Chatterjee

Handbook of Regression Analysis With Applications in R


Adjusting (2.4) to account for model selection uncertainty is just one part of a more general problem: standard degrees of freedom calculations are no longer valid when multiple models are compared with each other, as in the comparison of all models with a given number of predictors in best subsets. This affects other uses of those degrees of freedom, including the calculation of information measures such as $C_p$, $AIC$, and related criteria, and thus any decisions regarding model choice. The problem becomes progressively more serious as the number of potential predictors increases, and it is the subject of active research. It is discussed further in Chapter 14.
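The effect of selection on apparent fit can be seen in a small simulation. The following is only a sketch under stated assumptions: the response and all candidate predictors are pure noise, "best subsets" is reduced to picking the single best predictor, and the names `r_squared`, `mean_fixed`, and `mean_selected` are illustrative rather than taken from the text.

```python
import random

def r_squared(x, y):
    """R^2 of the simple regression of y on x (squared sample correlation)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

rng = random.Random(0)       # fixed seed so the run is reproducible
n, k, reps = 30, 10, 500     # sample size, candidate predictors, replications

fixed, selected = [], []
for _ in range(reps):
    y = [rng.gauss(0, 1) for _ in range(n)]
    # k candidate predictors, none of which is related to y
    r2 = [r_squared([rng.gauss(0, 1) for _ in range(n)], y) for _ in range(k)]
    fixed.append(r2[0])        # predictor chosen before looking at the data
    selected.append(max(r2))   # predictor chosen because it fit best

mean_fixed = sum(fixed) / reps
mean_selected = sum(selected) / reps
print(f"average R^2, prespecified predictor: {mean_fixed:.3f}")
print(f"average R^2, best of {k} predictors:  {mean_selected:.3f}")
```

The selected predictor looks systematically better than a prespecified one even though no predictor has any real relationship with the response, which is why measures computed as if the model had been fixed in advance are too optimistic after selection.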

      It is not unusual for the observations in a sample to fall into two distinct subgroups; for example, people are either male or female. It might be that group membership has no relationship with the target variable (given the other predictors); in that case a pooled model, which ignores the grouping and fits a single relationship to the two groups together, is appropriate.

      On the other hand, it is clearly possible that group membership is predictive for the target variable (for example, expected salaries differing for men and women given other control variables could indicate gender discrimination). Such effects can be explored easily using an indicator variable, which takes on the value $0$ for one group and $1$ for the other (such variables are sometimes called dummy variables or $0/1$ variables). The model takes the form

$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_{p-1} x_{p-1,i} + \beta_p I_i + \varepsilon_i,$$

      where $I_i$ is the indicator variable, equal to $1$ for members of the group and $0$ for nonmembers. This implies

$$E(y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_{p-1} x_{p-1}$$

      for nonmembers and

$$E(y) = \beta_0 + \beta_p + \beta_1 x_1 + \cdots + \beta_{p-1} x_{p-1}$$

      for members. The $t$-test for whether $\beta_p = 0$ is thus a test of whether a constant shift model (two parallel regression lines, planes, or hyperplanes) is a significant improvement over a pooled model (one common regression line, plane, or hyperplane).
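A constant shift model can be fit by adding the indicator as one more column of the design matrix. The sketch below assumes made-up data with one numerical predictor and a hand-written least squares helper (`ols` is a hypothetical name, not a library function); it reports the estimated shift and the $t$-statistic for the coefficient on the indicator.

```python
import math

def ols(X, y):
    """Least squares via the normal equations; returns (beta, inverse of X'X)."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Gauss-Jordan elimination on the augmented matrix [X'X | identity].
    a = [row + [float(i == j) for j in range(p)] for i, row in enumerate(xtx)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        d = a[c][c]
        a[c] = [v / d for v in a[c]]
        for r in range(p):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    inv = [row[p:] for row in a]
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    return beta, inv

# Made-up data: six nonmembers (indicator 0) followed by six members (indicator 1).
x = [1, 2, 3, 4, 5, 6] * 2
member = [0] * 6 + [1] * 6
y = [2.7, 2.9, 3.6, 3.8, 4.6, 4.9, 5.4, 6.2, 6.3, 7.1, 7.4, 8.1]

# Design matrix columns: intercept, numerical predictor, indicator.
X = [[1.0, xi, ind] for xi, ind in zip(x, member)]
beta, inv = ols(X, y)

n, k = len(y), len(beta)   # k counts all coefficients, including the intercept
residuals = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
s2 = sum(e * e for e in residuals) / (n - k)   # residual variance estimate
shift = beta[2]                                # estimated constant shift
t_stat = shift / math.sqrt(s2 * inv[2][2])     # t-statistic for the indicator

print(f"estimated shift: {shift:.3f}")
print(f"t-statistic:     {t_stat:.2f} on {n - k} degrees of freedom")
```

Because both groups share the same predictor values in this toy data set, the estimated shift equals the difference between the two group means of the response; with unbalanced predictor values it would instead be the difference adjusted for the predictor.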

      Would two different regression relationships be better still? Say there is only one numerical predictor $x_1$; the full model that allows for two different regression lines is

$$E(y) = \beta_{10} + \beta_{11} x_1$$

      for nonmembers ($I = 0$), and

$$E(y) = \beta_{20} + \beta_{21} x_1$$

      for members ($I = 1$). The pooled model and the constant shift model can be made special cases of the full model by creating a new variable that is the product of $x_1$ and the indicator $I$. A regression model that includes this variable,

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 I_i + \beta_3 x_{1i} I_i + \varepsilon_i,$$

      corresponds to the two different regression lines

$$E(y) = \beta_0 + \beta_1 x_1$$

      for nonmembers (since $I = 0$), implying $\beta_{10} = \beta_0$ and $\beta_{11} = \beta_1$ above, and

$$E(y) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1$$

      for members (since $I = 1$), implying $\beta_{20} = \beta_0 + \beta_2$ and $\beta_{21} = \beta_1 + \beta_3$ above.
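The correspondence between the product-variable model and two separate regression lines can be checked numerically. This is a minimal sketch on made-up data; `least_squares` and `fit_line` are hypothetical helpers written for the illustration, not functions from any library.

```python
def least_squares(X, y):
    """Solve the normal equations (X'X) beta = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        d = a[c][c]
        a[c] = [v / d for v in a[c]]
        for r in range(p):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[p] for row in a]

def fit_line(xs, ys):
    """Simple regression of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    slope = (sum((u - xbar) * (v - ybar) for u, v in zip(xs, ys))
             / sum((u - xbar) ** 2 for u in xs))
    return ybar - slope * xbar, slope

# Made-up data: the two groups have different intercepts and slopes.
x0, y0 = [1, 2, 3, 4, 5, 6], [2.7, 2.9, 3.6, 3.8, 4.6, 4.9]   # nonmembers
x1, y1 = [1, 2, 3, 4, 5, 6], [2.1, 3.6, 4.4, 5.9, 6.9, 8.3]   # members

# Full model columns: intercept, x, indicator, and the product x * indicator.
X = ([[1.0, x, 0, 0] for x in x0] + [[1.0, x, 1, x] for x in x1])
b0, b1, b2, b3 = least_squares(X, y0 + y1)

nonmember_line = (b0, b1)              # intercept and slope when I = 0
member_line = (b0 + b2, b1 + b3)       # intercept and slope when I = 1
separate_nonmember = fit_line(x0, y0)  # fit using nonmembers only
separate_member = fit_line(x1, y1)     # fit using members only

print("implied by full model:", nonmember_line, member_line)
print("fitted separately:    ", separate_nonmember, separate_member)
```

The lines implied by the single full model agree with the lines fitted to each group on its own, up to floating point error. The advantage of the single model is that the pooled and constant shift models are nested within it, so they can be compared with it using the usual tests.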

equation

      on