Douglas C. Montgomery

Introduction to Linear Regression Analysis



The regression function E(y|x) = β0 + β1x is a line of mean values; that is, the height of the regression line at any value of x is just the expected value of y for that x. The slope β1 can be interpreted as the change in the mean of y for a unit change in x. Furthermore, the variability of y at a particular value of x is determined by the variance of the error component of the model, σ². This implies that there is a distribution of y values at each x and that the variance of this distribution is the same at each x.
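The "line of mean values" reading can be checked by simulation. The sketch below assumes the simple model y = β0 + β1x + ε with errors that are independent, normally distributed, mean zero, and variance σ². The parameter values BETA0, BETA1, SIGMA and the helper simulate_at are hypothetical, chosen only for illustration. At each fixed x, the sample mean of y should sit near β0 + β1x and the sample variance near σ², whatever the value of x.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.5

def simulate_at(x, n, rng):
    """Draw n responses y = BETA0 + BETA1*x + eps at a fixed x,
    assuming eps ~ Normal(0, SIGMA**2), independent across draws."""
    eps = rng.normal(loc=0.0, scale=SIGMA, size=n)
    return BETA0 + BETA1 * x + eps

rng = np.random.default_rng(seed=1)
summary = {}  # maps x -> (sample mean of y, sample variance of y)
for x in (1.0, 2.0, 5.0):
    y = simulate_at(x, n=200_000, rng=rng)
    summary[x] = (y.mean(), y.var())
    # The mean tracks the line; the variance stays near SIGMA**2 = 2.25 at every x.
    print(f"x={x}: mean={y.mean():.3f} (line {BETA0 + BETA1 * x:.3f}), var={y.var():.3f}")

# A one-unit change in x shifts the mean of y by roughly BETA1.
print("mean(x=2) - mean(x=1) =", round(summary[2.0][0] - summary[1.0][0], 3))
```

The simulated means differ from the line only by sampling noise, and the spread of y does not grow or shrink with x, which is the constant-variance property described above.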


      In almost all applications of regression, the regression equation is only an approximation to the true functional relationship between the variables of interest. These functional relationships are often based on physical, chemical, or other engineering or scientific theory, that is, knowledge of the underlying mechanism. Consequently, these types of models are often called mechanistic models. For example, the familiar physics equation momentum = mass × velocity is a mechanistic model.


      In general, the response variable y may be related to k regressors, x1, x2, …, xk, so that

      $$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \qquad (1.3)$$

      This is called a multiple linear regression model because more than one regressor is involved. The adjective linear is employed to indicate that the model is linear in the parameters β0, β1, …, βk, not because y is a linear function of the x’s. We shall see subsequently that many models in which y is related to the x’s in a nonlinear fashion can still be treated as linear regression models as long as the equation is linear in the β’s.
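To make "linear in the β's" concrete, here is a small sketch using a quadratic curve. The data are hypothetical and noise-free so that the recovered coefficients are easy to check. By defining x1 = x and x2 = x², the model y = β0 + β1x + β2x² takes the form of Equation (1.3) with k = 2, so ordinary least squares (here via NumPy's np.linalg.lstsq) applies directly even though y is a nonlinear function of x.

```python
import numpy as np

# Hypothetical data from a quadratic in x (no noise, for clarity).
x = np.linspace(0.0, 4.0, 9)
y = 1.0 + 2.0 * x - 0.5 * x**2

# Design matrix with columns 1, x1 = x, x2 = x**2.
# y is nonlinear in x, but y = b0 + b1*x1 + b2*x2 is linear in the b's.
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares using NumPy's linear least-squares solver.
beta_hat, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 6))  # approximately [1.0, 2.0, -0.5]
```

The same trick covers other transformed regressors (logarithms, products of x's, and so on), as long as each β multiplies a known function of the data.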

      An important objective of regression analysis is to estimate the unknown parameters in the regression model. This process is also called fitting the model to the data. We study several parameter estimation techniques in this book. One of these techniques is the method of least squares (introduced in Chapter 2). For example, the least-squares fit to the delivery time data is

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

      where ŷ is the fitted or estimated value of delivery time corresponding to a delivery volume of x cases. This fitted equation is plotted in Figure 1.1b.
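For a single regressor, the least-squares estimates have a well-known closed form: b1 = Sxy / Sxx and b0 = ȳ − b1·x̄, where Sxy and Sxx are the centered cross-product and sum of squares. The sketch below implements those formulas in plain Python. The (cases, minutes) pairs are hypothetical numbers invented for illustration, not the book's delivery time data, and the function names least_squares_line and fitted are likewise hypothetical.

```python
def least_squares_line(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1*x_i)**2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered cross-products and sum of squares.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical (cases, minutes) pairs, NOT the book's delivery time data.
cases = [2, 4, 6, 8, 10]
minutes = [9.0, 13.0, 18.0, 21.0, 27.0]
b0, b1 = least_squares_line(cases, minutes)

def fitted(x):
    """Fitted value y_hat = b0 + b1 * x for a delivery volume of x cases."""
    return b0 + b1 * x

print(f"y_hat = {b0:.2f} + {b1:.2f} x")  # y_hat = 4.40 + 2.20 x
```

For these made-up numbers, a delivery of 5 cases gets a fitted time of fitted(5) = 15.4 minutes.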

      A regression model does not imply a cause-and-effect relationship between the variables. Even though a strong empirical relationship may exist between two or more variables, this cannot be considered evidence that the regressor variables and the response are related in a cause-and-effect manner. To establish causality, the relationship between the regressors and the response must have a basis outside the sample data—for example, the relationship may be suggested by theoretical considerations. Regression analysis can aid in confirming a cause-and-effect relationship, but it cannot be the sole basis of such a claim.
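A simulation can show how a strong empirical relationship arises without any causal link. In the sketch below, a hypothetical lurking variable z drives both x and y, while y does not depend on x at all. The coefficients (2.0, 3.0, noise scale 0.5) are illustrative assumptions. Regressing y on x still gives a large, stable slope (about 6 / 4.25 ≈ 1.41 in theory) and a correlation near 0.96, yet changing x directly would leave y unaffected.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
n = 10_000

# Hypothetical lurking variable z that influences both x and y.
z = rng.normal(size=n)
# x and y each depend on z; y has no dependence on x.
x = 2.0 * z + rng.normal(scale=0.5, size=n)
y = 3.0 * z + rng.normal(scale=0.5, size=n)

# Least-squares slope of y on x: b1 = S_xy / S_xx.
xc, yc = x - x.mean(), y - y.mean()
slope = (xc @ yc) / (xc @ xc)
r = np.corrcoef(x, y)[0, 1]
print(f"slope={slope:.2f}, r={r:.3f}")
```

Nothing in the fitted x-y relationship reveals that z is the real driver; that knowledge has to come from outside the sample data.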

      Finally, it is important to remember that regression analysis is part of a broader data-analytic approach to problem solving. That is, the regression equation itself may not be the primary objective of the study. It is usually more important to gain insight and understanding concerning the system generating the data.

      An essential aspect of regression analysis is data collection. Any regression analysis is only as good as the data on which it is based. Three basic methods for collecting data are as follows:

       A retrospective