Iain Pardoe

Applied Regression Modeling



images‐value is between images and images.

      As discussed at the beginning of this section, the 95% prediction interval for an individual value of images, images, is much wider than the 95% confidence interval for the population mean single‐family home sale price, which was calculated as

equation

      Unlike confidence intervals for the population mean, prediction intervals for an individual $Y$‐value are not generally calculated automatically by statistical software. Thus, they have to be calculated by hand using the sample statistics, $m_Y$ and $s_Y$. However, there is a trick that can get around this (although it makes use of simple linear regression, which we cover in Chapter 2). First, create a variable that consists only of the value 1 for all observations. Then, fit a simple linear regression model using this variable as the predictor variable and $Y$ as the response variable, and restrict the model to fit without an intercept (see computer help #25 in the software information files available from the book website). The estimated regression equation for this model will be a constant value equal to the sample mean of the response variable. Prediction intervals for this model will be the same for each value of the predictor variable (see computer help #30), and will be the same as a prediction interval for an individual $Y$‐value.

      As further practice, calculate a 90% prediction interval for an individual sale price (see Problem 1.10), either by hand or using the trick just described. You should find that the interval is (images, images).
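      The equivalence behind this trick can be checked numerically. The following is a minimal sketch in Python (standard library only), not the book's software instructions: the `sale_prices` values are made up purely for illustration, the function names are hypothetical, and the t‐percentile (1.833, the 95th percentile of the t‐distribution with 9 degrees of freedom) is supplied by hand, as it would be from a table.

```python
import math
import statistics


def prediction_interval_by_hand(y, t_percentile):
    """Sample mean +/- t-percentile * s * sqrt(1 + 1/n)."""
    n = len(y)
    m = statistics.mean(y)
    s = statistics.stdev(y)  # sample standard deviation (n - 1 denominator)
    half_width = t_percentile * s * math.sqrt(1 + 1 / n)
    return (m - half_width, m + half_width)


def prediction_interval_by_trick(y, t_percentile):
    """No-intercept regression of y on a predictor that is always 1."""
    n = len(y)
    x = [1.0] * n  # the constant predictor variable
    sum_xx = sum(xi * xi for xi in x)
    # Least-squares slope for a model without an intercept.
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum_xx
    residuals = [yi - slope * xi for xi, yi in zip(x, y)]
    # One estimated parameter, so n - 1 residual degrees of freedom.
    resid_se = math.sqrt(sum(r * r for r in residuals) / (n - 1))
    x_new = 1.0  # the only value the predictor can take
    pred_se = resid_se * math.sqrt(1 + x_new * x_new / sum_xx)
    fitted = slope * x_new
    return (fitted - t_percentile * pred_se, fitted + t_percentile * pred_se)


# Illustrative (made-up) sale prices in thousands of dollars, n = 10.
sale_prices = [155.5, 195.0, 197.0, 207.0, 214.9,
               230.0, 239.5, 242.0, 252.5, 255.0]
T_95_DF9 = 1.833  # 95th percentile of t with 9 df, giving a 90% interval

by_hand = prediction_interval_by_hand(sale_prices, T_95_DF9)
by_trick = prediction_interval_by_trick(sale_prices, T_95_DF9)
print("by hand: ", by_hand)
print("by trick:", by_trick)
```

      The two intervals agree because, with a constant predictor and no intercept, the fitted slope is the sample mean, the residual standard error is the sample standard deviation, and the prediction standard error reduces to the sample standard deviation multiplied by $\sqrt{1 + 1/n}$.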

      We spent some time in this chapter coming to grips with summarizing data (graphically and numerically) and understanding sampling distributions, but the four major concepts that will carry us through the rest of the book are as follows:

      1 Statistical thinking is the process of analyzing quantitative information about a random sample of observations and drawing conclusions (statistical inferences) about the population from which the sample was drawn. An example is using a univariate sample mean, $m_Y$, as an estimate of the corresponding population mean and calculating the sample standard deviation, $s_Y$, to evaluate the precision of this estimate.

      2 Confidence intervals are one method for calculating the sample estimate of a parameter (such as the population mean) and its associated uncertainty. An example is the confidence interval for a univariate population mean, which takes the form $m_Y \pm \text{t-percentile} \times (s_Y/\sqrt{n})$.

      3 Hypothesis testing provides another means of making decisions about the likely values of a population parameter. An example is hypothesis testing for a univariate population mean, whereby the magnitude of a calculated sample test statistic, $\text{t-statistic} = (m_Y - \mathrm{E}(Y)) / (s_Y/\sqrt{n})$, where $\mathrm{E}(Y)$ is the value of the population mean in the null hypothesis, indicates which of two hypotheses (about likely values for the population mean) we should favor.

      4 Prediction intervals, while similar in spirit to confidence intervals, tackle the different problem of predicting the value of an individual observation picked at random from the population. An example is the prediction interval for an individual univariate $Y$‐value, which takes the form $m_Y \pm \text{t-percentile} \times s_Y\sqrt{1 + 1/n}$.
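      As a rough illustration of how the last three concepts translate into arithmetic, here is a small Python sketch. The summary statistics (sample mean 280, sample standard deviation 50, sample size 25), the null‐hypothesis value 255, and the function names are all hypothetical, chosen only to keep the numbers easy to follow. The t‐percentile 2.064 is the 97.5th percentile of the t‐distribution with 24 degrees of freedom, as would be read from a table.

```python
import math


def confidence_interval(sample_mean, sample_sd, n, t_percentile):
    """Interval for the population mean: mean +/- t * (sd / sqrt(n))."""
    half_width = t_percentile * sample_sd / math.sqrt(n)
    return (sample_mean - half_width, sample_mean + half_width)


def t_statistic(sample_mean, sample_sd, n, null_mean):
    """Test statistic: (mean - hypothesized mean) / (sd / sqrt(n))."""
    return (sample_mean - null_mean) / (sample_sd / math.sqrt(n))


def prediction_interval(sample_mean, sample_sd, n, t_percentile):
    """Interval for one new observation: mean +/- t * sd * sqrt(1 + 1/n)."""
    half_width = t_percentile * sample_sd * math.sqrt(1 + 1 / n)
    return (sample_mean - half_width, sample_mean + half_width)


# Hypothetical summary statistics, for illustration only.
M, S, N = 280.0, 50.0, 25
T_975_DF24 = 2.064  # 97.5th percentile of t with 24 df (95% intervals)

ci = confidence_interval(M, S, N, T_975_DF24)
t_stat = t_statistic(M, S, N, null_mean=255.0)
pi = prediction_interval(M, S, N, T_975_DF24)

print("95% confidence interval:", ci)
print("t-statistic:", t_stat)
print("95% prediction interval:", pi)
```

      The only difference between the two interval formulas is the multiplier on the sample standard deviation: $1/\sqrt{n}$ for the confidence interval versus $\sqrt{1 + 1/n}$ for the prediction interval. This is why the prediction interval is always the wider of the two and does not shrink toward zero width as the sample size grows.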

      Problems

      “Computer help” refers to the numbered items in the software information files available from the book website. There are brief answers to the even‐numbered problems in Appendix F (www.wiley.com/go/pardoe/AppliedRegressionModeling3e).

      1.1 Assume that weekly orders of a popular mobile phone at a local store follow a normal distribution with mean and standard deviation .
          (a) Find the scores, , that correspond to the: 95th percentile (i.e., find such that ); 50th percentile (i.e., find such that ); 2.5th percentile (i.e., find such that ).
          (b) Suppose represents potential values of repeated sample means from this population for samples of size . Use the normal version of the central limit theorem to find the mean scores, , that correspond to the: 95th percentile (i.e., find such that ); 50th percentile (i.e., find such that ); 2.5th percentile (i.e., find such that ).
          (c) How many phones should the store order to be 95% confident they can meet demand for a particular week?

      1.2 Assume that final scores in a statistics course follow a normal distribution with mean and standard deviation .
          (a) Find the scores, , that correspond to the: 90th percentile (i.e., find such that ); 99th percentile (i.e., find such that ); 5th percentile (i.e., find such that ).
          (b) Suppose represents potential values of repeated sample means from this population for samples of size (e.g., average class scores). Use the normal version of the central limit theorem to find the mean scores, , that correspond to the: 90th percentile (i.e., find such that ); 99th percentile (i.e., find such that ); 5th percentile (i.e., find such that ).
          (c) If the bottom 5% of the class fail, what is the cut‐off percentage to pass the class?
          (d) The university requires the long‐term average class score for this course to be no higher than 75%. Does this requirement seem feasible?

      1.3 The NBASALARY data file contains salary information for 214 guards in the National Basketball Association (NBA) for 2009–2010 (obtained from the online USA Today NBA Salaries Database).
          (a) Construct a histogram of the variable, representing 2009–2010 salaries in thousands of dollars [computer help #14].
          (b) What would we expect the histogram to look like if the data were normal?
          (c) Construct a QQ‐plot of the variable [computer help #22].
          (d) What would we expect the QQ‐plot to look like if the data were normal?
          (e) Compute the natural logarithm of guard salaries (call this variable ) [computer help #6], and construct a histogram of this variable [computer help #14]. Hint: The “natural logarithm” transformation (also known as “log to base‐e,” or by the symbols $\log_e$ or ln) is a way to transform (rescale) skewed data to make them more symmetric and normal.
          (f) Construct a QQ‐plot of the variable [computer help #22].
          (g) Based on the plots in parts (a), (c), (e), and (f), say whether salaries or log‐salaries more closely follow a normal curve, and justify your response.

      1.4 A company's pension plan includes 50 mutual funds, with each fund expected to earn a mean, , of 3% over the risk‐free rate with a standard deviation of %. Based on the assumption that the funds are randomly selected from a population of funds with normally distributed returns in excess of the risk‐free rate, find the probability that an individual fund's return in excess of the risk‐free rate is, respectively, greater than 34.1%, greater than 15.7%, or less than %. In other words, if represents potential values of individual fund returns, find:;;. Use the normal version of the central limit theorem