Daniel J. Denis

Applied Univariate, Bivariate, and Multivariate Statistics



get the following:

      Over all possible samples, the probability is 0.95 that the range between x̄ − 1.96σ_x̄ and x̄ + 1.96σ_x̄ (where σ_x̄ is the standard error of the mean) will include the true mean, μ.

      Very important to note regarding the above statement is that μ is not the random variable. The part that is random is the sample on which the interval is computed. That is, the probability statement is not about μ but rather about samples. The population mean μ is assumed to be fixed. The 95% confidence interval tells us that if we continued to sample repeatedly, and on each sample computed a confidence interval, then 95% of these intervals would include the true parameter.
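The repeated-sampling interpretation above can be checked directly by simulation. The sketch below draws many samples from a population with a known mean, computes a 95% interval for each, and counts how often the intervals cover μ. The population values (μ = 100, σ = 15) and the sample size are arbitrary choices for the demonstration, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n, reps = 100.0, 15.0, 50, 10_000

covered = 0
for _ in range(reps):
    sample = rng.normal(mu, sigma, size=n)
    se = sigma / np.sqrt(n)               # standard error of the mean (sigma known)
    lo = sample.mean() - 1.96 * se        # lower limit of the 95% interval
    hi = sample.mean() + 1.96 * se        # upper limit of the 95% interval
    if lo <= mu <= hi:
        covered += 1

print(covered / reps)  # proportion of intervals covering mu; should be close to 0.95
```

Note that it is the interval endpoints that vary from sample to sample; μ never moves.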

      The 99% confidence interval for the mean is likewise given by:

x̄ − 2.58σ_x̄ ≤ μ ≤ x̄ + 2.58σ_x̄

      Though of course not very useful, a 100% confidence interval, if constructed, would be defined as:

x̄ − ∞ ≤ μ ≤ x̄ + ∞, that is, −∞ < μ < +∞

      That is, if you want to have zero confidence in guessing the location of the population mean, μ, then guess the sample mean x̄. Though the sample mean is an unbiased estimator of the population mean, the probability that the sample mean covers the population mean exactly, as mentioned, essentially converges to 0 for a truly continuous distribution (Hays, 1994). As an analogy, imagine coming home and hugging your spouse. If your arms are open infinitely wide (full “bear hug”), you are 100% certain to entrap him or her in your hug, because your arms (the limits of the interval) extend to positive and negative infinity. If you bring your arms in a little, it becomes possible to miss him or her with the hug (e.g., a 95% interval), but the precision of the hug is more refined, because your arms are closing inward a bit instead of extending infinitely on both sides. If you approach your spouse with your hands together (i.e., a point estimate), you are sure to miss him or her, and would have 0% confidence of your interval (hug) entrapping your spouse. An inexact analogy to be sure, but useful in visualizing the concept of confidence intervals.

      When we speak of likelihood, we mean the probability of some sample data or set of observations conditional on some hypothesized parameter or set of parameters (Everitt, 2002). Conditional probability statements such as p(D | H0) can very generally be considered simple examples of likelihoods, where the set of parameters, in this case, may be simply μ and σ². A likelihood function is the likelihood of a parameter given data (see Fox, 2016).

      When we speak of maximum‐likelihood estimation, we mean the process of maximizing a likelihood subject to certain parameter conditions. As a simple example, suppose we obtain 8 heads on 10 flips of a presumably fair coin. Our null hypothesis was that the coin is fair, meaning that the probability of heads is p(H) = 0.5. However, our actual obtained result of 8 heads on 10 flips would suggest the true probability of heads to be closer to p(H) = 0.8. Thus, we ask the question:

      Which value of θ makes the observed result most likely?

      If we had only two choices of θ to select from, 0.5 and 0.8, our answer would have to be 0.8, since this value of the parameter θ makes the sample result of 8 heads out of 10 flips most likely. That is the essence of how maximum‐likelihood estimation works (see Hays, 1994, for a similar example). ML is the most common method of estimating parameters in many models, including factor analysis, path analysis, and structural equation models to be discussed later in the book. There are very good reasons why mathematical statisticians generally approve of maximum likelihood. We summarize some of the most favorable properties of ML estimators.
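The coin example can be made concrete with the binomial likelihood, C(10, 8) θ⁸ (1 − θ)², evaluated at the two candidate values of θ. This is a minimal sketch of the comparison described above; the function name `binom_likelihood` is ours, not the book's.

```python
from math import comb

def binom_likelihood(theta, heads=8, flips=10):
    """Likelihood of observing `heads` heads in `flips` flips, given theta."""
    return comb(flips, heads) * theta**heads * (1 - theta)**(flips - heads)

L_fair = binom_likelihood(0.5)   # likelihood under the fair-coin hypothesis
L_alt  = binom_likelihood(0.8)   # likelihood under theta = 0.8
print(L_fair, L_alt)             # roughly 0.044 versus 0.302
```

Since the likelihood at θ = 0.8 is nearly seven times that at θ = 0.5, the observed result of 8 heads in 10 flips is far more likely under θ = 0.8, which is exactly why maximum likelihood selects it.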

      Firstly, ML estimators are asymptotically unbiased, which means that bias essentially vanishes as sample size increases without bound (Bollen, 1989). Secondly, ML estimators are consistent and asymptotically efficient, the latter meaning that the estimator has a small asymptotic variance relative to many other estimators. Thirdly, ML estimators are asymptotically normally distributed, meaning that as sample size grows, the estimator takes on a normal distribution. Finally, ML estimators possess the invariance property (see Casella and Berger, 2002, for details).
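Asymptotic unbiasedness can be illustrated with a classic case: the ML estimator of a normal variance divides by n rather than n − 1 and is therefore biased in small samples, but the bias vanishes as n grows. The sketch below is our own demonstration under an assumed true variance of 4.0; it is not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0  # true population variance (assumed for the demo)

def mean_ml_variance(n, reps=20_000):
    """Average of the ML variance estimator (divide by n) over many samples."""
    samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    # ddof=0 gives the ML estimator; its expectation is sigma2 * (n - 1) / n
    return samples.var(axis=1, ddof=0).mean()

for n in (5, 50, 500):
    print(n, mean_ml_variance(n))  # approaches 4.0 as n increases
```

At n = 5 the average estimate sits near 3.2 (a clear downward bias), while at n = 500 it is essentially 4.0 — bias vanishing as sample size increases without bound.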

      A measure of model fit commonly used in comparing models that uses the log‐likelihood is Akaike's information criterion, or AIC (Sakamoto, Ishiguro, and Kitagawa, 1986). This is one statistic of the kind generally referred to as penalized likelihood statistics (another is the Bayesian information criterion, or BIC). AIC is defined as:

AIC = −2Lm + 2m

      where Lm is the maximized log‐likelihood and m is the number of parameters in the given model. Lower values of AIC indicate a better‐fitting model than do larger values. Recall that the more parameters fit to a model, in general, the better will be the fit of that model. For example, a model that has a unique parameter for each data point would fit perfectly. This is the so‐called saturated model. AIC jointly considers both the goodness of fit as well as the number of parameters required to obtain the given fit, essentially “penalizing” for increasing the number of parameters unless they contribute to model fit. Adding one or more parameters to a model may cause −2Lm to decrease (which is a good thing substantively), but if the parameters are not worthwhile, this will be offset by an increase in 2m.

      The Bayesian information criterion, or BIC (Schwarz, 1978) is defined as −2Lm + m log(N), where m, as before, is the number of parameters in the model and N the total number of observations used to fit the model. Lower values of BIC are also desirable when comparing models. BIC typically penalizes model complexity more heavily than AIC. For a comparison of AIC and BIC, see Burnham and Anderson (2011).
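The two criteria are simple functions of the maximized log‐likelihood, the parameter count, and (for BIC) the sample size, so a comparison is easy to sketch. The log‐likelihood values below are hypothetical numbers chosen for illustration, not fitted results.

```python
import math

def aic(log_lik, m):
    """Akaike information criterion: -2*logL + 2*m."""
    return -2 * log_lik + 2 * m

def bic(log_lik, m, n):
    """Bayesian information criterion: -2*logL + m*log(n)."""
    return -2 * log_lik + m * math.log(n)

# Hypothetical comparison: model B adds 3 parameters for a small log-likelihood gain
print(aic(-120.0, 4), bic(-120.0, 4, 100))   # model A
print(aic(-118.5, 7), bic(-118.5, 7, 100))   # model B
```

Here the extra parameters of model B raise its log‐likelihood but not by enough to pay their penalty, so both criteria prefer the simpler model A, and the BIC, with its log(N) penalty, punishes the added parameters more heavily than the AIC does.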