Robert Carver

Practical Data Analysis with JMP, Third Edition



      Analysis with the Multivariate Platform

      Further Analysis with Fit Y by X

      Summing Up: Interpretation and Conclusions

      Visualizing Multiple Relationships

      The prior four chapters introduced several foundational concepts of data analysis and have also led you through a series of illustrative analyses. In this chapter, we pause to pull together what we have learned about descriptive analysis. This is the first in a series of short review chapters, each of which shares the common goal of recapitulating recent material and calling upon you, the reader, to apply the principles, concepts, and techniques that you have recently studied.

      In the review chapters, you will find fewer step-by-step instructions. Instead, there are guiding questions to remind you of the analytical process that you’ll want to follow. Refer to earlier chapters if you have forgotten how to perform a task. In this and all future chapters, do your computer work within a JMP Project. The examples in this and later review chapters are based on the World Development Indicators, collected and published by the World Bank.

      The World Bank was established in 1944 to assist with the redevelopment of countries after World War II. It has evolved into a group of five institutions concerned with interconnected missions of economic development. One important goal of its work is the alleviation of poverty worldwide, and as part of that mandate, the World Bank annually publishes the World Development Indicators (WDI) gathered from 215 nations. Earlier, we looked at some of the WDI data about birth rates and life expectancy.

      In the public sector as well as in business, policy makers rely on accurate, current data to gauge progress and to evaluate the impact of policy decisions. The WDI data informs policy-making by many agencies globally, and the World Bank’s annual data collection and reporting play an important role in the U.N.’s Millennium Development Goals.

      Sustainable Development Goals

      At the start of the millennium, the United Nations sponsored a Millennium Summit, which led to the adoption of the Millennium Declaration by 189 member states. The declaration laid out an ambitious set of goals including commitments to “combat poverty, hunger, disease, illiteracy, environmental degradation, and discrimination against women. The MDGs are derived from this Declaration, and all have specific targets and indicators.” 2

      The Millennium Development Goals include “8 goals, 21 targets, and 60 indicators for measuring progress between 1990 and 2015, when the goals are expected to be met.”3 More recently, the MDGs have evolved into the Sustainable Development Goals (SDGs). Some of these indicators are identical to the World Bank’s WDIs.

      The eight goals are:

      ● Eradicate extreme poverty and hunger

      ● Achieve universal primary education

      ● Promote gender equality and empower women

      ● Reduce child mortality

      ● Improve maternal health

      ● Combat HIV/AIDS, malaria, and other diseases

      ● Ensure environmental sustainability

      ● Develop a Global Partnership for Development

      We return to our introductory example from Chapter 1 about variation in life expectancies worldwide, and further investigate variables that might give us insight into why people in some regions tend to live longer than in others. We will initially analyze some of the WDI measures from the year 2016 to explore and to understand their variability, as well as plausible associations between variables. In doing so, we will engage in an extended exercise in statistical reasoning and review several of the descriptive techniques that were presented in the first four chapters.

      In Chapter 1, we speculated about possible factors that might contribute to variation in life expectancies. At that time, we wondered about the impacts of education, health care, basic sanitation, nutrition, political stability, and wealth. Relying on several of the WDIs in our data, we will focus on four constructs as measured by the variables listed in Table 5.1. Notice that several of these variables are indirect measures of the constructs; statisticians sometimes refer to such variables as proxies to indicate that they “stand in” for a difficult-to-measure concept or attribute. Many of the WDIs are reported for all or nearly all the nations, while others are reported more sparsely. For this chapter, we will use indicators that are reported for the large majority of the 215 countries. We will also examine life expectancies in different regions of the world.

      Note that the WDI data table contains data through 2018 for just a few of its 42 columns. In this analysis, we will focus on 2016 because we have data for the variables of interest for most nations.
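
      The reasoning behind choosing 2016 can be checked by comparing how complete each year is for the columns of interest. The book does this work in JMP; the following is only a minimal plain-Python sketch of the idea. The records, the column names (gdp_pc, life_exp), and the helper completeness_by_year are hypothetical and invented for illustration; they are not actual WDI values or names.

```python
# Toy records in the stacked country-year layout. None marks an empty cell.
# All values are invented placeholders, not actual WDI figures.
records = [
    {"country": "A", "year": 2016, "gdp_pc": 1000.0, "life_exp": 70.0},
    {"country": "B", "year": 2016, "gdp_pc": 2000.0, "life_exp": None},
    {"country": "A", "year": 2018, "gdp_pc": None, "life_exp": None},
    {"country": "B", "year": 2018, "gdp_pc": None, "life_exp": 81.0},
]


def completeness_by_year(table, columns):
    """Return {year: fraction of non-missing cells across the given columns}."""
    filled, total = {}, {}
    for row in table:
        year = row["year"]
        for column in columns:
            total[year] = total.get(year, 0) + 1
            if row.get(column) is not None:
                filled[year] = filled.get(year, 0) + 1
    return {year: filled.get(year, 0) / total[year] for year in total}


coverage = completeness_by_year(records, ["gdp_pc", "life_exp"])
for year in sorted(coverage):
    print(year, coverage[year])
```

      In this toy table the 2016 rows are more complete than the 2018 rows, which is the pattern that would favor analyzing the earlier year.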

      Table 5.1: Ten Variables4 Used in This Chapter

      Construct | Variable
      Wealth | Gross Domestic Product per capita
      Wealth | Poverty headcount ratio at national poverty lines (% of population)
      Wealth | World Bank Income Group
      Health Care | Health expenditures per capita
      Health Care | Percent of children ages 12–23 months receiving DPT immunization
      Education | Percent of eligible children enrolled in primary education
      Education | Ratio of female to male students enrolled in secondary education (if genders are treated equitably, the ratio = 1)
      Education | Percent of eligible children enrolled in tertiary (post-high school) education
      Nutrition | Prevalence of undernourishment (% of population)
      Nutrition | Food production index: covers food crops that are considered edible and that contain nutrients

      Our research questions in this chapter are:

      ● How did life expectancy vary around the world in 2016?

      ● How did each of the variables listed in the table above vary in 2016?

      ● How do we best describe the co-variation between each factor and life expectancy?

      In any statistical analysis, it is essential to be clear-minded about the research questions and about the nature of the data that we will be analyzing. We have mentioned these before, and this is a good time to review them in the context of a larger research exercise.

      Data Source and Structure

      We know these indicators are published by the World Bank for the purpose of monitoring economic development, identifying challenges, and assessing policy interventions. The figures are determined by analysts at the World Bank using estimates gathered and provided by governmental agencies from each country. Each of the annual indicators is best described as observational data, as opposed to experimental or survey data.

      In this data table, we have 42 variables, or data series, observed for 215 countries for each year from 1990 through 2018. Hence, we have 29 years of repeated observations for 215 countries, giving us 6,235 rows. Many cells of the table are empty, reflecting differences in national statistical infrastructure or, in some cases, simply reflecting periods before the World Bank began monitoring a development indicator. We have relatively few observations for 2018.
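
      The row count follows directly from the structure of the table: one row per country per year, with both 1990 and 2018 included. A short plain-Python check of that arithmetic, using only the figures stated above (the variable names are illustrative):

```python
# Dimensions as stated in the text: 215 countries, 1990 through 2018.
first_year, last_year = 1990, 2018
n_countries = 215

# Both endpoint years are observed, so the count is last - first + 1.
n_years = last_year - first_year + 1

# One row per (country, year) combination.
n_rows = n_countries * n_years

print(n_years, n_rows)
```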

      Recall the difference between cross-sectional data (many observations selected at one time from a population or process), and time series or longitudinal data (regularly repeated observations of a single observational unit). In this table, we have a combination: 29 annual cross-sectional samples of the same 215 countries, so that each country also contributes its own time series.
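
      To make the distinction concrete, the same stacked table can be sliced two ways: fixing the year yields a cross-section, and fixing the country yields a time series. The sketch below is plain Python rather than JMP, and the rows, the column names, and the helpers cross_section and time_series are all hypothetical, with invented values.

```python
# Toy stacked table: one record per (country, year).
# Values are invented placeholders, not actual WDI figures.
rows = [
    {"country": "A", "year": 2016, "life_exp": 70.5},
    {"country": "B", "year": 2015, "life_exp": 80.0},
    {"country": "A", "year": 2015, "life_exp": 70.0},
    {"country": "B", "year": 2016, "life_exp": 80.2},
]


def cross_section(table, year):
    """All countries observed in a single year."""
    return [r for r in table if r["year"] == year]


def time_series(table, country):
    """One country's repeated observations, in chronological order."""
    selected = [r for r in table if r["country"] == country]
    return sorted(selected, key=lambda r: r["year"])


print(cross_section(rows, 2016))
print(time_series(rows, "A"))
```

      Restricting the analysis to a single year, as this chapter does with 2016, corresponds to the first kind of slice.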

      Observational Units

      Each row in the data table represents a country in a particular year. We will be dealing with aggregated totals and rates rather than with individual people.

      Variable Definitions and Data Types

      Before we dive into analysis, it is critical to understand what each variable represents, as well as the type of each variable. We need to understand the measurements in order to interpret them. We need to know the data types because that guides the choice of summary or descriptive techniques that are applicable.
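
      The way a data type steers the choice of summary can be sketched as a simple rule: numerical summaries for continuous columns, frequency counts for categorical ones. The following plain-Python illustration is an assumption-laden sketch, not how JMP works internally. The function summarize and the sample values are hypothetical; the labels "continuous", "ordinal", and "nominal" follow JMP's modeling types.

```python
from collections import Counter
import statistics


def summarize(values, modeling_type):
    """Pick a descriptive summary based on the column's modeling type."""
    observed = [v for v in values if v is not None]  # skip empty cells
    if modeling_type == "continuous":
        # Center and spread make sense for measured quantities.
        return {
            "n": len(observed),
            "mean": statistics.mean(observed),
            "median": statistics.median(observed),
            "stdev": statistics.stdev(observed),  # needs at least 2 values
        }
    if modeling_type in ("nominal", "ordinal"):
        # For categories, report how often each level occurs.
        return {"n": len(observed), "counts": dict(Counter(observed))}
    raise ValueError(f"unknown modeling type: {modeling_type}")


# Invented sample values, not actual WDI figures.
print(summarize([1.0, 2.0, 3.0, None], "continuous"))
print(summarize(["Low", "High", "Low", None], "nominal"))
```

      A variable such as World Bank Income Group would take the categorical branch, while Gross Domestic Product per capita would take the continuous one.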