Robert Carver

Practical Data Analysis with JMP, Third Edition



left, we find the Rows panel (Figure 1.5). Like the other two panels, this one provides quick reference information, here about the number of rows (in this case 215, for 215 countries) and their states.

      Figure 1.5: Rows Panel


      The top entry indicates that there are 215 observations in this data table. The next four entries refer to four basic row states in a JMP data table. Initially, all rows share the same state, in that none have been selected, excluded, hidden, or labeled. Row states enable us to control whether individual observations appear in graphs, are incorporated into calculations, or are highlighted in various ways.
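
      To make the idea of row states concrete, here is a minimal Python sketch. It is not how JMP stores row states internally. The class name, the helper functions, and the sample values are all hypothetical, and the sketch assumes that an excluded row is left out of calculations while a hidden row is left out of graphs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Row:
    """One observation plus four illustrative row-state flags (hypothetical model)."""
    value: float
    selected: bool = False
    excluded: bool = False  # assumption: excluded rows are left out of calculations
    hidden: bool = False    # assumption: hidden rows are left out of graphs
    labeled: bool = False


def values_for_calculation(rows):
    """Return the values of rows that are not excluded."""
    return [r.value for r in rows if not r.excluded]


def values_for_graph(rows):
    """Return the values of rows that are not hidden."""
    return [r.value for r in rows if not r.hidden]


# Made-up values purely for illustration; they are not from the data table.
rows = [Row(10.0), Row(20.0, excluded=True), Row(30.0, hidden=True)]

calc_mean = mean(values_for_calculation(rows))  # averages 10.0 and 30.0
graph_values = values_for_graph(rows)           # keeps 10.0 and 20.0
```

      Keeping the flags separate from the data value mirrors the point in the text: changing a row's state changes how the row is treated, not the observation itself.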

      The Data Grid area of the data table is where the data reside. It looks like a familiar spreadsheet, and it contains the raw data for any analysis. Generally speaking, an entire column of a table contains either raw data values (for example, numbers, dates, or text) or a formula and the results of its computation. Unlike in a spreadsheet, the cells of a JMP data table column must be consistent in this sense. You will not find some rows of a column representing one type of data and other rows representing a different type.
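
      The consistency rule can be illustrated with a short Python sketch that flags any column mixing more than one kind of value. This is only an analogy under stated assumptions: the table is a hypothetical dictionary of lists, the column names and values are invented, and JMP enforces column types in its own way.

```python
def column_types(table):
    """Map each column name to the set of Python type names found in it."""
    return {
        name: {type(value).__name__ for value in values}
        for name, values in table.items()
    }


def inconsistent_columns(table):
    """Return the names of columns that mix more than one type, sorted by name."""
    return sorted(
        name for name, kinds in column_types(table).items() if len(kinds) > 1
    )


# Hypothetical toy table; none of these values come from the book's data.
table = {
    "name": ["A", "B", "C"],       # all text
    "score": [1.5, 2.0, 3.25],     # all numeric
    "mixed": [1.0, "n/a", 3.0],    # numbers and text in one column
}

bad = inconsistent_columns(table)
```

      Here only the column named mixed would be flagged, which is the spreadsheet-style situation that a JMP column does not allow.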

      In the upper left corner of the data grid, you will see the region shown here. There is a triangular disclosure button (pointing to the left side here in Windows; on a Macintosh, it is an arrowhead ►). Disclosure buttons enable you to expand or contract the amount of information displayed on the screen. The disclosure button shown here lets you temporarily hide the three panels discussed above.

      4. Try it out! Click the disclosure button to hide and then reveal the panels. Also click the red triangles and the Header Graphs icon and notice what happens.

      The red triangles offer you menu alternatives that will not mean much at this point, but which we will discuss in the next section. The red triangle in the upper right corner (above the diagonal line) relates to the columns of the grid, and the one in the lower left corner to the rows.

      Below the right-hand red triangle is a small icon that looks like a bar chart. This opens thumbnail descriptive graphs for each column.

      The top row of the grid contains the column names, and the left-most column contains row numbers. The cells contain the data.

      Our main interest within this data table is how life expectancy varies around the world. Variation is so common as to be unremarkable, but the very fact that values vary is what leads us to analyze them. We can imagine many reasons that life expectancy varies around the world; there are differences in nutrition, wealth, access to health care and clean water, education, political stability, and so on. Are there systematic differences in different parts of the world?

      We have a table displaying all 215 countries, but it is difficult to detect patterns by scanning up and down a long list. As a first step in analysis, we will make some simple graphs to summarize the table information visually. Software affords us many options to visualize a set of data and can help us discover errors in the recording of the raw data, locate important patterns of variability, or identify possible connections between and among variables. JMP’s Graph Builder is an intuitive, interactive platform for visualization.

      1. From the Life Expectancy 2017 data table window, click Graph ► Graph Builder.

      Graph Builder gives us a Cartesian plane on which we can create a visualization representing multiple columns in a single display. There are numerous options available, but in this first example, we will look at just a few.

      In analyzing this set of data, our primary interest lies in the variation of life expectancy. Following one of JMP’s conventions, we will think of this column as our Y variable.

      2. To display life_exp on the Y axis, click the life_exp column in the panel of Variables, and drag it to the vertical Y drop zone in the Graph Builder window. When you do this, your screen should look like Figure 1.6.

      Figure 1.6: Using the Graph Builder


      In this graph, each dot represents the value for one country. If you move your cursor to any dot and hover, the name of the country and other data appear. So, for instance, we find that Hong Kong enjoyed the longest life expectancy. Notice that the reported life expectancies lie between approximately 53 years and 85 years, with a large number of countries enjoying life expectancies above 65 years.

      By default, JMP jitters the points in this graph (see the drop-down menu next to Jitter just below the list of variables). This spreads the points apart to the left and right, so that identical or similar values do not overlap in the graph.

      3. Click the Jitter menu and select None. You will see why jittering has its advantages. Explore the other options as well.
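
      The effect of jittering can be sketched in a few lines of Python. This is an illustration under assumptions, not JMP's algorithm: it uses a uniform random horizontal offset, and the function name, the width parameter, and the sample values are hypothetical.

```python
import random


def jitter(values, width=0.2, seed=0):
    """Pair each value with a random horizontal offset in [-width, width].

    The vertical position (the data value) is never altered; only the
    horizontal placement changes, so tied values no longer overlap.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [(rng.uniform(-width, width), v) for v in values]


# Made-up values with a three-way tie, for illustration only.
values = [70.0, 70.0, 70.0, 81.5]

unjittered = [(0.0, v) for v in values]  # tied values share one plotted position
jittered = jitter(values)                # tied values get separate positions
```

      Without jitter, the three tied values collapse onto a single point, so the graph understates how many observations share that value. With jitter, each one is visible.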

      Now let’s see how the values compare across different regions in the world. In the data table, we have already assigned a different color to each region, but have not provided a legend to explain the color coding.

      4. One way to produce a legend is to drag Region to the Color drop zone.

      Each global region is colored so that all the countries in East Asia and Pacific, for example, are red. This immediately reveals that nearly all the countries with short life expectancy are in Sub-Saharan Africa. This fact was not at all obvious from the initial data table; that is what visualization can do for us.

      5. Now move the cursor back to the list of columns and once again choose Region, and this time drag it to the Group X drop zone at the top of the tableau.

      When you do this, you will now have seven adjacent small graphs showing the values from each region. As you examine these graphs, you might notice that the values vary vertically within each region and that the patterns of variation are similar in some regions but dramatically different in others. The study of descriptive statistics largely revolves around common patterns of variation, comparisons of those patterns, and deviations from those patterns. Here again, it is very evident that the nations of Sub-Saharan Africa largely have the shortest life expectancies in the world. What other general patterns emerge?
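
      Splitting one column by the categories of another is the same operation as a group-by summary. The Python sketch below shows the idea under stated assumptions: the region labels and values are invented placeholders rather than figures from the Life Expectancy 2017 table, and the function name is hypothetical.

```python
from collections import defaultdict


def summarize_by_group(records):
    """Group (label, value) pairs by label and return (min, max, count) per group."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    return {
        label: (min(vals), max(vals), len(vals))
        for label, vals in groups.items()
    }


# Placeholder records for illustration only; not real life expectancy data.
records = [
    ("Region A", 61.0),
    ("Region A", 55.5),
    ("Region B", 80.0),
    ("Region B", 78.5),
    ("Region B", 82.0),
]

summary = summarize_by_group(records)
```

      Comparing the minimum and maximum within each group is a numerical counterpart to comparing the vertical spread of points across the small graphs.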

      Because the data are reported geographically, another useful way to examine the patterns is to overlay them on a map. Doing so magnifies a few key points.

      6. In Graph Builder, click the Start Over button in the upper left.

      7. Drag the Country Code column to the lower left of Graph Builder into the drop zone labeled Map Shape.

      8. Now drag life_exp over the map and release the mouse button. Alternatively, you might drag life_exp into the Color drop zone. Your map should now look like Figure 1.7. At this point, click the Done button.

      Figure 1.7: Map of the World Colored by Life Expectancy


      As the legend to the right indicates, the countries shaded dark red enjoy the longest life expectancies and the countries shaded dark blue have the shortest. This map is an alternative way to see how life expectancy varies around the world.
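
      A color legend of this kind maps each numeric value to a position on a gradient. The Python sketch below shows one simple way to do that, assuming a straight linear blend between two RGB colors. The function name and the specific dark blue and dark red values are hypothetical, and JMP's own color scale may well be computed differently.

```python
def gradient_color(value, low, high, cold=(0, 0, 139), hot=(139, 0, 0)):
    """Linearly blend from `cold` (at `low`) to `hot` (at `high`), as an RGB tuple.

    Values outside [low, high] are clamped to the nearest end of the scale.
    """
    if high == low:
        t = 0.5  # degenerate range: fall back to the middle of the gradient
    else:
        t = (value - low) / (high - low)
    t = max(0.0, min(1.0, t))
    return tuple(round(c + t * (h - c)) for c, h in zip(cold, hot))


# Endpoints taken from the approximate range quoted in the text (53 to 85 years).
shortest = gradient_color(53, low=53, high=85)  # dark blue end of the scale
longest = gradient_color(85, low=53, high=85)   # dark red end of the scale
```

      Intermediate values fall between the two ends, which is why neighboring countries with similar life expectancies appear in similar shades.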

      Please note two limitations of this graph. You might have spotted a white “hole” in the center of Africa. This marks countries for which JMP found no data in our data table. Additionally, there is a notation at the bottom of the graph indicating that JMP did not recognize some of the country abbreviations, and hence did not display them on the map.
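
      Both limitations are mismatches between two sets of identifiers: the codes in the data table and the codes the map knows. A small Python sketch makes the distinction explicit. The function name and the three-letter codes are placeholders invented for illustration; they are not the codes used in the book's data table or in JMP's map files.

```python
def match_report(table_codes, map_codes):
    """Compare the codes in a data table with the codes a map can draw."""
    table_set, map_set = set(table_codes), set(map_codes)
    return {
        # In the table but unknown to the map: cannot be drawn.
        "unrecognized": sorted(table_set - map_set),
        # On the map but absent from the table: drawn with no color.
        "no_data": sorted(map_set - table_set),
    }


# Placeholder codes for illustration only.
report = match_report(
    table_codes=["AAA", "BBB", "XXX"],
    map_codes=["AAA", "BBB", "CCC"],
)
```

      In this toy case, XXX corresponds to the notation about unrecognized abbreviations, and CCC corresponds to a blank region on the map.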

      Of course, data analysis is not limited to graphing and mapping—there are numbers to be crunched, and JMP will do the heavy computational work. We have many pages ahead of us to learn how to request and to interpret