Robert Carver

Practical Data Analysis with JMP, Third Edition


Taking Advantage of Linked Graphs and Tables to Explore Data

      When we construct graphs, JMP automatically links all open tables and graphs. If we select rows either in the data table or in a graph, JMP selects and highlights those rows in all open windows.

      1. Within the 2015 life expectancy histogram, place the cursor over the right-most bar and click. While pressing the Shift key, also click the adjacent bar. Now you should have selected the two bars representing life expectancies of over 75 years. How many rows are now selected? Look in the Rows panel of the Data Table window.

      2. Now find the first window with the Distribution of Region (second tab in your project). Notice that some bars are partially highlighted. When you selected the two bars in the histogram, you were indirectly selecting a group of countries. These countries are grouped within the bar chart as shown, revealing the parts of the world where people tend to live longest.
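
      The idea behind linked selection can be sketched outside JMP. The Python snippet below is only an illustration under stated assumptions: the rows and values are made up, and `select_rows` and `count_by_region` are hypothetical helpers, not JMP functions. It marks the rows whose life expectancy falls in a chosen range and then tallies the selected rows by region, which is roughly what the partially highlighted bars display.

```python
from collections import Counter

# Hypothetical rows with made-up values (not the book's data table).
rows = [
    {"country": "A", "region": "Europe & Central Asia", "life_exp": 81.2},
    {"country": "B", "region": "Sub-Saharan Africa", "life_exp": 58.4},
    {"country": "C", "region": "East Asia & Pacific", "life_exp": 76.9},
    {"country": "D", "region": "Europe & Central Asia", "life_exp": 72.3},
    {"country": "E", "region": "Sub-Saharan Africa", "life_exp": 75.5},
]


def select_rows(table, low, high):
    """Return the indices of rows with low <= life_exp < high."""
    return {i for i, row in enumerate(table) if low <= row["life_exp"] < high}


def count_by_region(table, selected):
    """Return a (selected, total) pair of row counts for each region."""
    total = Counter(row["region"] for row in table)
    chosen = Counter(table[i]["region"] for i in selected)
    return {region: (chosen[region], total[region]) for region in total}


# Selecting a range of values is equivalent to selecting a set of rows.
selected = select_rows(rows, 75, 85)
highlight = count_by_region(rows, selected)
for region, (sel, tot) in highlight.items():
    print(f"{region}: {sel} of {tot} rows selected")
```

      Because the selection is stored as a set of rows rather than as a property of one graph, any other summary of the same table can show which of its rows are selected.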

      Customizing Bars and Axes in a Histogram

      When we use the Distribution platform to analyze a continuous variable, JMP determines how to divide the variable axis and how to create “bins” for grouping observations. These automatic choices can affect the appearance of the distribution, and there are several ways to customize the appearance of a histogram.

      We can alter the number of bars in the histogram, creating new boundaries between groups of observations and shifting observations from one bar to the next.

      1. Move back to the Distribution report tab. Click anywhere in a blank area of the 2015 histogram to de-select the two right bars.

      2. Choose Tools ► Grabber.

      3. Position the hand tool anywhere over the bars in the 2015 histogram beneath the box plot, and click-drag the tool straight up and down. In doing so, you will change the number and width of the bars, sometimes dramatically changing the shape of the graph.

      Think about this: the apparent shape of the distribution depends on the number of bars we create. By default, the software chooses an initial number of bars, or bins, to categorize the continuous variable. However, that initial choice should not be the final word. As we adjust the number of bins, we should watch closely to see how the shape changes, looking for a rendering that accurately and honestly displays the overall pattern of variation.
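
      To see why the bin count matters, here is a minimal Python sketch. It assumes equal-width bins spanning the minimum to the maximum of the data, which is not necessarily the rule JMP applies; the values are made up and `histogram_counts` is a hypothetical helper. The same twelve observations are binned three ways.

```python
def histogram_counts(values, n_bins):
    """Count observations in n_bins equal-width bins spanning min..max."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up values for illustration only (not the book's data).
values = [50, 52, 53, 60, 61, 70, 71, 72, 73, 74, 80, 82]

for n_bins in (2, 4, 8):
    print(n_bins, histogram_counts(values, n_bins))
```

      With few bins the counts look fairly even; with more bins, gaps and a cluster of values in the low 70s appear. No observation has changed, only the grouping.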

      One way to resolve the issue is by using a shadowgram. A shadowgram visually averages a large number of bin widths into a diffuse image with no distinct bars at all. Here is how:

      4. Click the red triangle next to LifeExp in the 2015 histogram.

      5. Choose Histogram Options ► Shadowgram. Figure 3.8 shows the result.

      Figure 3.8: A Shadowgram for a Continuous Variable

      You should notice that there are several Histogram Options. While you are here, explore them—see what there is to see.
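
      The averaging idea behind the shadowgram can be approximated numerically. The sketch below is an assumption about the general technique, not a description of how JMP renders the image: for a given point on the axis, it computes the histogram density under several bin widths and averages the results. The helper names, the bin widths, and the data values are all hypothetical.

```python
def density_at(values, x, width, origin=0.0):
    """Histogram density at x for bins of the given width anchored at origin."""
    bin_index = (x - origin) // width
    in_bin = sum(1 for v in values if (v - origin) // width == bin_index)
    return in_bin / (len(values) * width)


def averaged_density(values, x, widths):
    """Average the histogram density at x over several bin widths."""
    return sum(density_at(values, x, w) for w in widths) / len(widths)


# Made-up values for illustration only (not the book's data).
values = [50, 52, 53, 60, 61, 70, 71, 72, 73, 74, 80, 82]
widths = [2, 4, 8, 16]

for x in (55, 65, 75):
    print(x, round(averaged_density(values, x, widths), 4))
```

      Averaging over many widths removes the dependence on any single choice of bins, which is why the resulting picture has no distinct bars.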

      We can also change the scale of the horizontal axis interactively. Initially, JMP set the left and right endpoints, and the limits changed when we chose uniform scaling. Suppose we want the axis to begin at 30 and end at 85.

      6. Move the cursor to the left end of the horizontal axis, and notice that the hand now points to the left (this is true whether you have previously chosen the hand tool or not). Click and drag the cursor slowly left and right, and see that you are scrunching or stretching the axis values. Stop when the minimum value is 30.

      7. Move the cursor to the right end of the axis, and similarly set the maximum at 85 years just by dragging the cursor.

      Finally, we can “pan” along the axis. Think of the border around the graph as a camera’s viewfinder through which we see just a portion of the entire infinite axis.

      8. Without holding the mouse button, move the cursor toward the middle of the axis until the hand points upward. Now click and drag to the left or right, and you will pan along the axis.

      9. Alternatively, rather than clicking and dragging to change axis attributes, you can directly edit all “Axis Settings” by double-clicking on the axis itself. This opens a dialog box where you can specify a variety of settings.

      Our original data table contains values for 12 years, and we have now compared the variation in life expectancy for two of those years. Graph Builder lets us make a quick visual comparison across all 12 years.

      1. First, we want to clear our earlier filtering so that we can now access all years. Choose Rows ► Clear Row States to deselect, show, and include all rows.

      2. Select Graph ► Graph Builder.

      3. Drag LifeExp to the X drop zone.

      4. Find the menu bar at the top of the Graph Builder window and locate the Histogram button near the center. Click it.

      5. Drag Year to the Wrap drop zone and click the Done button. Your graph should look like Figure 3.9.

      Figure 3.9: Longer Lives in Most of the World, 1960 to 2015

      What do you see as you inspect these small multiple histograms? Can you see life expectancies gradually getting longer in most countries? There were two peaks in 1960: many countries with short lives, and many with longer lives. The lower peak slowly flattened out as the entire distribution crept rightward.
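
      Small multiples are easiest to compare when every panel shares the same bins. The following Python sketch illustrates that idea under assumptions: the (year, value) pairs are made up, `panel_counts` is a hypothetical helper, and all values are assumed to lie within the stated axis range. It is not how Graph Builder is implemented.

```python
from collections import defaultdict

# Made-up (year, life_exp) pairs for illustration only.
records = [
    (1960, 42.0), (1960, 47.5), (1960, 68.0), (1960, 70.5),
    (2015, 61.0), (2015, 72.5), (2015, 78.0), (2015, 81.5),
]


def panel_counts(pairs, low, high, width):
    """Bin values separately per year, using the same bin edges in every panel."""
    n_bins = int((high - low) / width)
    panels = defaultdict(lambda: [0] * n_bins)
    for year, value in pairs:
        # Assumes low <= value <= high; the top edge goes in the last bin.
        index = min(int((value - low) / width), n_bins - 1)
        panels[year][index] += 1
    return dict(panels)


panels = panel_counts(records, low=40, high=90, width=10)
for year in sorted(panels):
    print(year, panels[year])
```

      Because the bin edges are identical in each panel, a rightward shift in the counts from one year to the next reflects a change in the data and not a change in the axis.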

      Graphs are an ideal way to summarize a large data set and to communicate a great deal of information about a distribution. We can also describe variation in a quantitative variable with summary statistics (also called summary measures or descriptive statistics). Just as a distribution has shape, center, and dispersion, we have summary statistics that capture information about the shape, center, or dispersion of a variable.

      Let’s look back at the distribution report for our sample of 2015 life expectancies in 198 countries of the world. Just to the right of the histogram, we find a table of Quantiles followed by a list of Summary Statistics.

      Figure 3.10: Quantiles and Summary Statistics

      Quantile is a generic term; you might be more familiar with percentiles. When we sort the observations in a data set, divide them into groups containing equal numbers of observations, and locate the boundaries between the groups, we establish quantiles. When there are 100 such groups, the boundaries are called percentiles. If there are four such groups, we refer to quartiles.

      For example, we find that the 90th percentile is 81.54 years. This means that 90% of the observations have life expectancies shorter than 81.54 years. JMP also labels five quantiles known as the five-number summary. They identify the minimum, maximum, 25th percentile (1st quartile or Q1), 50th percentile (median), and 75th percentile (3rd quartile or Q3). Of the 198 countries listed in the data table, one-fourth have life expectancies shorter than 66.43 years, and one-fourth have life expectancies longer than 77.49 years.
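
      The five-number summary can also be computed by hand or in code. The sketch below uses Python's standard `statistics.quantiles` function with its "inclusive" method; JMP may use a different interpolation rule, so results on real data could differ slightly from the report. The values are made up, and `five_number_summary` and `share_below` are hypothetical helpers.

```python
import statistics

# Made-up values for illustration only (not the book's data).
values = [50, 52, 53, 60, 61, 70, 71, 72, 73, 74, 80, 82]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def share_below(data, cutoff):
    """Fraction of observations strictly below the cutoff."""
    return sum(1 for v in data if v < cutoff) / len(data)


summary = five_number_summary(values)
minimum, q1, median, q3, maximum = summary
print("five-number summary:", summary)
print("share below Q1:", share_below(values, q1))
print("share above Q3:", 1 - share_below(values, q3))
```

      In this small example, one-fourth of the values fall below Q1 and one-fourth fall above Q3, mirroring the interpretation of the quartiles given above.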

      Summary Statistics refer to the common descriptive statistics shown in Figure 3.10. At this stage in your study of statistics, three of these statistics are useful, and the other three should wait until Chapter 8.

      ● The mean is the simple arithmetic average of the observations, usually denoted by the symbol x̄ and computed as follows: