refer to it as the categorical level of measurement. The categories have characteristics that differ but are not quantified as to the amount of the difference. For example, political party, religious affiliation, gender, and so forth can be recorded, grouped, and counted. Yet we do not say, for example, that one religion is more of a religion than another.
Under certain conditions, the most typical type of average, the mean (i.e., arithmetic average), is appropriate for nominal data. This holds when the variable has only two possible responses and it makes sense to talk about the percentage that corresponds to one of them. With gender coded 0 for female and 1 for male, a mean of 0.60 tells us that the group is 60% male.
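A minimal sketch of this idea in Python follows. The ten coded responses are made-up data for illustration, and the variable names are assumptions rather than anything from the principal's or director's files. It shows only that the mean of a 0/1 variable equals the proportion coded 1.

```python
from statistics import mean

# Hypothetical sample: gender coded 0 = female, 1 = male (made-up data).
gender_codes = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

# The mean of a 0/1 variable is the share of cases coded 1.
proportion_male = mean(gender_codes)
percent_male = 100 * proportion_male

print(f"{percent_male:.0f}% of this group is coded male")
```

The same arithmetic would not be interpretable for a nominal variable with three or more categories, because the codes would then be arbitrary labels rather than a yes/no indicator.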
Variables coded and interpreted as we have just seen find use in a variety of statistical techniques requiring at least interval levels of measurement (a discussion about interval level data is coming shortly). Some nominal data, therefore, can be quite useful in answering a surprisingly broad range of questions.
Both the high school principal and the director of public health have nominal data. The high school principal has data on gender, school club membership, sports participation, and scholastic topics for each student. Some aspects of these measures could be coded as nominal, such as variables for the names of extracurricular activities (e.g., yearbook).
The director of public health has access to a host of demographic data that are nominal, such as ethnicity and zip code. Generally, nominal data are summarized in tables or cross-tabulations of two characteristics, such as sports participation by gender or immunization rates by age or age grouping. Nominal data also delineate many of the groups of interest to research. Nonetheless, even in her naming of variables, she needs to be careful of racial and ethnic sensitivities (e.g., should a category be Black, Black American, African American, or combined with other categories into Americans of Color?) or some stakeholders could be offended and not pay attention to the point of her report.
5.B. Ordinal
With distances unsure
Blindly even steps
Arrive at cracks
Ordinal measurement is common for opinion polls. We can distinguish between levels of agreement but cannot be sure that the psychological distances between pairs of adjoining response choices are equivalent. For example, the psychological distance between “strongly disagree” and “moderately disagree” might not be the same as the distance between “neutral” and “moderately agree.” In these cases, an arithmetic average (the mean) might not yield an interpretable answer.
The high school principal has ordinal scales from some student surveys that he has already conducted, and he might conduct more. Although the case could be made that course grades really are ordinal, they have been and continue to be used as interval (the next topic) since their creation. The debate is whether the difference in knowledge of a topic between two students scoring, say, 20 and 60 points on a test is the same as that between students scoring 60 points and 100 points.
Along with actual medical data, the director of public health has results for perception surveys on the services received by the state’s medical assistance recipients. She also has another survey to be implemented fairly soon, a state requirement of her department. Most of her medical data, however, are either nominal or ratio, at least in how they are handled.
For statistics appropriate to ordinal data, both the high school principal and the director of public health will use frequency counts for the responses to each of their surveys’ items and a form of chi-square (described a bit later) for statistical significance tests. Both will use medians and modes (also discussed later) to describe central tendency. Recognizing ordinal data for what they are can save many later headaches. Statisticians who use ordinal data with statistics requiring interval data sometimes pay a harsh price in terms of their reputation.
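The frequency counts, median, and mode mentioned above can be sketched as follows. The five-point agreement scale and the nine responses are hypothetical, made-up data, not results from either survey. The median is found from the rank order of the responses only, so no assumption is made about the distances between response choices.

```python
from collections import Counter
from statistics import median_low, mode

# Hypothetical 5-point agreement scale, listed from lowest to highest.
SCALE = [
    "strongly disagree",
    "moderately disagree",
    "neutral",
    "moderately agree",
    "strongly agree",
]
RANK = {label: position for position, label in enumerate(SCALE)}

# Made-up responses to a single survey item.
responses = [
    "neutral", "moderately agree", "strongly agree",
    "moderately agree", "moderately disagree", "moderately agree",
    "neutral", "strongly disagree", "strongly agree",
]

# Frequency counts, reported in scale order.
counts = Counter(responses)
frequency_table = [(label, counts[label]) for label in SCALE]

# median_low always returns a value that occurs in the data, so the
# result maps back to a real response choice rather than a point
# halfway between two choices.
median_rank = median_low([RANK[r] for r in responses])
median_response = SCALE[median_rank]

# The mode is simply the most frequent response.
modal_response = mode(responses)

for label, n in frequency_table:
    print(f"{label:20s} {n}")
print("median:", median_response)
print("mode:  ", modal_response)
```

Using the lower median is a design choice for this sketch; with an even number of responses, an ordinary median could fall between two response choices, which would quietly assume equal spacing.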
5.C. Interval
Interval is regular
Same steps, no cracks
Yet zero is not none
Interval data have evenly spaced steps but no true zero. Course grades could be an example, where a zero score on a math test does not mean a complete lack of mathematics knowledge. A zero on a math test means that the student did not arrive at a single correct answer for the sample of possible relevant mathematics questions on the test, but the test has no way of capturing whether the student has no knowledge of the assessed topic. The zero is a measurement convenience.
Many statistics require interval levels of measurement (or could use ratio, discussed next) to yield valid results. Topics from grading differences in sections of the same course to the predicted flu infection rates for next year generally require this level of data. At a minimum, some reflection is appropriate when determining which statistics will be used with data.
For the high school principal, most student achievement measurements are used as though they were at an interval level of measurement, as discussed earlier. The very fact that the debate continues, more than a century since its inception, is testimony to the resiliency (what statisticians call “robustness”) of the mean to minor violations of its required level of measurement.
Many examples of the director of public health’s data are dichotomous (i.e., only two possible responses). For example, immunizations are coded for people in one of two ways, either yes (1) or no (0). These types of data generally can be used in statistical techniques that assume interval levels of measurement.
Examples of true interval data are somewhat rare. The most common are the Fahrenheit and Celsius scales for measuring temperature. In the end, the interval level of measurement is important to the proper selection, use, and interpretation of statistical methods, but it has few true examples in daily practice.
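A short sketch can show why temperature on these scales is interval rather than ratio. The three temperatures are arbitrary illustration values, and the function name is an assumption made for this example. Equal steps stay equal when the scale changes, but a ratio such as "twice as warm" does not survive the conversion, because zero on either scale is a convenience and not an absence of heat.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a Celsius reading to Fahrenheit."""
    return celsius * 9 / 5 + 32

# Arbitrary illustration values, evenly spaced in Celsius.
cool_c, mild_c, warm_c = 10.0, 20.0, 30.0
cool_f, mild_f, warm_f = (celsius_to_fahrenheit(t) for t in (cool_c, mild_c, warm_c))

# Equal steps in Celsius remain equal steps in Fahrenheit.
steps_c = (mild_c - cool_c, warm_c - mild_c)
steps_f = (mild_f - cool_f, warm_f - mild_f)

# Ratios depend on where each scale happens to put its zero.
ratio_c = mild_c / cool_c
ratio_f = mild_f / cool_f

print("Celsius steps:   ", steps_c)
print("Fahrenheit steps:", steps_f)
print("Ratio in Celsius:   ", ratio_c)
print("Ratio in Fahrenheit:", ratio_f)
```

The ratio of the same two temperatures differs between the scales, so statements built on ratios are not interpretable, while statements built on differences are.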
5.D. Ratio
The rare ruler
The flexible measure
Precious property
A ratio level of measurement scale has a true zero and is the trophy of data types. Weight and height are examples. We can say that half of 100 pounds is 50 pounds, and twice 6 feet is 12 feet. In other words, we can form interpretable ratios. These types of data are almost carefree in their use with regard to their level of measurement (assumptions about their distributions are another story, which will be told shortly).
The high school maintains basic health information in the nurse’s office on such things as height, weight, and inoculations, but the high school principal would likely need a good reason to be granted access to many of these data. The principal does have, however, largely unrestricted access to absentee and tardiness information. As counts, these variables are at a ratio level of measurement. Depending on how the data are coded and used, though, they could be at any of the levels of measurement. Recoding (assigning new codes after the fact) can further complicate understanding the data’s true level of measurement. Yet the principal has no hesitation in asking his office staff to do the tedious aspects of his research for him, as they are responsible for the same types of security and attention to detail as he is.
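The effect of recoding on level of measurement can be sketched as follows. The absence counts are made-up data, and the band names and cut points are hypothetical choices for illustration, not the principal's actual coding scheme. The same ratio-level counts are recoded once into ordered bands (ordinal) and once into a yes/no indicator (dichotomous).

```python
# Made-up days absent for eight students (ratio level: a true zero,
# and 10 days really is twice 5 days).
days_absent = [0, 2, 5, 11, 0, 7, 3, 15]

def absence_band(days: int) -> str:
    """Recode a count of days into an ordered band (hypothetical cut points)."""
    if days == 0:
        return "none"
    if days <= 5:
        return "low"
    if days <= 10:
        return "moderate"
    return "high"

# Ordinal recode: the order is kept, but the size of the gaps is lost.
bands = [absence_band(d) for d in days_absent]

# Dichotomous recode: 1 = absent at least once, 0 = never absent.
ever_absent = [1 if d > 0 else 0 for d in days_absent]

print("days:       ", days_absent)
print("bands:      ", bands)
print("ever absent:", ever_absent)
```

Once only the bands are stored, the original counts cannot be recovered, which is one reason to keep the raw values and document any recoding.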
The director of public health has electronic medical information for all Medicaid recipients in her state, although restrictions on the data’s use are quite stringent.