patients may be ranked 1–30 on the basis of their summated ratings of a hospital radio programme, or schools are ranked according to the performance of their pupils in examinations. We would normally rank-order only a fairly limited number of people or objects; ranking 300 people 1–300 would be rather cumbersome. Alternatively, respondents in a survey may be asked to rank a number of items; for example, customers may be asked to rank seven brands 1–7 in terms of value for money. Respondents may find this tricky, so paired comparisons may be used instead. If, for example, seven brands of beer are to be ranked, respondents are asked to say which of two brands they prefer for every possible pair of brands. With n items there are n(n − 1)/2 pairs, so seven brands give 7 × 6/2 = 21 comparisons. The results can be converted into a rank order by counting the number of times each brand is preferred.
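The conversion from paired comparisons to a rank order can be sketched in a few lines of Python. This is only an illustration: the brand labels, the function name `rank_by_preferences` and the simulated respondent are hypothetical, and ties are simply left in their original listing order, which a real study would need to handle more carefully.

```python
from collections import Counter
from itertools import combinations

# Hypothetical labels standing in for the seven brands of beer.
brands = ["A", "B", "C", "D", "E", "F", "G"]

# Every unordered pair of brands: n(n - 1)/2 = 7 * 6 / 2 = 21 pairs.
pairs = list(combinations(brands, 2))


def rank_by_preferences(items, choices):
    """Rank items by how often each was preferred.

    `choices` maps each pair (a, b) to the item the respondent preferred.
    Items with equal counts keep their original listing order.
    """
    wins = Counter({item: 0 for item in items})
    for winner in choices.values():
        wins[winner] += 1
    ordered = sorted(items, key=lambda item: wins[item], reverse=True)
    return ordered, wins


# Simulated respondent (an assumption for the example): always prefers
# the brand whose label comes first alphabetically.
choices = {pair: min(pair) for pair in pairs}
ordered, wins = rank_by_preferences(brands, choices)

print(len(pairs))   # number of comparisons the respondent has to make
print(ordered)      # brands from most to least preferred
```

With this perfectly consistent respondent, the first brand wins all six of its comparisons and the last wins none. Real respondents may be inconsistent (preferring A to B, B to C and C to A), in which case several brands can end up with the same count.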
Binary, nominal, ordered category and ranked variables all use sets of values that are usually labelled in words. However, in order to be able to enter these values into data analysis software like IBM SPSS (which is introduced in the next chapter), the labels need to be identified by numbers which are used as codes that ‘stand for’ each value. There are few rules that suggest how this should be achieved. For binary and nominal variables the codes may be assigned arbitrarily, and it does not matter if we assign 1 = male and 2 = female, or 1 = female and 2 = male, or even 26 = male and 39 = female. What we certainly cannot do is treat the codes as quantities. If, for example, we take 1 = male and 2 = female and we have 60 males and 40 females, we cannot calculate the ‘average sex’ as 1.4! As we will see later, any self-respecting computer will happily perform this calculation for you: the trick is to realize that the result is total nonsense. At the ordered category level, again we can assign the numbers arbitrarily, but they must preserve the order.
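A short Python sketch makes the point concrete. It uses the 60 males and 40 females from the example above; the function names and the ordered category labels (‘low’, ‘medium’, ‘high’) are hypothetical, introduced only for illustration.

```python
from collections import Counter
from statistics import mean

# 60 males and 40 females, as in the example in the text.
sexes = ["male"] * 60 + ["female"] * 40


def mean_of_codes(values, codebook):
    """Average the numeric codes: the computer will do it, but it means nothing."""
    return mean(codebook[value] for value in values)


codes_a = {"male": 1, "female": 2}
codes_b = {"male": 26, "female": 39}

mean_a = mean_of_codes(sexes, codes_a)  # the 'average sex' of 1.4
mean_b = mean_of_codes(sexes, codes_b)  # a different number for the same data

# What is meaningful for a nominal variable is the count in each category,
# which does not depend on the codes chosen.
counts = Counter(sexes)


def preserves_order(ordered_labels, codebook):
    """Check that codes rise strictly along the ordered category labels."""
    codes = [codebook[label] for label in ordered_labels]
    return all(a < b for a, b in zip(codes, codes[1:]))


levels = ["low", "medium", "high"]  # hypothetical ordered categories
ok_codes = {"low": 1, "medium": 5, "high": 9}
bad_codes = {"low": 2, "medium": 1, "high": 3}

print(mean_a, mean_b, counts)
print(preserves_order(levels, ok_codes), preserves_order(levels, bad_codes))
```

The ‘average’ changes whenever the arbitrary codes change, which is one way of seeing that it carries no information about the cases. The category counts stay the same under any coding.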
Metric variables arise when either there is a metric like age in years that can be used to calibrate distances between recorded values, or the values are a result of counting the number of instances involved as a measure of size. Think about how we might measure the size of a car park. We could either measure the area in square metres, or count up the number of parking spaces it provides. The first procedure might give any value in square metres and fractions of a square metre up to however many decimal places are required. The second method will produce only whole numbers or integers. In statistical parlance, the first is usually called a continuous variable and the second a discrete variable. The values for metric variables are numeric rather than in words, as with categorical variables.
Table 1.1 lists the variables used in the alcohol marketing study, the values recorded and the types of measure. Notice that only two of the variables are listed as continuous metric – the age at which respondents first had an alcoholic drink and the total units of alcohol last consumed. It is quite common in survey research that most of the variables are binary, nominal, ordered category or discrete metric.
Variables may be seen as containers, and each case has a place in each container (one container for each property) either in one of two or more compartments or at a certain ‘level’ inside the container. Set memberships, by contrast, focus on whether or not (or the extent to which) cases ‘belong’ in a container. Cases, then, may be members of some sets, but not others. Thus a nation-state may be a member of the sets ‘democratic’, ‘having strong trade unions’ and ‘low crime rate’, but not of the sets ‘unregulated press’ or ‘strict controls on the possession of guns by individuals’. The focus is then on which combinations of memberships characterize each case. Sets are based on notions of inclusion or exclusion; boundaries are defined in a way that creates containers into which cases may or may not be assigned.
Set memberships may be crisp or fuzzy. With crisp sets, cases are unambiguously members or not members of a set. Thus the UK is a member of the EU but not a member of the eurozone. Crisp sets are identical to binary variables in that they record the presence or absence of a characteristic, but they are allocated not codes but set membership values. For this reason, the latter are, in this text, indicated in square brackets. Full membership is always indicated with a value of [1] and non-membership with a value of [0]. Crisp sets have only these two values. Crisp sets are at the base of set theory and what has become known as Boolean logic. George Boole (1847) was a nineteenth-century mathematician and logician who developed an algebra suitable for properties with only two possible values. Set theory and Boolean logic are explained in more detail in Chapter 7 on configurational approaches to data analysis.
In reality, the world, and a large part of the phenomena studied by social science, does not come naturally in binary form. Membership of the category ‘democratic country’ or ‘profitable organization’ may be a matter of degree. Fuzzy sets record degrees of membership of a defined category by permitting membership values in the interval between [1] and [0]. Cases that are ‘more in’ a set than out of it are given values above [0.5], for example a value of [0.8] to indicate that a case is mostly in the set, while cases that are more out of a set than in it are given values below [0.5]. The crossover point of [0.5] indicates cases that are neither in nor out of a set. It is the point of maximum ambiguity. Ragin (2000) emphasizes that this point should be conceptually defined according to the theory or theories being applied, or according to empirical evidence, research findings or researcher understanding of the cases involved. The researcher has to decide, for example, what being a ‘heavy’ viewer of television entails, perhaps in terms of hours of viewing per day or per week, and at what point a viewer is no longer in the set ‘heavy viewer’. This threshold should not be an arithmetical mean or average, which is driven by the particular dataset used, but an absolute value that is not affected by other values in the set. The assessment of crisp and fuzzy sets is explained in detail in Chapter 7.
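As a rough illustration of the idea, the Python sketch below assigns fuzzy membership of the set ‘heavy viewer’ from hours of television viewed per day. Everything specific in it is an assumption made for the example: the three anchor values (1, 3 and 5 hours), the function name `fuzzy_membership` and the use of straight-line interpolation between the anchors. It is not the calibration procedure described in Chapter 7 or by Ragin; it only shows that the anchors are fixed in advance by the researcher rather than computed from the data.

```python
def fuzzy_membership(value, non_member, crossover, full_member):
    """Map a raw value to a membership score between 0 and 1.

    The three anchors are chosen by the researcher on conceptual grounds:
    at or below `non_member` the case is fully out of the set [0],
    at `crossover` it is neither in nor out [0.5],
    at or above `full_member` it is fully in the set [1].
    Linear interpolation between anchors is an assumption of this sketch.
    """
    if value <= non_member:
        return 0.0
    if value >= full_member:
        return 1.0
    if value <= crossover:
        return 0.5 * (value - non_member) / (crossover - non_member)
    return 0.5 + 0.5 * (value - crossover) / (full_member - crossover)


# Hypothetical anchors for 'heavy viewer', in hours of viewing per day.
NON_MEMBER, CROSSOVER, FULL_MEMBER = 1.0, 3.0, 5.0

hours_per_day = [0.5, 2.0, 3.0, 4.0, 6.0]  # hypothetical cases
scores = [
    fuzzy_membership(h, NON_MEMBER, CROSSOVER, FULL_MEMBER)
    for h in hours_per_day
]
print(scores)
```

Because the anchors are constants, adding or removing cases from the dataset leaves every existing membership score unchanged, which would not be true if the crossover were set at the sample mean.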
Key points and wider issues
Values are what researchers actually record as a result of the process of assessing properties of cases. However, they may arise either as variables or as set memberships. The values of variables assess cases relative to one another; sets define memberships in absolute terms according to generally agreed external standards or based on a combination of theoretical knowledge and practical experience of cases. The values recorded for variables arise from one or more of the measurement activities of classifying, ordering, ranking, counting or calibrating the characteristics of cases. These activities result in variables that may be binary, nominal, ordered category, ranked, discrete metric or continuous metric. These types of variable can themselves be seen as a variable having ordered category characteristics of increasing complexity from binary up to continuous metric.
Sometimes the distinction between the different types of measure is blurred, or difficult to make or open to interpretation. Thus the numerical totals derived from summated rating scales should, strictly speaking, be used only to create ordered categories from high to low. The totals only indicate relative positions, so that while 15 is ‘higher’ than 12, by how much is unclear because there are no ‘units’ of measurement. There is no metric for measuring people’s attitudes, opinions or beliefs. In practice, however, researchers commonly treat the results, particularly of Likert scales, as if they are metric and will calculate average scores and use the results in statistical procedures that require metric data. Such a practice is making the assumption of equivalence of distance between recorded values, so that the difference, for example between ‘Strongly agree’ and ‘Agree’, is the ‘same as’ the distance between ‘Agree’ and ‘Neither’ and between ‘Disagree’ and ‘Strongly disagree’. This may be a reasonable assumption for the standard Likert scale and little error may result from acting as if the resulting scale is metric, but for other kinds of scale the assumption may be unwarranted or at least questionable.
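The equal-distance assumption can be made visible with a small Python sketch. The ten responses, the alternative coding and the function name `summarise` are all hypothetical. The sketch compares the usual 1–5 coding with another coding that preserves the order of the categories but stretches the top step.

```python
from statistics import mean, median_low

SCALE = ["Strongly disagree", "Disagree", "Neither", "Agree", "Strongly agree"]

# Conventional coding: equal steps of 1 between adjacent categories.
equal_steps = {label: code for code, label in enumerate(SCALE, start=1)}

# An alternative coding that keeps the order but treats 'Strongly agree'
# as much further from 'Agree' (an arbitrary choice for illustration).
unequal_steps = {**equal_steps, "Strongly agree": 10}

# Hypothetical responses from ten respondents to one Likert item.
responses = (
    ["Strongly agree"] * 3 + ["Agree"] * 4 + ["Neither"] * 2 + ["Disagree"]
)


def summarise(answers, codebook):
    """Return the mean code and the label of the (lower) median category."""
    codes = [codebook[answer] for answer in answers]
    label_for = {code: label for label, code in codebook.items()}
    return mean(codes), label_for[median_low(codes)]


mean_equal, median_equal = summarise(responses, equal_steps)
mean_unequal, median_unequal = summarise(responses, unequal_steps)

print(mean_equal, median_equal)
print(mean_unequal, median_unequal)
```

The median category is the same under both codings because it depends only on order, whereas the mean shifts with the spacing of the codes. Treating Likert totals as metric therefore amounts to committing to one particular spacing.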
The assessment of set membership may result in crisp or fuzzy sets. Crisp sets are the same as binary variables and record cases as either members or non-members of a set. Fuzzy sets allow degrees of membership and combine binary with metric characteristics: membership values lie between [1] and [0], and [0.5] is the crossover point.
Error in data construction
The construction of data of any