David Machin

Medical Statistics



                     n     Median (25–75th centile)    n     Median (25–75th centile)
EQ‐5D tariff         98    0.73 (0.59–0.80)            101   0.73 (0.66–0.80)
EQ‐5D VAS (0–100)    100   80.0 (60.0–90.0)            99    79 (60.0–90.0)

      Categorical or Qualitative Data

       Nominal Categorical Data

      Nominal or categorical data are data that one can name and put into categories. They are not measured but simply counted. They often consist of unordered ‘either‐or’ observations with two categories, which are known as binary data. For example: dead or alive; male or female; cured or not cured; pregnant or not pregnant. In Table 2.1 gender is a binary variable. However, categorical data often have more than two categories, for example: blood group (O, A, B, AB), country of origin, ethnic group or social class. The ways of presenting nominal data are limited in scope. Thus Table 2.1 gives the number and percentage of people treated at each of the seven centres in each of the two randomised groups. Categorical data are sometimes referred to as ‘qualitative’, to distinguish them from ‘quantitative’ data, which we will discuss later. However, there is a whole area of methodology called ‘qualitative research’, so to avoid confusion we will not use this term.
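      Since nominal data can only be counted, their presentation amounts to a table of counts and percentages per category. The short Python sketch below shows this for a nominal variable; the blood-group records and the helper name frequency_table are hypothetical, for illustration only.

```python
from collections import Counter

def frequency_table(labels):
    """Return (category, count, percentage) rows for a nominal variable.

    Categories are listed in order of first appearance, because nominal
    categories have no natural ordering.
    """
    counts = Counter(labels)
    total = len(labels)
    return [(category, n, 100 * n / total) for category, n in counts.items()]

# Hypothetical blood-group records, for illustration only.
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "B"]

for category, n, pct in frequency_table(blood_groups):
    print(f"{category:>2}  {n:>2}  {pct:5.1f}%")
```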

       Ordinal Data

       Ranks

      In some studies it may be appropriate to assign ranks. For example, patients with corns may be asked to rank their preferences among four treatments: hard skin (corn) removal by scalpel; special rehydration creams for thickened skin; customised soft padding or foam insoles; and corn plaster containing salicylic acid. Although the numbers 1 to 4 may be assigned to the treatments, we cannot treat them as numerical values. They are in fact only codes for best, second best, third choice and worst.
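      Because the ranks are only codes, one reasonable way to summarise them is to count how often each treatment was given each position, rather than doing arithmetic on the codes. The Python sketch below assumes each patient's preferences are stored as a list running from first choice to last; the short treatment labels, the preference data and the helper name rank_distribution are all hypothetical.

```python
from collections import Counter

# Hypothetical short labels for the four treatments.
TREATMENTS = ["scalpel", "cream", "padding", "plaster"]

# Hypothetical preference orderings: each list runs from first choice to last.
rankings = [
    ["plaster", "scalpel", "padding", "cream"],
    ["scalpel", "plaster", "cream", "padding"],
    ["plaster", "padding", "scalpel", "cream"],
]

def rank_distribution(rankings, treatments):
    """Count how often each treatment received each rank position (1 = best)."""
    table = {t: Counter() for t in treatments}
    for ordering in rankings:
        for position, treatment in enumerate(ordering, start=1):
            table[treatment][position] += 1
    return table

dist = rank_distribution(rankings, TREATMENTS)
for t in TREATMENTS:
    # Counts of 1st, 2nd, 3rd and 4th places for this treatment.
    print(t, [dist[t][r] for r in range(1, len(TREATMENTS) + 1)])
```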

      Numerical or Quantitative Data

       Count Data

      Table 2.1 gives the number of corns each participant had at the start of the trial. Since this can only be a whole number (an integer), for example 0, 1, 2 or 3 in this trial, it is termed count data. Other examples are often counts per unit of time, such as the number of deaths in a hospital per year or the number of asthma attacks a person has per month. In dentistry, a common measure is the number of decayed, filled or missing teeth (DFM).
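      Counts per unit of time are obtained by dividing a whole-number count by the length of the observation period. The Python sketch below is a minimal illustration of this; the helper name rate_per_unit_time and the patient figures are hypothetical.

```python
def rate_per_unit_time(count, time):
    """Express a count as a rate per unit of time (e.g. attacks per month).

    A count must be a non-negative whole number; the time must be positive.
    """
    if isinstance(count, bool) or not isinstance(count, int) or count < 0:
        raise ValueError("a count must be a non-negative integer")
    if time <= 0:
        raise ValueError("the observation time must be positive")
    return count / time

# Hypothetical patient: 6 asthma attacks recorded over 3 months of follow-up.
print(rate_per_unit_time(6, 3))  # 2.0 attacks per month
```

The check on the count reflects the defining property of count data: a value such as 2.5 attacks cannot be observed for an individual, although the resulting rate need not be a whole number.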

       Measured or Numerical Continuous

      Such data are measurements that can, in theory at least, take any value within a given range. These data contain the most information, and are the ones most commonly used in statistics. Examples of continuous data in Table 2.1 are age, size of index corn, visual analogue scale (VAS), pain score and EQ‐5D tariff.

      However, for simplicity, continuous data are often dichotomised in medicine to make binary data. Thus, diastolic blood pressure, which is continuous, is converted into hypertension (>90 mmHg) and normotension (≤90 mmHg). This clearly leads to a loss of information. There are two main reasons for dichotomising data. First, it is easier to describe a population by the proportion of people affected, for example, by saying that 10% of the population has hypertension. Second, one often has to make a decision: if a person has hypertension, then they will get treatment, and this decision is easier if blood pressure has been categorised.
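      The dichotomisation described above, with hypertension defined as a diastolic pressure above 90 mmHg, can be sketched in Python as follows. The readings and the helper name is_hypertensive are hypothetical.

```python
def is_hypertensive(diastolic_mmhg, cut_off=90.0):
    """Dichotomise diastolic BP: hypertension if > cut_off, else normotension."""
    return diastolic_mmhg > cut_off

# Hypothetical diastolic readings (mmHg), for illustration only.
readings = [72, 88, 90, 91, 104, 79, 95, 85, 90.5, 68]

hypertensive = [is_hypertensive(r) for r in readings]
proportion = sum(hypertensive) / len(readings)
print(f"{proportion:.0%} hypertensive")
```

The loss of information is visible in the output: readings of 90 and 91 mmHg end up in different groups, while 91 and 104 mmHg are treated as identical.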

      One can also divide a continuous variable into more than two groups. For example, we could divide age into bands of equal width, say 10 years: 0–9, 10–19, 20–29 and so on. When categorising continuous data, authors should explain why they chose their cut‐off points, and readers should be wary that the cut‐offs may have been chosen to make a particular point. Some statisticians have termed the habit of categorising continuous variables ‘dichotomania’ and regard it as poor practice, since it loses information and assumes a discontinuous relationship that is unlikely in nature.
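      Equal-width age bands can be computed with integer division. The Python sketch below assumes ages are recorded in completed years and uses a plain hyphen in the band labels; the helper name age_band and the ages are hypothetical.

```python
def age_band(age_years, width=10):
    """Return a band label such as '20-29' for an age in completed years.

    Bands have equal width and start at 0, so with width=10 the cut-offs are
    0, 10, 20, ... The width is an explicit argument so that the chosen
    cut-offs can be stated when results are reported.
    """
    if age_years < 0:
        raise ValueError("age cannot be negative")
    lower = (int(age_years) // width) * width
    return f"{lower}-{lower + width - 1}"

# Hypothetical ages, for illustration only.
ages = [23, 37, 9, 39, 40, 58]
print([age_band(a) for a in ages])
```

Ages 39 and 40 fall into different bands, whereas 30 and 39 share one: this is the discontinuity that critics of ‘dichotomania’ object to.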

      Interval and Ratio Scales

      One difficulty with assigning ranks to ordered categorical data is that one cannot assume that the scale is interval. Thus, as we have indicated when discussing ordinal data, one cannot assume that the risk of a corn healing for a current smoker, relative to a non‐smoker, is the same as that for a previous smoker relative to a non‐smoker. Were Farndon et al. (2013) simply to score the three levels of smoking as 0, 1 and 2 in their subsequent analysis, this would imply that the intervals between adjacent levels are equal.
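      The consequence of scoring the smoking levels 0, 1 and 2 can be made concrete by contrasting a single linear score with two indicator (0/1) variables. The Python sketch below assumes the order non‐smoker = 0, previous smoker = 1, current smoker = 2; the helper names and the coefficient value are hypothetical, for illustration only.

```python
LEVELS = ["non-smoker", "previous smoker", "current smoker"]

def linear_score(status):
    """Score the ordered levels as 0, 1, 2 (this assumes equal spacing)."""
    return LEVELS.index(status)

def indicator_codes(status):
    """Two 0/1 indicators with non-smoker as reference (no spacing assumed)."""
    return (int(status == "previous smoker"), int(status == "current smoker"))

# Under a model that is linear in the score with coefficient beta, the effect
# of 'current' versus 'non-smoker' is forced to be exactly twice that of
# 'previous' versus 'non-smoker'.
beta = 0.3  # hypothetical coefficient, for illustration only
for status in LEVELS:
    print(status, linear_score(status) * beta, indicator_codes(status))
```

With the indicator coding, each of the two smoking levels gets its own coefficient, so no assumption about the spacing between levels is made.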

      Binary data are the simplest type of data: each individual has a label that takes one of two values, such as male or female, or corn healed or not healed. A simple summary is to count the number of individuals with each label. However, a raw count on its own is rarely useful. For example, in Table 2.1 there are more non‐smokers in the scalpel group (40 out of 99, or 40%) than in the corn plaster group (34 out of 98, or 35%). Because groups generally differ in size, it is only when a count is expressed as a proportion of its group total that it becomes useful for comparison. Hence the first step in analysing categorical data is to count the number of observations in each category and express them as proportions of the total sample size.
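      The conversion of counts to proportions can be checked with the non‐smoker figures quoted from Table 2.1. The Python sketch below is a minimal illustration; the helper name proportion is hypothetical.

```python
def proportion(count, total):
    """Return count / total after checking that the count is a valid part of total."""
    if total <= 0 or not 0 <= count <= total:
        raise ValueError("count must lie between 0 and a positive total")
    return count / total

# Non-smokers in each randomised group, as quoted in the text from Table 2.1.
groups = {"scalpel": (40, 99), "corn plaster": (34, 98)}

for name, (count, total) in groups.items():
    print(f"{name}: {count}/{total} = {proportion(count, total):.1%}")
```

To one decimal place this gives 40.4% and 34.7%, which round to the 40% and 35% quoted in the text.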

      Illustrative Example – Salicylic