A. Gouveia Oliveira

Biostatistics Decoded



measured in ordinal scales are often found in clinical research. Figure 1.2 shows three examples of ordinal scales: the item list, where the subjects select the item that most closely corresponds to their opinion; the Likert scale, where the subjects read a statement and indicate their degree of agreement; and the visual analog scale, where the subjects mark on a 100 mm line the point that they feel corresponds to their assessment of their current state. Psychometric, behavioral, quality of life, and, in general, many questionnaires commonly used in clinical research have an ordinal scale of measurement.

      If an interval scale has a meaningful zero, it is called a ratio scale. Examples of ratio scales are height and weight. An example of an interval scale that is not a ratio scale is the Celsius scale, where zero does not represent the absence of temperature, but rather the value that was by convention given to the temperature of thawing ice. In ratio scales, not only are sums and subtractions possible, but multiplications and divisions as well. The latter two operations are meaningless in non‐ratio scales. For example, we can say that a weight of 21 g is half of 42 g, and a height of 81 cm is three times 27 cm, but we cannot say that a temperature of 40°C is twice as warm as 20°C. With very rare exceptions, all interval‐scaled attributes that are found in research are measured in ratio scales.
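
      The difference between interval and ratio scales can be checked with a little arithmetic. The sketch below is only an illustration: the kelvin conversion is not part of the text above and is brought in as an example of a temperature scale with a true zero, and the function and variable names are made up for this example.

```python
# Illustrative sketch: ratios are only meaningful on a scale with a true zero.
# The kelvin scale is an assumption added for illustration; it is not
# discussed in the surrounding text.

def celsius_to_kelvin(celsius):
    """Convert Celsius to kelvin, a temperature scale whose zero is absolute zero."""
    return celsius + 273.15

# Weight is a ratio scale: the ratio is the same whatever unit we use.
ratio_grams = 42 / 21
ratio_kilograms = 0.042 / 0.021

# Celsius is an interval scale: the ratio of the raw numbers is an
# artifact of where the zero was placed by convention.
naive_ratio = 40 / 20
kelvin_ratio = celsius_to_kelvin(40) / celsius_to_kelvin(20)

# Differences, in contrast, are meaningful on both temperature scales.
diff_celsius = 40 - 20
diff_kelvin = celsius_to_kelvin(40) - celsius_to_kelvin(20)

print(f"42 g / 21 g         = {ratio_grams:.3f}")
print(f"0.042 kg / 0.021 kg = {ratio_kilograms:.3f}")
print(f"40 / 20 in Celsius  = {naive_ratio:.3f}")
print(f"same ratio in kelvin = {kelvin_ratio:.3f}")
print(f"difference: {diff_celsius} degrees C, {diff_kelvin:.2f} K")
```

      The weight ratio stays at 2 when the unit changes, whereas the apparent "twice as warm" ratio of the Celsius values shrinks to a value only slightly above 1 once the temperatures are expressed on a scale with a meaningful zero.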

      We said above that one important purpose of biostatistics is to determine the characteristics of a population in order to be able to make predictions about any subject belonging to that population. In other words, what we want to know about the population is the expected value of the various attributes present in the elements of the population, because this is the value we will use to predict the value of each of those attributes for any member of that population. Alternatively, depending on the primary aim of the research, we may consider that biological attributes have a certain, unknown value that we attempt to measure in order to discover what it is. However, an attribute may, and usually does, exhibit variability because of the influence of a number of factors, including measurement error. Therefore, the mean value of the attribute may be seen as the true value of that attribute, and its variability as a sign of the presence of factors of variation influencing it.

      There are several possibilities for expressing the expected value of an attribute, which are collectively called central tendency measures, and the ones most used are the mean, the median, and the mode.

      The mean is a very common measure of central tendency. We use the notion of mean extensively in everyday life, so it is not surprising that the mean plays an extremely important role in statistics. Furthermore, being computed as the sum of the values divided by their number, the mean is a mathematical quantity and therefore amenable to mathematical processing. This is the other reason why it is such a popular measure in statistics.

      The median is the quantity that divides the sample into two groups with an equal number of observations: one group has all the values smaller than that quantity, and the other group has all the values greater than that quantity. The median, therefore, has a straightforward interpretation: half the observations are smaller than that quantity. In fact, when the values are symmetrically distributed about the mean, the median has exactly the same interpretation as the mean, and the two have the same value. With asymmetric distributions, however, the median and the mean will generally differ: when a few large values stretch the distribution to the right, the median is typically smaller than the mean, and when a few small values stretch it to the left, the median is typically larger.
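
      A small numerical check may help to see how the mean and the median relate. The sketch below uses Python's standard `statistics` module; the three data sets are invented purely for illustration and do not come from any study.

```python
# Illustrative sketch with made-up data: mean versus median for a
# symmetric sample and for two asymmetric samples.
from statistics import mean, median

symmetric = [2, 4, 6, 8, 10]       # values spread evenly around 6
right_skewed = [2, 4, 6, 8, 100]   # one very large value
left_skewed = [-90, 4, 6, 8, 10]   # one very small value

for name, sample in [("symmetric", symmetric),
                     ("right-skewed", right_skewed),
                     ("left-skewed", left_skewed)]:
    print(f"{name:>12}: mean = {mean(sample):6.1f}, median = {median(sample):4.1f}")
```

      In all three samples the median is 6, because it depends only on the value in the middle of the sorted order. The mean equals 6 only in the symmetric sample; a single extreme value pulls it well above the median in the second sample and well below it in the third.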

      One problem with the median is that it is not the result of a simple arithmetic formula. To obtain the median, first we must count the number of observations, as we do to compute the mean. Then we must sort all the values in ascending order. If the number of observations is odd, we divide the number of observations by 2 and round the result up to the next integer. Then we take this result, go to the observation that occupies that position in the sorted order, and obtain the value of that observation. That value is the median. If the number of observations is even, we must instead take the value of the observation whose rank in the sorted order equals the number of observations divided by 2, add that value to the value of the next observation in the sorted order, and divide the result by 2 to finally obtain the median value.
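
      The counting, sorting, and ranking steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `median_by_rank` is hypothetical, and for an odd number of observations the half is rounded up so that the rank points at the middle observation.

```python
# Minimal sketch of the rank-based procedure for the median.
# The function name is hypothetical and chosen for this example only.
import math


def median_by_rank(values):
    """Return the median by sorting the values and picking the middle rank(s)."""
    if not values:
        raise ValueError("the median of an empty sample is undefined")

    ordered = sorted(values)   # sort in ascending order
    n = len(ordered)           # count the observations

    if n % 2 == 1:
        # Odd count: n / 2 ends in .5, so round up to get the middle rank.
        rank = math.ceil(n / 2)
        return ordered[rank - 1]   # ranks start at 1, list indices at 0

    # Even count: average the observation at rank n / 2 and the next one.
    rank = n // 2
    return (ordered[rank - 1] + ordered[rank]) / 2


print(median_by_rank([7, 1, 5]))      # three observations: middle one after sorting
print(median_by_rank([7, 1, 5, 3]))   # four observations: average of the two middle ones
```

      For the three values 7, 1, 5 the sorted order is 1, 5, 7 and the observation at rank 2 is returned. For 7, 1, 5, 3 the sorted order is 1, 3, 5, 7 and the function averages the observations at ranks 2 and 3.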

      The median, therefore, requires an algorithm for its computation. This makes it much less amenable to mathematical treatment than the mean and, consequently, less useful. In many situations, however, the median is a much better measure of central tendency than the mean. For example, attributes that are measured on ordinal scales – recall that with ordinal scales sums and differences are meaningless – should almost always be summarized by the median, not the mean. One possible exception to this rule is when an ordinal scale has so many distinct values, say,