David Machin

Medical Statistics



       Illustrative Example – Scatter Plot – Baseline Corn Size by Corn Size at a Three‐Month Follow‐up

Schematic illustration of the scatter plot of baseline corn size by corn size at a three‐month follow‐up for 181 patients with corns.

      (Source: data from Farndon et al. 2013).

      It is likely that baseline corn size will have an influence on corn size at three months, but the reverse cannot be the case. When one variable, x (baseline corn size), could cause the other, y (three‐month corn size), it is usual to plot the x variable on the horizontal axis and the y variable on the vertical axis.

      In contrast, if we were interested in the relationship between baseline corn size and height of the patient then either variable could cause or influence the other. In this example it would be immaterial which variable (corn size or height) is plotted on which axis.

      Measures of Symmetry

Graphs depict examples of two skewed distributions.

      For the corn size data, the mean from the 200 patients is 3.8 mm and the median is 4 mm, so we conclude the data are reasonably symmetric. One is more likely to see skewness when the variables are constrained at one end or the other. For example, waiting time or time in hospital cannot be negative, but it can be very large for some patients while being relatively short for the majority, and so is likely to be right (positively) skewed.

      A common skewed distribution is annual income, where a few high earners pull up the mean, but not the median. In the UK about 68% of the population earn less than the average wage, that is, the mean value of annual pay is equivalent to the 68th percentile of the income distribution. Thus, many people who earn more than the median (that is, more than 50% of the population) will still feel underpaid!
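The effect of a few large values on the mean, but not the median, can be checked numerically. The following is a minimal sketch in Python; the income figures are invented purely for illustration and are not taken from any real survey.

```python
import statistics

# Hypothetical annual incomes (in thousands); invented for illustration only.
incomes = [18, 20, 22, 24, 25, 27, 30, 35, 60, 150]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Proportion of people in this small sample who earn less than the mean.
share_below_mean = sum(x < mean_income for x in incomes) / len(incomes)

print(f"mean = {mean_income}, median = {median_income}")
print(f"share earning less than the mean = {share_below_mean:.0%}")
```

In this made-up sample the two highest earners pull the mean well above the median, so most of the sample earns less than the mean, which is the pattern expected in a right‐skewed distribution.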

      In Figure 2.1, measurements were made only once for each subject. Thus the variability, expressed, say, by the standard deviation, is the between‐subject variability. If, however, measurements are made repeatedly on one subject, we are assessing within‐subject variability.

      Illustrative Example – Within‐Subject Variability – Total Steps per Day

Graph depicts the plot of total steps per day for 100 days for one participant in a global corporate challenge designed to increase physical activity.

      If another subject had also completed this experiment, we could calculate their within‐subject variation as well, and perhaps compare the variabilities for the two subjects using these summary measures. For example, a second subject had a mean step count of 12 745 with standard deviation of 4861 steps, and so has a smaller mean but similar variability.
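As a rough sketch of how such a comparison might be made, the Python snippet below summarises each subject's repeated measurements by a mean and a sample standard deviation. The daily step counts and the helper name `within_subject_summary` are hypothetical, chosen only to illustrate the calculation.

```python
import statistics

# Hypothetical daily step counts for two subjects (one week each, invented).
steps = {
    "subject_A": [15200, 9800, 21000, 12400, 17600, 8900, 14300],
    "subject_B": [12100, 7400, 18900, 10300, 15800, 6900, 13200],
}

def within_subject_summary(values):
    """Return (mean, sample standard deviation) of one subject's repeated measurements."""
    return statistics.mean(values), statistics.stdev(values)

for subject, values in steps.items():
    mean, sd = within_subject_summary(values)
    print(f"{subject}: mean = {mean:.0f} steps, SD = {sd:.0f} steps")
```

Each standard deviation here describes within‐subject variability, because it is computed from repeated measurements on a single person. A between‐subject standard deviation would instead be computed from one measurement on each of many people.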

      Successive within‐subject values are unlikely to be independent, that is, consecutive values will be dependent on values preceding them. For example, if a sedentary or inactive person records a low step count on one day, it is likely to be low on the next day as well. This does not imply that the step count will be low, only that it is a good bet that it will be. In contrast, examples can be found in which high step counts are usually followed by lower values and vice versa. With independent observations, the step count on one day gives no indication or clue as to the step count on the next.
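One common way to quantify this kind of dependence, though not necessarily the approach taken in this book, is the lag‐1 autocorrelation: the correlation between each value and the one that follows it. The sketch below assumes two short invented series of step counts; the function name `lag1_autocorrelation` is a hypothetical helper.

```python
import statistics

def lag1_autocorrelation(series):
    """Lag-1 sample autocorrelation: how strongly each value relates to the next."""
    mean = statistics.fmean(series)
    deviations = [x - mean for x in series]
    # Sum of products of each deviation with the deviation that follows it.
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator

# Invented series: low days tend to follow low days, high days follow high days.
persistent = [3000, 3200, 3100, 3500, 6000, 6200, 5900, 6100]
# Invented series: a high count is usually followed by a low one and vice versa.
alternating = [3000, 6000, 3100, 6200, 3200, 5900, 3500, 6100]

print(f"persistent:  r1 = {lag1_autocorrelation(persistent):.2f}")
print(f"alternating: r1 = {lag1_autocorrelation(alternating):.2f}")
```

A positive value corresponds to the first situation described above (a low day is a good bet to be followed by a low day), a negative value to the second, and for independent observations the value would be expected to lie close to zero.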

      Suppose successive observations on a patient with heart disease taken over time fluctuate around some more or less constant daily step count, then the particular level may be