David Machin

Medical Statistics



10 to <11    4    2.0    100
Total        200  100

      As we have noted, standard deviation is often abbreviated to SD in the medical literature. Sometimes for emphasis we will denote it by SD(x), where the bracketed term x is included for a reason to be introduced later.

       Means or Medians?

      Means and medians convey different impressions of the location of data, and there is no fixed rule as to which is preferable; often both give useful information. If the distribution is symmetric, then in general the mean is the better summary statistic. If it is skewed, the median is less influenced by the tails and so better reflects a ‘typical’ individual. For example, if in a country the median income is £20 000 and the mean income is £24 000, most people will relate better to the former number.
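      The effect of a long tail on the two summaries can be seen in a minimal sketch. The income values below are hypothetical and chosen only for illustration; they are not taken from any data set discussed in the text.

```python
from statistics import mean, median

# Hypothetical annual incomes in pounds (illustrative values, not real data).
# One very high earner creates a long right-hand tail.
incomes = [14_000, 16_000, 18_000, 20_000, 20_000, 22_000, 24_000, 26_000, 80_000]

mean_income = mean(incomes)
median_income = median(incomes)

print(f"mean   = {mean_income:.0f}")
print(f"median = {median_income:.0f}")

# Dropping the extreme value shifts the mean noticeably,
# while the median stays where it was.
trimmed = incomes[:-1]
trimmed_mean = mean(trimmed)
trimmed_median = median(trimmed)
print(f"mean without top earner   = {trimmed_mean:.0f}")
print(f"median without top earner = {trimmed_median:.0f}")
```

      With these made-up values the mean lies above the median, as expected for right-skewed data, and only the mean changes when the largest value is removed.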

      It is sometimes stated, incorrectly, that the mean cannot be used with binary or ordered categorical data. However, as we have noted before, if binary data are scored 0/1 then the mean is simply the proportion of 1s. If the data are ordered categorical, then again the data can be scored, say 1, 2, 3, etc., and a mean calculated. This can often give more useful information than a median for such data, but it should be used with care because of the implicit assumption that the change from score 1 to 2, say, has the same meaning (value) as the change from score 2 to 3, and so on.
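      Both points can be checked with a short sketch. The outcome labels, category names and values below are assumptions made up for illustration; the scores 1, 2, 3 carry the equal-spacing assumption described above.

```python
from statistics import mean, median

# Hypothetical binary outcome scored 0/1 (for example 1 = healed, 0 = not healed).
healed = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
proportion_healed = sum(healed) / len(healed)
mean_healed = mean(healed)
print("proportion of 1s:", proportion_healed)
print("mean of scores  :", mean_healed)

# Hypothetical ordered categories scored 1, 2, 3.
# The scoring assumes each step between categories is equally large.
scores = {"mild": 1, "moderate": 2, "severe": 3}
responses = ["mild", "mild", "moderate", "severe",
             "mild", "moderate", "severe", "severe"]
scored = [scores[r] for r in responses]

mean_score = mean(scored)
median_score = median(scored)
print("mean score  :", mean_score)
print("median score:", median_score)
```

      The first two printed values agree, because the mean of 0/1 scores is the proportion of 1s. A different scoring of the categories (say 1, 2, 5) would change the mean but not the ordering, which is why the equal-spacing assumption deserves care.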

      Dot Plots

      The simplest method of conveying as much information as possible is to show all of the data and this can be conveniently carried out using a dot plot. It is also useful for showing the distributions in two or more groups side by side.

       Example – Dot Plot – Baseline Corn Size

Dot plot depicting corn size by randomised treatment group for 200 patients with corns.

      (Source: data from Farndon et al. 2013).

      Histograms

Schematic illustration of the histogram of baseline index corn size for 200 patients with corns.

      (Source: data from Farndon et al. 2013).

      The choice of the number and width of intervals or bins is important. Too few intervals and much important information may be smoothed out; too many intervals and the underlying shape will be obscured by a mass of confusing detail. As a rule of thumb, it is usual to choose between 5 and 15 intervals, but the final choice will be based partly on a subjective impression of the resulting histogram. In the corn plaster trial the baseline corn size was recorded as a whole number of mm. In Figure 2.6 we have 10 intervals or bins of width 1 mm, which fits our rule of thumb. In this example bin 1 covers the interval 1–1.99 mm, bin 2 covers 2–2.99 mm, etc. Histograms with bins of unequal interval length can be constructed but they are usually best avoided.
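      The binning described above can be sketched as follows. The corn sizes are hypothetical values for illustration, not the trial data, and the helper name `bin_index` is our own; the bin width of 1 mm and the 10 bins follow the text.

```python
from collections import Counter
import math

# Hypothetical corn sizes in mm (illustrative values, not the trial data).
sizes_mm = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7, 8, 10]

bin_width = 1    # mm, as in the text
first_edge = 1   # lower edge of bin 1
n_bins = 10      # within the 5 to 15 rule of thumb

def bin_index(x):
    """Return the 1-based bin number for x, where bin k covers [k, k + 1) mm."""
    return math.floor((x - first_edge) / bin_width) + 1

# Count how many observations fall in each bin.
counts = Counter(bin_index(x) for x in sizes_mm)

# Print a simple text histogram, one row per bin (empty bins are shown too).
for k in range(1, n_bins + 1):
    lower = first_edge + (k - 1) * bin_width
    label = f"{lower} to <{lower + bin_width}"
    print(f"{label:>10} | {'*' * counts[k]}")
```

      Changing `bin_width` and `n_bins` and re-running the sketch is a quick way to see how too few or too many intervals alter the apparent shape.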

       Box and Whisker Plot

      A box and whisker plot displays five pieces of summary information about the data: the median, the upper quartile, the lower quartile, and the maximum and minimum values. If the number of points is large, a dot plot can be replaced by a box and whisker plot, which is also more compact than the corresponding histogram.
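      The five quantities behind a box and whisker plot can be computed with a minimal sketch. The values are hypothetical, and the "inclusive" quartile method used here is only one of several conventions; other software may give slightly different quartiles for the same data.

```python
from statistics import median, quantiles

# Hypothetical measurements (illustrative values, not the trial data).
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 8, 10]

# quantiles(..., n=4) returns three cut points:
# lower quartile, median and upper quartile.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")

five_number_summary = {
    "minimum": min(values),
    "lower quartile": q1,
    "median": q2,
    "upper quartile": q3,
    "maximum": max(values),
}

# The middle cut point should agree with the median computed directly.
median_direct = median(values)

for name, value in five_number_summary.items():
    print(f"{name:>14}: {value}")
```

      The box spans the lower to upper quartile with a line at the median, and in the simple form described here the whiskers extend to the minimum and maximum.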

       Illustrative Example – Box and Whisker Plot – Baseline Corn Size by Randomised Group

Schematic illustration of the box and whisker plot of size of corn at baseline by randomised group for 200 patients with corns.

      (Source: data from Farndon et al. 2013).

       Scatter Plots

      When one wishes to illustrate a relationship between two continuous variables, a scatter plot of one against