David Machin

Medical Statistics



step counts (and physical activity levels) may be affected by the presence of a viral infection whose presence is unrelated to the cause of the heart disease itself. Levels may also be influenced by the severity of the underlying condition and whether concomitant treatment is necessary for the patient. Levels could also be influenced by other factors, for example, alcohol, tobacco consumption and diet. The cause of some of the variation in step counts may be identified and its effect on the variability estimated. Other variation may have no obvious explanation and is usually termed random variation. This does not necessarily imply there is no cause of this component of the variation but rather that its cause has not been identified or is being ignored.

      Different patients with heart disease observed in the same way may have differing average levels of step counts (physical activity levels) from each other but with similar patterns of variation about these levels. The variation in mean step count levels from patient to patient is termed between‐subject variation.

      Observations on different subjects are usually regarded as independent. That is, the data values on one subject are not influenced by those obtained from another. This, however, may not always be the case, particularly with subjective measures such as pain or quality of life, which may be influenced by the subject's personal judgement. For example, different patients may assist each other when recording their quality of life.

      Graphs

      In any graph certain items are clearly important. For example, scales should be labelled clearly with appropriate dimensions added. The plotting symbols are also important; a graph is used to give an impression of pattern in the data, so bold and relatively large plotting symbols are desirable. This is particularly important if the graph is to be reduced for publication or presented as a slide in a talk.

      A graph should never include too much clutter, for example many overlapping groups each with a different symbol. In such a case it is usually preferable to give a series of graphs, albeit smaller, in several panels. The choice of scales for the axes will depend on the particular data set. If transformations of the axes are used, for example plotting on a log scale, it is usually better to mark the axes using the original units as this will be more readily understood by the reader. Breaks in scales should be avoided. If breaks are unavoidable, under no circumstances must points on either side of a break be joined. If both axes have the same units, then use the same scale for each. If this cannot be done easily, it is sensible to indicate the line of equality, perhaps faintly in the figure. False impressions of trend, or lack of it, in a time plot can sometimes be introduced by omitting the zero point of the vertical axis. This may falsely make a mild trend, for example a change from 101 to 105, into an apparently strong trend (seemingly as though from 1 to 5). There must always be a compromise between clarity of reproduction, that is, filling the space available with data points, and clarity of message. Appropriate measures of variability should also be included. One such measure is the range of values covered by two standard deviations on each side of a plotted mean.
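      The effect of omitting the zero point can be checked with simple arithmetic. The following is a minimal sketch using the change from 101 to 105 quoted above; the function name `fraction_of_axis` and the axis limits are assumptions made purely for illustration.

```python
def fraction_of_axis(start, end, axis_min, axis_max):
    """Fraction of the vertical axis height spanned by a change from start to end."""
    return (end - start) / (axis_max - axis_min)


# Change quoted in the text: 101 to 105.
relative_change = (105 - 101) / 101

# Assumed axis limits that include the zero point (0 to 105).
with_zero = fraction_of_axis(101, 105, 0, 105)

# Assumed axis limits that omit the zero point (100 to 105).
truncated = fraction_of_axis(101, 105, 100, 105)

print(f"relative change in the data:      {relative_change:.1%}")
print(f"axis height used, zero included:  {with_zero:.1%}")
print(f"axis height used, zero omitted:   {truncated:.1%}")
```

      Under these assumed limits the same change of about 4% occupies less than 4% of the axis height when zero is included, but 80% of it when the axis starts at 100, which is why the trend looks so much stronger.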

      It is important to distinguish between a bar chart and a histogram. Bar charts display counts in mutually exclusive categories, and so the bars should have spaces between them. Histograms show the distribution of a continuous variable and so should not have spaces between the bars. It is not acceptable to use a bar chart to display a mean with standard error bars (see Chapter 6). A mean should instead be shown as a data point surrounded by error bars, or better still by a 95% confidence interval.
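      As a rough illustration of the quantities that would be plotted around such a data point, the sketch below computes a mean, its standard error and an approximate 95% confidence interval. The data are hypothetical, and the interval uses the large-sample Normal approximation (multiplier of about 1.96), which is an assumption here; for a sample this small a t-based multiplier would give a somewhat wider interval.

```python
import math
import statistics

# Hypothetical measurements; not data from the text.
sample = [5.1, 4.8, 6.2, 5.5, 4.9, 5.8, 6.0, 5.3, 5.7, 4.6, 5.4, 5.9]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)          # sample standard deviation (n - 1 divisor)
se = sd / math.sqrt(n)                 # standard error of the mean

# Assumption: Normal approximation, so the multiplier is the 97.5th centile
# of the standard Normal distribution (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
ci_lower = mean - z * se
ci_upper = mean + z * se

print(f"mean = {mean:.2f}, SE = {se:.2f}")
print(f"approximate 95% CI: {ci_lower:.2f} to {ci_upper:.2f}")
```

      Bars of one standard error on each side of the mean cover only about half the width of this interval, which is one reason the confidence interval is the more informative display.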

      With currently available graphics software one can now perform extensive exploration of the data, not only to determine more carefully their structure, but also to find the best means of summary and presentation. This is usually worth considerable effort.

      Tables

      1 Is the number of subjects involved clearly stated?

      2 Are appropriate measures of location and variation used in the paper? For example, if the distribution of the data is skewed, has the median rather than the mean been quoted? Is it sensible to quote a standard deviation, or would a range or interquartile range be better? In general, do not use the SD for data which have skewed distributions.

      3 On graphs, are appropriate axes clearly labelled and scales indicated?

      4 Do the titles adequately describe the contents of the tables and graphs?

      5 Do the graphs indicate the relevant variability? For example, if the main object of the study is a within‐subject comparison, has the within‐subject variability been illustrated?

      6 Does the method of display convey all the relevant information in a study? Can one assess the distribution of the data from the information given?
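      The second point in the list above, on skewed data, can be illustrated with a small sketch. The sample is hypothetical (it might be lengths of stay in days) and is not taken from the text; it is chosen only because one large value makes the distribution right-skewed.

```python
import statistics

# Hypothetical right-skewed sample; values cannot be negative.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 30]

mean = statistics.mean(values)
median = statistics.median(values)
sd = statistics.stdev(values)

# Range covered by two standard deviations on each side of the mean.
lower_2sd = mean - 2 * sd
upper_2sd = mean + 2 * sd

print(f"mean = {mean:.2f}, median = {median}")
print(f"SD = {sd:.2f}, mean +/- 2 SD = {lower_2sd:.2f} to {upper_2sd:.2f}")
```

      In this sample the single large value pulls the mean well above the median, and the lower limit of the two standard deviation range is negative, an impossible value for these data. That is the sense in which the SD can be a poor summary when the distribution is skewed.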

      Calculating the Sample Median

      If the n observations in a sample are arranged in increasing or decreasing order, the median is the middle value, that is, the ½(n + 1)th ordered value. If the number of observations, n, is odd, ½(n + 1) is a whole number and there is a unique median. If n is even, ½(n + 1) falls halfway between two positions, so there is strictly no middle observation, and the median is defined by convention as the mean of the two middle observations – the ½nth and (½n + 1)th.

      For the foot corn size data the number of observations is even (n = 16), so the median is the average of the two middle observations – the ½(16)th and ([½ × 16] + 1)th, i.e. the eighth and ninth ordered values. So the median corn size is (3 + 3)/2 = 3 mm.
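      The rule for odd and even n can be written as a short function. This is a minimal sketch; the function name `sample_median` and both samples are hypothetical, and the samples are not the foot corn data.

```python
def sample_median(observations):
    """Median: the middle ordered value, or the mean of the two middle values if n is even."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    if n % 2 == 1:
        # Odd n: the unique middle value is the (n + 1)/2 th ordered value (1-based).
        return ordered[(n + 1) // 2 - 1]
    # Even n: mean of the (n/2)th and (n/2 + 1)th ordered values (1-based).
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Hypothetical samples for illustration only.
odd_sample = [7, 1, 4, 9, 3]        # n = 5: third ordered value
even_sample = [2, 5, 1, 3, 4, 6]    # n = 6: mean of third and fourth ordered values

print(sample_median(odd_sample))
print(sample_median(even_sample))
```

      The subtraction of 1 in the indexing converts the 1-based ordered positions used in the text to Python's 0-based list positions.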

      Calculating the Quartiles and Interquartile Range

      Arrange the n observations in increasing order. Split the data set into four equal parts, or quartiles, using three cut‐points:

      1 Lower quartile (25th centile) or the ¼(n + 1)th ordered value;

      2 Median (50th centile) or the ½(n + 1)th ordered value;

      3 Upper quartile (75th centile) or the ¾(n + 1)th ordered value.

      The interquartile range (IQR) is the upper quartile minus the lower quartile.
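      The three cut‐points and the IQR can be sketched as follows. The text above does not say what to do when ¼(n + 1) or ¾(n + 1) is not a whole number; the sketch assumes linear interpolation between the two neighbouring ordered values, which is one common convention. The function names and the sample are hypothetical.

```python
def ordered_value(ordered, position):
    """Value at a 1-based, possibly fractional, position in data sorted in increasing order.

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring ordered values.
    """
    n = len(ordered)
    position = min(max(position, 1), n)   # keep the position within 1..n
    lower = int(position)                 # whole part of the position
    fraction = position - lower
    if fraction == 0 or lower == n:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def quartiles(observations):
    """Lower quartile, median and upper quartile at the 1/4, 1/2 and 3/4 (n + 1)th positions."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles of an empty sample are undefined")
    return tuple(ordered_value(ordered, p * (n + 1)) for p in (0.25, 0.5, 0.75))


# Hypothetical sample with n = 11, so the positions are the 3rd, 6th and 9th ordered values.
sample = [12, 3, 7, 9, 5, 15, 1, 11, 8, 4, 10]
lower_q, median, upper_q = quartiles(sample)
iqr = upper_q - lower_q

print(lower_q, median, upper_q, iqr)
```

      With n = 16, as for the foot corn size data, the positions would be the 4.25th, 8.5th and 12.75th ordered values, so some rule of this kind is needed for the lower and upper quartiles.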