the same data displayed as a pie chart. One often sees pie charts in the literature. However, they are generally to be avoided as they can be difficult to interpret, particularly when the number of categories is greater than five. In addition, unless the percentages in the individual categories are displayed (as here), it can be much more difficult to estimate them from a pie chart than from a bar chart. For both chart types it is important to include the number of observations on which the chart is based, particularly when comparing more than one chart. Neither of these charts should be displayed in three dimensions (see Figure 2.3b for a three‐dimensional pie chart). Three‐dimensional charts feature in many spreadsheet packages, but are not recommended since they distort the information presented. They make it very difficult to extract the correct information from the figure; for example, in Figure 2.3b the sectors that appear nearer the reader are overemphasised.
Figure 2.3 Pie chart showing where 202 patients with foot corns were treated
(Source: Farndon et al. 2013).
If the sample is further classified by whether the patient was treated with corn plasters or scalpel, then it becomes impossible to present the data as a single pie or bar chart. We could present the data as two separate pie charts or bar charts side by side, but it is preferable to present the data in one graph with the same scales and axes to make visual comparisons easier.
In this case we could present the data as a clustered bar chart, as shown in Figure 2.4. This clearly shows that the distribution of patients across treatment centres is broadly similar in the two randomised treatment groups. It is preferable to use the relative frequency scale on the vertical axis rather than the actual counts, particularly when the two groups are of different sizes, although in this example, where the groups are of similar size, this will not make much difference.
Figure 2.4 Clustered bar chart showing where 202 patients with foot corns were treated by randomised group
(Source: Farndon et al. 2013).
If you do use the relative frequency scale, as we have, then it is good practice to report the actual total sample size for each group in the legend. In this way, given the total sample size and the relative frequency (from the height of the bars), we can work out the actual numbers treated in each centre.
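The conversion between counts and relative frequencies described above can be sketched in a few lines of Python. This is a minimal illustration only: the centre names and counts are hypothetical, chosen for easy arithmetic, and are not the trial data; the function names are likewise assumptions made for this sketch.

```python
def relative_frequencies(counts):
    """Convert category counts to percentages of the group total."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


def counts_from_relative(percentages, total):
    """Recover the counts from percentages and the group's total sample size."""
    return {category: round(p * total / 100) for category, p in percentages.items()}


# Hypothetical counts per treatment centre (NOT the Farndon et al. data).
plaster = {"Centre A": 40, "Centre B": 35, "Centre C": 25}  # n = 100
scalpel = {"Centre A": 45, "Centre B": 30, "Centre C": 15}  # n = 90

plaster_pct = relative_frequencies(plaster)
scalpel_pct = relative_frequencies(scalpel)

# With the group total reported in the legend, the counts can be recovered.
recovered = counts_from_relative(scalpel_pct, total=sum(scalpel.values()))
print(plaster_pct)
print(scalpel_pct)
print(recovered)
```

Because the two hypothetical groups differ in size, the percentages rather than the raw counts are the fair basis for comparing the bars, which is the reason for preferring the relative frequency scale.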
2.4 Summarising Continuous Data
A quantitative measurement contains more information than a categorical one, and so summarising these data is more complex. One chooses summary statistics to condense a large amount of information into a few intelligible numbers, the sort that could be communicated verbally. The two most important pieces of information about a quantitative measurement are ‘what is the average value?’ and ‘what is the spread of the data?’ These are categorised as measures of location (sometimes ‘central tendency’) and measures of spread or variability. A measure of location (average) and variability (spread) provides an informative but brief summary of a set of observations.
Measures of Location – The Three ‘Ms’ – Mean, Median and Mode
Mean or Average
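The arithmetic mean or average of n observations, $\bar{x}$ (pronounced ‘x bar’), is simply the sum of the observations divided by their number:

$$\bar{x} = \frac{\text{sum of all sample values}}{\text{size of sample}} = \frac{\sum_{i=1}^{n} x_i}{n}$$

In the above equation, $x_i$ represents the individual sample values and $\sum_{i=1}^{n} x_i$ represents the sum of all the sample values.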
Example – Calculation of the Mean – Corn Size Data (mm)
In the randomised controlled trial that investigated the effectiveness of salicylic acid plasters compared with usual scalpel debridement for treatment of foot corns (Farndon et al. 2013), the baseline size of the index corn (at its widest diameter in mm) was measured by an independent podiatrist (foot specialist) who was not involved in the subsequent treatment of the patients. Consider the following 16 baseline corn sizes in mm, listed in ascending order, selected randomly from the 200 patients, with valid baseline corn size data, in the trial.
Thus, the mean corn size is 3.6 mm.
The major advantage of the mean is that it uses all the data values and is therefore, in a statistical sense, efficient. The mean also characterises some important statistical distributions to be discussed in Chapter 4. The main disadvantage of the mean is that it is vulnerable to what are known as outliers. Outliers are single observations that, if excluded from the calculations, have a noticeable influence on the results. For example, if we had entered ‘100 mm’ instead of ‘10 mm’ for the 16th patient in the calculation of the mean, we would find that the mean changed from 3.6 to 9.3 mm. It does not necessarily follow, however, that outliers should be excluded from the final data summary, or that they result from an erroneous measurement.
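The size of this shift can be checked without the individual values. Replacing 10 by 100 adds 90 to the sum of the 16 observations, so the mean rises by 90/16. The notation $\bar{x}_{\text{new}}$ is introduced here only for this check:

```latex
\bar{x}_{\text{new}} = \bar{x} + \frac{100 - 10}{16} = \bar{x} + 5.625
```

Starting from a mean of about 3.6 mm, this gives a little over 9.2 mm, which agrees with the quoted 9.3 mm once rounding of the original mean is allowed for. A single mistyped value has therefore moved the mean by more than 5 mm.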
If the data are binary, that is nominal and coded 0 or 1, then the mean is the proportion of observations coded 1.
Median
The median is estimated by first ordering the data from smallest to largest, and then counting upwards for half the observations. The estimate of the median is either the observation at the centre of the ordering in the case of an odd number of observations, or the simple average of the middle two observations if the total number of observations is even.
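The procedure just described can be written as a short Python function. This is a sketch for illustration: the function name and the two small samples are hypothetical and are not taken from the corn size data.

```python
def median(values):
    """Median by ordering the data and taking the middle observation(s)."""
    ordered = sorted(values)  # order from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single central value.
        return ordered[mid]
    # Even number of observations: simple average of the middle two.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical samples, chosen only to show the odd and even cases.
odd_sample = [7, 1, 4, 3, 9]       # ordered: 1, 3, 4, 7, 9
even_sample = [7, 1, 4, 3, 9, 5]   # ordered: 1, 3, 4, 5, 7, 9

print(median(odd_sample))   # 4
print(median(even_sample))  # 4.5
```

Python's standard library provides `statistics.median`, which follows the same rule; the explicit version is shown here only to mirror the steps in the text.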
Example – Calculation of the Median – Corn Size Data
Consider the following 16 corn sizes in millimetres selected randomly from the Farndon (2013) study. We order the 16 observations from smallest to largest (See