David Machin

Medical Statistics



on displaying data is given in Chapter 2.

      Choice of Summary Statistics and Statistical Analysis

      The summary statistics used and the analysis undertaken must reflect the basic design of the study and the nature of the data. In some situations, for example, a median is a better measure of location than a mean. (These terms are defined in Chapter 2.) In a matched study, it is important to produce an estimate of the difference between matched pairs, and an estimate of the reliability of that difference. For example, in a study to examine blood pressure measured in a seated patient compared with that measured when he or she is lying down, it is insufficient simply to report statistics for seated and lying positions separately. The important statistic is the change in blood pressure as the patient changes position and it is the mean and variability of this difference that we are interested in. This is further discussed in Chapter 7. A statistician can advise on the choice of summary statistics, the type of analysis and the presentation of the results.
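      The matched-pairs point can be illustrated with a short sketch. The blood pressure values below are hypothetical and chosen only for illustration, and the standard error is used here as one common measure of the reliability of the mean difference, which is an assumption rather than something the text prescribes.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical systolic blood pressures (mmHg) for five patients.
# These values are invented for illustration, not taken from any study.
seated = [128, 135, 142, 118, 150]
lying = [124, 130, 141, 112, 143]

# Work with the within-patient change, not the two positions separately.
differences = [s - l for s, l in zip(seated, lying)]

mean_diff = mean(differences)                      # average change in pressure
sd_diff = stdev(differences)                       # sample SD of the changes
se_diff = sd_diff / sqrt(len(differences))         # standard error of the mean change

print(f"differences: {differences}")
print(f"mean change: {mean_diff:.1f} mmHg")
print(f"SD of change: {sd_diff:.2f} mmHg, SE: {se_diff:.2f} mmHg")
```

      Reporting the mean and standard deviation of the seated and lying readings separately would discard the pairing; the list of differences keeps each patient as his or her own control.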

      Medical Statistics and Data Science

      Because of the availability of large amounts of data over the last few decades, the term data science has emerged to describe the substantial current intellectual effort around research with the goal of extracting information from these data. The type of data currently available in all sorts of application domains is often massive in size, very heterogeneous and far from being collected under designed or controlled experimental conditions. Nonetheless, it contains information, often substantial information, and it has been argued that data science is a new interdisciplinary approach that makes maximal use of this information. However, data alone is typically not that informative and (machine) learning from data needs conceptual frameworks. Data science would seem to encompass statistics. However, we would argue that statistics is crucial for providing conceptual frameworks that enhance the understanding of fundamental phenomena, highlight limitations and provide a formalism for properly founded data analysis, information extraction and quantification of uncertainty, as well as for the analysis and development of algorithms that carry out these key tasks.

      As taught at a number of universities, data science differs from statistics in several ways. Statistics originated before the computer and its core concern is with statistical models. No serious statistician is beguiled into confusing their model with reality (‘All models are wrong, but some are useful’, to quote the famous statistician George Box). Nevertheless, models are very useful in describing how the world might be, and for making generalisations beyond the data. Data science is empirical and reliant on large data sets, whereas one of the key successes of statistics is doing inference on relatively small data sets, such as those available in agriculture and laboratories. Data science is often used for prediction, and the idea is that with the vast amounts of data now available electronically (such as those provided by national health services) one can look at empirical relationships and build up accurate predictors, such as how drugs will behave in individuals. These predictions are often highly successful, but without an underlying model it can be difficult to know why a particular prediction is made, and how generalisable the predictions might be. Data science is related to the concept of ‘big data’. However, simply because a sample is large does not mean it is unbiased.

      2.1 Types of Data

      2.2 Summarising Categorical Data

      2.3 Displaying Categorical Data

      2.4 Summarising Continuous Data

      2.5 Displaying Continuous Data

      2.6 Within-Subject Variability

      2.7 Presentation

      2.8 Points When Reading the Literature

      2.9 Technical Details

      2.10 Exercises

      This chapter describes the different types of data that the reader is likely to encounter. It illustrates methods of summarising and displaying categorical data (bar charts, pie charts). It describes the different ways of summarising continuous data by measures of location or central tendency (mean, median, mode) and measures of spread or variability (range, variance, standard deviation, inter‐quartile range). It also illustrates how to display continuous data (dot‐plots, histograms, box‐and‐whisker plots).
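      As a rough preview of the summary measures listed above, the following sketch computes each of them with Python's standard `statistics` module. The data values are hypothetical, and the quartiles use the module's default ('exclusive') method; other quartile conventions exist and may give slightly different inter-quartile ranges, so this should not be read as the convention adopted in the chapter.

```python
from statistics import mean, median, mode, quantiles, stdev, variance

# Hypothetical continuous measurements, invented for illustration.
values = [2, 3, 3, 4, 5, 6, 9]

# Measures of location (central tendency).
location = {
    "mean": mean(values),
    "median": median(values),
    "mode": mode(values),
}

# Measures of spread (variability).
data_range = max(values) - min(values)
sample_variance = variance(values)        # divides by n - 1
sample_sd = stdev(values)                 # square root of the sample variance
q1, q2, q3 = quantiles(values, n=4)       # quartiles, default 'exclusive' method
iqr = q3 - q1                             # inter-quartile range

print(location)
print(f"range={data_range}, variance={sample_variance:.2f}, "
      f"sd={sample_sd:.2f}, IQR={iqr}")
```

      The single large value (9) pulls the mean above the median, which is one reason a median can be the better measure of location for skewed data.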

      Example from the Literature – Salicylic Acid Plasters for Treatment of Foot Corns