A. Gouveia Oliveira

Biostatistics Decoded



what the whole piece would be like. If the sample were large, we probably would have no difficulty answering that question. But if the sample were small, something could still be said about the piece. For example, if the sample contained only red circles over a yellow background, one could say that the sample probably did not come from a Persian carpet. In other words, by inspecting the sample one could say that it was consistent with a number of pieces of cloth but not with other pieces (Figure 1.7).

      Therefore, the purpose of sampling is to provide a means of evaluating the plausibility of several hypotheses about the structure of the population, through a limited number of observations and assuming that the structure of the population must be consistent with the structure of the sample. One immediate implication of this approach is that there are no sample size requirements in order to achieve representativeness.

      Let us verify the truth of this statement and see if this approach to sampling is still valid in the extreme situation of a sample size of one. We know that with the first approach we would discard such a sample as non‐representative. Will we reach the same conclusion with the current approach?

An illustration of the modern view of the purpose of sampling. The purpose of sampling is to evaluate the plausibility of a hypothesis about the structure of the population, given the structure of a limited number of observations.

      

      One might say that this whole thing is nonsense, because such a conclusion is completely worthless. Of course it is, but that is because we did not bother spending a lot of effort in doing the study. If we wanted a more interesting conclusion, we would have to work harder and collect some more information about the population. That is, we would have to make some more observations to increase the sample size.

      Before going into this, think for a moment about the previous study. There are three important things to note. First, this approach to sampling still works in the extreme situation of a sample size of one, which is not true of the classical approach. Second, the conclusion was correct (remember, it was said that one was very confident that the proportion of black balls in the population was a number between 5% and 100%). The problem with the conclusion, or rather with the study, was that it lacked precision. Third, the inference procedure described here is valid only for random samples of the population; otherwise the conclusions may be completely wrong. Suppose that the proportion of black balls in the population is very small but that, because their color attracts our attention, we look at the balls before drawing our sample and are therefore much more likely to select a flashy black ball than a boring white one. We would then follow the same reasoning as before and reach the same conclusion, but we would be completely wrong because the sample was biased toward the black balls.

An illustration of inference with binary attributes.
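The two numerical points above, the 5% to 100% range and the effect of a biased draw, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's procedure: it assumes a proportion is ruled out when it would make the observed result less than 5% likely, and it models the pull of the black balls with a hypothetical `attraction` factor. The function names and the numbers 0.01 and 200 are invented for the example.

```python
import random


def plausible_range_all_black(n_draws: int, alpha: float = 0.05) -> tuple[float, float]:
    """Proportions of black balls not ruled out after n_draws random draws, all black.

    Assumption: a proportion p is ruled out when the chance of the observed
    result, p ** n_draws, falls below alpha. The smallest surviving p therefore
    solves p ** n_draws == alpha.
    """
    if n_draws < 1:
        raise ValueError("n_draws must be at least 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return alpha ** (1.0 / n_draws), 1.0


def share_black_in_single_draws(true_proportion: float, attraction: float,
                                n_trials: int, seed: int = 0) -> float:
    """Fraction of one-ball samples that come up black.

    attraction is a hypothetical factor: how many times more likely a black
    ball is to be picked than a white one. attraction == 1 is random sampling.
    """
    if not 0.0 < true_proportion < 1.0:
        raise ValueError("true_proportion must be strictly between 0 and 1")
    if attraction <= 0.0:
        raise ValueError("attraction must be positive")
    rng = random.Random(seed)
    weights = [true_proportion * attraction, 1.0 - true_proportion]
    draws = rng.choices(["black", "white"], weights=weights, k=n_trials)
    return draws.count("black") / n_trials


# One random draw that came up black: the range quoted in the text.
lower, upper = plausible_range_all_black(1)
print(f"{lower:.0%} to {upper:.0%}")  # 5% to 100%

# A population with 1% black balls, sampled at random and with a strong bias.
print(share_black_in_single_draws(0.01, attraction=1.0, n_trials=10_000))
print(share_black_in_single_draws(0.01, attraction=200.0, n_trials=10_000))
```

With random sampling a black ball turns up only about as often as its true share of the population, so observing one is informative. With the biased draw a black ball turns up most of the time even though black balls are rare, which is why the same reasoning then gives a wrong conclusion.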

      Let us return to the situation of a sample size of one and suppose that we want to estimate another characteristic of the balls in the population, for example, the average weight. This characteristic, or attribute, has an important difference from the color attribute, because weight can take many different values, not just two.

      Let us see if we can apply the same reasoning in the case of attributes taking many different values. To do so, we take a ball at random and measure its weight. Let us say that we get a weight of 60 g. What can we conclude about the average weight in the population? Now the answer is not so simple. If we knew that the balls were all about the same weight, we could say that the average weight in the population should be a value between, say, 50 and 70 g. If it were below or above those limits, it would be unlikely that a ball sampled at random would weigh 60 g.

An illustration of inference with interval attributes I.
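The reasoning about the 60 g ball can be made concrete with a small sketch. It rests on assumptions that are not in the text: that "about the same weight" means individual weights scatter around the population average roughly like a normal distribution, and that the spread is known, here a hypothetical standard deviation of 5 g. The function name `plausible_mean_range` is likewise invented for the example.

```python
from statistics import NormalDist


def plausible_mean_range(observed: float, sd: float,
                         confidence: float = 0.95) -> tuple[float, float]:
    """Population averages under which a single observed value is not unusual.

    Assumptions (not from the text): weights are roughly normal around the
    population average, with a known standard deviation sd.
    """
    if sd <= 0.0:
        raise ValueError("sd must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return observed - z * sd, observed + z * sd


# One ball weighing 60 g, hypothetical spread of 5 g.
low, high = plausible_mean_range(60.0, sd=5.0)
print(f"{low:.1f} g to {high:.1f} g")  # 50.2 g to 69.8 g
```

Under these assumptions the range comes out close to the 50 to 70 g mentioned above. A larger assumed spread widens the range, which is why the argument depends on knowing that the balls are all about the same weight.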