Handbook of Web Surveys. Jelke Bethlehem.

scientific investigations based on samples, but they were lacking proper scientific foundations. The first known attempt to draw conclusions about a population using only information about part of it was made by the English merchant John Graunt (1662). He estimated the size of the population of London. Graunt surveyed families in a sample of parishes where the registers were well kept. He found that on average there were three burials per year in 11 families. Assuming this ratio to be more or less constant for all parishes and knowing the total number of burials per year in London to be about 13,000, he concluded that the total number of families was approximately 48,000. Putting the average family size at 8, he estimated the population of London to be 384,000. Since this approach lacked a proper scientific foundation, John Graunt could not say how accurate his estimates were.
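Graunt's arithmetic can be retraced in a few lines. The following is a minimal sketch that uses only the figures quoted above; the variable names are illustrative, and the rounding to the nearest thousand is an assumption made to reproduce the reported 48,000 families.

```python
# Graunt's ratio estimate of the population of London (figures from the text).
burials_in_sample = 3      # burials per year observed ...
families_in_sample = 11    # ... per this many families in the sampled parishes
total_burials = 13_000     # known yearly number of burials in all of London
family_size = 8            # Graunt's assumed average family size

# Families per burial, assumed to be constant across all parishes.
families_per_burial = families_in_sample / burials_in_sample

estimated_families = total_burials * families_per_burial

# Assumption: Graunt's figure of 48,000 is this estimate rounded to thousands.
rounded_families = round(estimated_families, -3)

estimated_population = rounded_families * family_size

print(f"families: {estimated_families:,.0f} (rounded: {rounded_families:,.0f})")
print(f"population: {estimated_population:,.0f}")
```

The calculation produces a single number and nothing else: it contains no term that expresses how far the estimate might be from the true population size, which is the gap noted in the text.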

More than a century later, the French mathematician Pierre‐Simon Laplace realized that it was important to have some indication of the accuracy of his estimate of the French population. Laplace (1812) used an approach similar to that of John Graunt. He selected 30 departments spread across France in such a way that all types of climate were represented. Moreover, he selected departments in which accurate population records were kept. Using the central limit theorem, Laplace proved that his estimator had a normal distribution. Unfortunately, he disregarded the fact that the departments were selected purposively and not at random. This made application of the central limit theorem at least doubtful.

In 1895 Anders Kiaer (1895, 1997), the founder and first director of Statistics Norway, proposed his representative method. It was a partial inquiry in which a large number of persons were questioned. Persons were selected in such a way that a “miniature” of the population was obtained. Anders Kiaer stressed the importance of representativity. He argued that if a sample was representative with respect to variables for which the population distribution was known, it would also be representative with respect to the other survey variables. Example 1.1 describes Kiaer's experiment with the representative method.

EXAMPLE 1.1 The representative method of Anders Kiaer

Anders Kiaer applied his representative method in Norway. His idea was to survey the population of Norway by selecting a sample of 120,000 people. Enumerators (hired only for this purpose) visited these people and filled in 120,000 forms. About 80,000 of the forms were collected by the representative method and 40,000 forms by a special (but analogous) method in areas where working‐class people lived.

For the first sample of 80,000 respondents, data from the 1891 census were used to divide the households in Norway into two strata. Approximately 20,000 people were selected from urban areas and the rest from rural areas.

Thirteen representative cities were selected from the 61 cities in Norway. All five cities with more than 20,000 inhabitants were included, along with eight cities representing the medium‐sized and small towns. The proportion of people selected varied between cities: it was greater in the medium‐sized and small cities than in the big cities. Kiaer justified this choice by arguing that the medium‐sized and small cities represented not only themselves but also a larger number of similar cities.

In Kristiania (nowadays Oslo) the proportion was 1/16, in the medium‐sized towns the proportion varied between 1/12 and 1/9, and in the small towns it was 1/4 or 1/3 of the population.

Based on the census, it was known how many people lived in each of the 400 streets of Kristiania, the capital of Norway. The streets were sorted into four categories according to their number of inhabitants, and a selection scheme was specified for each category. In the first category, the smallest streets, the adult population was enumerated in 1 out of 20 streets. In the second category, the adult population was enumerated in half of the houses in 1 out of 10 streets. In the third category, one‐fourth of the streets were selected and every fifth house in them was enumerated. In the last category, the biggest streets, the adult population was enumerated in half of the streets and in 1 out of 10 houses in them.
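The two‐stage scheme for Kristiania can be checked by multiplying the fraction of streets selected by the fraction of houses enumerated within them. The sketch below is an illustration and does not come from Kiaer's papers: it reads the description as "streets first, then houses", and it assumes that in the first category every house in a selected street was enumerated, which the text does not state explicitly.

```python
from fractions import Fraction

# (fraction of streets selected, fraction of houses enumerated within them),
# ordered from the smallest to the biggest streets.
# Assumption: in category 1 all houses in a selected street are enumerated;
# the text gives only the street fraction for that category.
scheme = {
    "category 1 (smallest streets)": (Fraction(1, 20), Fraction(1, 1)),
    "category 2": (Fraction(1, 10), Fraction(1, 2)),
    "category 3": (Fraction(1, 4), Fraction(1, 5)),
    "category 4 (biggest streets)": (Fraction(1, 2), Fraction(1, 10)),
}

# Overall fraction of houses enumerated in each category.
overall = {name: streets * houses for name, (streets, houses) in scheme.items()}

for name, fraction in overall.items():
    print(f"{name}: {fraction}")
```

Under this reading every category yields the same overall fraction of houses, 1/20: the scheme shifts from sampling few whole streets to sampling many streets thinly as the streets get bigger. Note that this is a fraction of houses, not of persons, so it is not directly comparable to the proportion of 1/16 quoted for Kristiania.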

In selecting the streets, their distribution over the city was taken into account to ensure the largest possible dispersion and the “representative character” of the enumerated areas.

In the medium‐sized towns, the sample was selected using the same principles, though in a slightly simplified manner. In the smallest towns, the total adult population in three or four houses was enumerated.

The number of informants in each of the 18 counties in the rural part of Norway was decided on the basis of census data. To obtain representativeness, the municipalities in each county were classified according to their main industry as agricultural, forestry, industrial, seafaring, or fishing municipalities. In addition, the geographical distribution was considered.

The total number of the representative municipalities amounted to 109, which is six in each county on average. The total number of municipalities was 498.

The selection of people within a municipality was made in relation to the population of its different parishes, so that all different municipalities were covered. The final step was to instruct enumerators to follow a specific path. In addition, the enumerators were instructed to visit different kinds of houses situated close to each other. That is, they were supposed to visit not only middle‐class houses but also well‐to‐do houses, poor‐looking houses, and one‐person houses.

Kiaer did not explain in his papers how he calculated estimates. The main reason probably was that the representative sample was constructed as a miniature of the population. This made computation of estimates trivial: the sample mean is the estimate of the population mean, and the estimate of the population total could be obtained simply by multiplying the sample total by the inverse of the sampling fraction.
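The two estimators described here can be written down directly. This is a minimal sketch: the function names, the sample values, and the sampling fraction of 1/20 are hypothetical and serve only to illustrate the computation; they are not taken from Kiaer's data.

```python
from fractions import Fraction


def estimate_mean(sample_values):
    """Use the sample mean directly as the estimate of the population mean."""
    return sum(sample_values) / len(sample_values)


def estimate_total(sample_values, sampling_fraction):
    """Multiply the sample total by the inverse of the sampling fraction."""
    return sum(sample_values) / sampling_fraction


# Hypothetical data: number of persons in five sampled households,
# with 1 out of every 20 households in the population sampled.
household_sizes = [4, 6, 3, 5, 2]
sampling_fraction = Fraction(1, 20)

print("estimated mean household size:", estimate_mean(household_sizes))
print("estimated population total:", estimate_total(household_sizes, sampling_fraction))
```

Both estimators treat the sample as a scaled‐down copy of the population. Neither involves a selection probability, so the computation by itself carries no information about the precision of the result.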

A basic problem of the representative method was that there was no way of establishing the precision of population estimates. The method lacked a formal theory of inference. It was Bowley (1906, 1926) who made the first steps in this direction. He showed that for large samples, selected at random from the population, estimates had an approximately normal distribution. From this moment on, there were two methods of sample selection:

Kiaer's representative method, based on purposive selection, in which representativity played an essential role and for which no measure of the accuracy of the estimates could be obtained;

Bowley's approach, based on simple random sampling, for which an indication of the accuracy of estimates could be computed.

Both methods existed side by side until 1934. In that year the Polish scientist Jerzy Neyman published his famous paper (see Neyman, 1934). Neyman developed a new theory based on the concept of the confidence interval. By using random selection instead of purposive selection, there was no longer any need to make prior assumptions about the population. The contribution of Neyman was not only that he proposed the confidence interval as an indicator of the precision of estimates. He also conducted an empirical evaluation of Italian census data and showed that the representative method based on purposive sampling was not able to provide satisfactory estimates of population characteristics. He established the superiority of random sampling (also referred to as probability sampling) over purposive sampling. Consequently, purposive sampling was rejected as a scientific sampling method.

Gradually probability sampling found its way into official statistics. More and more national statistical institutes introduced probability sampling for official statistics. However, the process was slow. For example, a first test of a real sample survey using random selection was carried out by Statistics Netherlands only in 1941 (see CBS, 1948). Using a simple random sample of size 30,000 from the population of 1.75 million taxpayers, it was shown that estimates were accurate.
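To give a feeling for the precision that a simple random sample of this size can offer, the sketch below computes a 95% margin of error for an estimated proportion. The source does not say which quantities were estimated in the 1941 test, so the choice of a proportion, the worst‐case value p = 0.5, and the use of the normal approximation are all assumptions made for illustration. Only the population and sample sizes come from the text.

```python
import math

N = 1_750_000   # taxpayers in the population (from the text)
n = 30_000      # size of the simple random sample (from the text)


def margin_of_error(p, n, N, z=1.96):
    """Approximate 95% margin of error for a proportion estimated from a
    simple random sample drawn without replacement (normal approximation)."""
    f = n / N                          # sampling fraction
    s2 = N / (N - 1) * p * (1 - p)     # population variance of a 0/1 variable
    variance = (1 - f) * s2 / n        # includes the finite population correction
    return z * math.sqrt(variance)


# Assumption: p = 0.5, the value that maximizes the variance.
moe = margin_of_error(0.5, n, N)

print(f"sampling fraction: {n / N:.4f}")
print(f"95% margin of error at p = 0.5: {moe:.4f}")
```

Under these assumptions the margin of error is roughly half a percentage point, even though fewer than 2 in 100 taxpayers are sampled.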

The history of opinion polls goes back to the 1820s, when American newspapers attempted to determine the political preferences of voters just before presidential elections. These early polls did not pay much attention to sampling. Therefore, it was difficult to establish the accuracy of their results.
