various sources. These might include, for example:
inappropriate specification of cases;
biased selection of cases;
random sampling error;
poor data capture techniques;
non-response;
response error;
interviewer error;
measurement error.
The appropriateness of the type or types of case specified for a piece of research is often taken for granted rather than argued and justified. Ideally, not only must cases share sufficient background characteristics for comparisons to be made between them, but those characteristics must also be relevant to the topic under investigation. So, selecting university students or ‘housewives’ to study hostility towards allowing female or gay bishops in the Church of England may not be appropriate.
Once the characteristics that define the research population of cases have been specified, the cases to be used in the research may be selected in a variety of ways, but the selection should, as far as possible, avoid over- or under-representing particular types of case. Such misrepresentation may arise because, for example, the sampling frame from which cases are drawn omits certain kinds of case, or because non-random methods of selection have been used, such as asking interviewers to choose respondents themselves. Biases in selection procedures will not be reduced by increasing the size of the sample.
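The point that a larger sample does not cure selection bias can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: a hypothetical population of 20,000 cases split into groups A and B, made-up proportions of each group holding some view, and a sampling frame that omits group B entirely. Selection from the frame is perfectly random, yet the estimate settles on the wrong value.

```python
import random


def build_population():
    """Hypothetical population of 20,000 cases as (group, holds_view) pairs."""
    group_a = [("A", 1)] * 8_400 + [("A", 0)] * 5_600  # 60% of group A hold the view
    group_b = [("B", 1)] * 1_200 + [("B", 0)] * 4_800  # 20% of group B hold the view
    return group_a + group_b


def proportion(cases):
    """Share of cases that hold the view."""
    return sum(view for _, view in cases) / len(cases)


population = build_population()
true_value = proportion(population)  # 9,600 / 20,000 = 0.48

# Defective sampling frame: every group B case has been left off the list.
frame = [case for case in population if case[0] == "A"]

rng = random.Random(1)  # fixed seed so the run is repeatable
estimates = {}
for n in (100, 1_000, 10_000):
    # A genuinely random sample, but drawn from the defective frame.
    estimates[n] = proportion(rng.sample(frame, n))
    print(f"n = {n:>6}: estimate {estimates[n]:.3f}, error {estimates[n] - true_value:+.3f}")
```

As n grows, the estimates cluster ever more tightly around 0.60 (the proportion in the frame) rather than 0.48 (the proportion in the population): the random fluctuation shrinks, but the bias of about 0.12 introduced by the frame does not.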
Even in carefully conducted random samples, values will always fluctuate from sample to sample, so that the values of properties recorded for a particular sample may not reflect the actual values in the population from which it was drawn. The probability of sampling errors of a given size can be calculated using statistical inference, which is explained at various points in Chapters 4–6.
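Sample-to-sample fluctuation, and the standard error that statistical inference attaches to it, can be seen by drawing many random samples from one population. The sketch below assumes a hypothetical population of 20,000 cases in which a proportion P = 0.48 hold some view, draws 1,000 simple random samples of 400, and compares the spread of the sample proportions with the usual formula for the standard error of a proportion, √(P(1 − P)/n). The population, sample size and seed are illustrative choices, not figures from the text.

```python
import math
import random
import statistics

# Hypothetical population: 20,000 cases, 9,600 of whom hold a view (P = 0.48).
population = [1] * 9_600 + [0] * 10_400
true_p = sum(population) / len(population)

rng = random.Random(7)  # fixed seed so the run is repeatable
n = 400                 # size of each random sample (an assumption)
replications = 1_000

# Draw many independent simple random samples and record each sample proportion.
estimates = [sum(rng.sample(population, n)) / n for _ in range(replications)]

observed_spread = statistics.stdev(estimates)
# Standard error of a proportion, ignoring the finite-population correction.
standard_error = math.sqrt(true_p * (1 - true_p) / n)

print(f"true P = {true_p:.3f}, mean of estimates = {statistics.mean(estimates):.3f}")
print(f"observed spread = {observed_spread:.4f}, formula = {standard_error:.4f}")

# Roughly 95% of sample proportions should fall within 1.96 standard errors of P.
within = sum(abs(e - true_p) <= 1.96 * standard_error for e in estimates) / replications
print(f"share within ±1.96 SE: {within:.1%}")
```

Because the samples are drawn without replacement from a finite population, the observed spread will be marginally smaller than the formula suggests; with n = 400 out of 20,000 the difference is about 1 per cent.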
Error in the construction of data can arise from poor data capture techniques, for example in questionnaire design. Many things can go wrong, both in the design of individual questions and in the overall design of the questionnaire. Some of these problems arise because the researcher has not followed the guidelines for good questionnaire design, in particular the stages of questionnaire construction and the guidance on question wording, routeing and sequencing. Any of these problems will result in errors of various kinds, the extent of which is unlikely to be known. It has been shown many times over that the responses people give to questions are notoriously sensitive to question wording. However, answers are also affected by the response options offered in fixed-choice questions, by whether or not a rating has a middle category, by whether or not there is a ‘don’t know’ filter, and by the ordering of the questions, the ordering of the responses or their position on the page. All the researcher can do is minimize the likelihood of such errors through improvements in questionnaire design.
A source of error in virtually all survey research is non-response. Rarely are all the individuals selected as potential respondents successfully contacted, and rarely do all of those contacted agree to co-operate. Non-contacts are unlikely to be representative of the total population of cases. Married women with young children, for example, are more likely to be at home during the day on weekdays than are men, married women without children or single women. The probability of finding somebody at home is also greater for low-income families and for rural families. Call-backs during the evening or at weekends may reduce this source of bias, but will never eliminate it.
The contact rate is the number of eligible cases contacted as a proportion of the total number of eligible cases approached. Interviewers may be compared or monitored in terms of their contact rates. Potential respondents who have been contacted may still refuse to co-operate for a whole variety of reasons, including inconvenience, the subject matter, fear of a sales pitch or a negative reaction to the interviewer. The refusal rate is generally the number of refusals as a proportion of the number of eligible cases contacted. Once again, those who refuse are unlikely to be representative; refusal rates may, for example, be higher among women, non-whites, the less educated, the less well off and the elderly. The detection of refusal bias usually relies on checking for differences between those who agreed at the initial contact and those who agreed only after later follow-ups, on the assumption that the latter are likely to be more representative of those who refused.
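The two definitions can be applied directly. In this sketch the `Fieldwork` class, the interviewer names and all counts are hypothetical; it simply computes the contact rate (eligible cases contacted ÷ eligible cases approached) and the refusal rate (refusals ÷ eligible cases contacted) for each interviewer and for the survey as a whole.

```python
from dataclasses import dataclass


@dataclass
class Fieldwork:
    """Outcome counts for eligible cases (hypothetical figures)."""
    approached: int  # eligible cases approached
    contacted: int   # eligible cases successfully contacted
    refused: int     # contacted cases that refused to co-operate

    @property
    def contact_rate(self) -> float:
        # eligible cases contacted / eligible cases approached
        return self.contacted / self.approached

    @property
    def refusal_rate(self) -> float:
        # refusals / eligible cases contacted
        return self.refused / self.contacted


# Hypothetical results for two interviewers.
interviewers = {
    "interviewer_1": Fieldwork(approached=120, contacted=102, refused=18),
    "interviewer_2": Fieldwork(approached=110, contacted=77, refused=22),
}

# Survey-level rates are computed here from pooled counts, not by averaging rates.
total = Fieldwork(
    approached=sum(w.approached for w in interviewers.values()),
    contacted=sum(w.contacted for w in interviewers.values()),
    refused=sum(w.refused for w in interviewers.values()),
)

for name, work in {**interviewers, "all": total}.items():
    print(f"{name:<14} contact {work.contact_rate:.1%}  refusal {work.refusal_rate:.1%}")
```

With these invented figures, interviewer_2 has the lower contact rate (70.0% against 85.0%) and the higher refusal rate (28.6% against 17.6%), the kind of discrepancy that monitoring interviewers is meant to reveal.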
Most researchers report a response rate for their study, and this will normally combine the ideas of a contact rate and a refusal rate. When it comes to the actual calculation, however, a bewildering array of alternatives is possible. Normally, it is the number of completed questionnaires divided by the number of individuals approached. Some researchers exclude those found to be ineligible from this denominator; others argue that non-contacts, terminations and rejects should be excluded as well. The result can be dramatically different figures for the response rate. Whichever figure is reported, however, what matters as far as error in data construction is concerned is the extent to which those not responding, for whatever reason, are systematically different from those who completed the questionnaire. Whether or not this is likely to be the case will depend substantially on whether there were call-backs, and on the times of day and days of the week at which individuals were approached.
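How much the choice of denominator matters can be shown with a single set of case outcomes. The counts below are invented, and the outcome categories (in particular ‘termination’ as an interview broken off and ‘reject’ as an unusable questionnaire) are assumed readings of the terms in the text; the three formulas follow the alternatives just described.

```python
# Hypothetical final outcome of every case approached (illustrative figures only).
outcomes = {
    "completed": 500,
    "ineligible": 100,
    "non_contact": 150,
    "refusal": 200,
    "termination": 30,  # assumed: interview started but broken off
    "reject": 20,       # assumed: questionnaire returned but unusable
}

approached = sum(outcomes.values())  # 1,000 cases in total
completed = outcomes["completed"]

variants = {
    # Completed questionnaires over everyone approached.
    "completed / all approached": completed / approached,
    # Ineligible cases removed from the denominator.
    "excluding ineligibles": completed / (approached - outcomes["ineligible"]),
    # Non-contacts, terminations and rejects removed as well.
    "also excluding non-contacts, terminations, rejects": completed / (
        approached
        - outcomes["ineligible"]
        - outcomes["non_contact"]
        - outcomes["termination"]
        - outcomes["reject"]
    ),
}

for label, rate in variants.items():
    print(f"{label:<52} {rate:.1%}")
```

The same 500 completed questionnaires yield response rates of 50.0%, 55.6% and 71.4% depending on the denominator, which is why the basis of any reported figure needs to be stated.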
Besides non-contacts, refusals and those found to be ineligible, there will usually also be item non-response, where individuals agree to participate in the survey but refuse to answer certain questions. A refusal to answer is not always easy to distinguish from a ‘don’t know’, but both need to be distinguished from items left unanswered because they were routed out as inappropriate for that respondent. All of these, however, are instances of ‘missing values’, which are considered in Chapter 2.
Researchers faced with unacceptable non-response rates have a number of options:
simply report the response rate as part of the findings;
try to reduce the number of non-respondents;
allow substitution;
assess the impact of non-response;
compensate for the problem.
Many researchers choose to report survey results based only on data from those who responded, and simply report the response rate as part of the results. This shows that the researcher is unaware of the implications of non-response, believes them to be negligible or has chosen to ignore them. Non-response need not in itself be a problem unless the researcher ends up with too few cases to analyse. What matters is whether those not responding differ in any significant way from those who do.
The number of non-respondents can usually be reduced by improving the data collection strategy. This might entail increasing the number of call-backs, using more skilled interviewers or offering some incentive to potential respondents. Each further gain becomes harder to achieve, however, as the rate of return improves, and costs will rise considerably. Allowing substitution can sometimes be a sensible strategy in a sample survey, provided the substitutes are selected in the same way as the original sample. This will not reduce bias from non-response, but it is a useful means of maintaining the intended or needed sample size. For censuses, substitution is, of course, not an option.
Assessing the impact of non-response implies an analysis of response rates, contact rates and so on, plus an investigation of potential differences between respondents and non-respondents and some model of how these relate to total survey error. There are various ways of checking for non-response bias. Researchers sometimes take late returns in a postal survey as an indication of the kind of people who do not respond, and then compare these with earlier returns. In an interview survey, supervisors may be sent to refusers to try to obtain some basic information. Interviewers can also be sent to non-responders in a postal survey. Another technique is to compare the demographic characteristics of the sample (age,