Raymond A Kent

Analysing Quantitative Data



wants to apply a statistic that requires two ranked variables. Metric variables can be ranked in SPSS using the procedure Transform|Rank Cases.
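Rank Cases does this job inside SPSS. For readers working outside SPSS, the Python sketch below shows what ranking a metric variable involves. It assumes that tied values share the mean of the ranks they would otherwise occupy. We understand this to be the usual SPSS default, but check the ties option in your own version. The function name `rank_cases` is hypothetical.

```python
def rank_cases(values):
    """Return 1-based ranks, giving tied values the mean of the ranks they occupy."""
    # Sort case positions by value so that tied cases end up next to each other
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of cases sharing the same value
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j hold ranks i+1..j+1; every tied case gets their mean
        mean_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

ages = [34, 21, 47, 21, 60]
print(rank_cases(ages))  # [3.0, 1.5, 4.0, 1.5, 5.0]
```

The two respondents aged 21 would occupy ranks 1 and 2, so each receives 1.5.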

      Another example of downgrading is when a researcher wishes to crosstabulate a nominal with an ordered category variable. An appropriate coefficient may be chosen that treats both variables as nominal, thereby ignoring the ordering of the categories in one of the variables. A more extreme example is when a researcher takes a continuous metric variable like age and groups respondents into a binary measure of ‘old’ and ‘young’ or into an ordered category measure of ‘old’, ‘middle aged’ and ‘young’. This may be undertaken if the researcher wishes to crosstabulate age with another binary, nominal or ordered category variable, for example ‘purchased’ and ‘did not purchase’ a newspaper in the last seven days. The age split would normally be done in a way that creates two (or three or more as required) roughly equal groups. The SPSS procedure Transform|Compute can be used to create a new variable grouped in this way.
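The logic of such a split can be sketched in Python. This is an illustration and not the SPSS procedure. It assumes that cut points are placed at the quantiles of the observed values, using the standard library's `statistics.quantiles` (Python 3.8 or later). The function name and the ages are made up. Where values are tied, the groups will only be roughly equal.

```python
import bisect
import statistics

def split_into_groups(values, labels):
    """Group a metric variable into len(labels) ordered groups of roughly equal size.

    labels must be given in ascending order, e.g. ["young", "old"].
    """
    k = len(labels)
    # k - 1 cut points at the 1/k, 2/k, ... quantiles of the observed values
    cuts = statistics.quantiles(values, n=k)
    # bisect_right places a value equal to a cut point in the upper group
    return [labels[bisect.bisect_right(cuts, v)] for v in values]

# Hypothetical ages for ten respondents
ages = [19, 23, 28, 35, 41, 44, 52, 58, 63, 70]
print(split_into_groups(ages, ["young", "old"]))
print(split_into_groups(ages, ["young", "middle aged", "old"]))
```

For these ten ages, the binary split puts five cases in each group. The three-way split gives groups of three, four and three.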

      Handling missing values and ‘Don’t know’ responses

      In any survey, not all respondents will answer all the questions. Such gaps are less often the result of respondents refusing to answer particular questions (although this does happen) or accidentally skipping them than of questionnaire design, whereby not all the questions are relevant to all the respondents. The result is that values will be missing from some of the cells in the data matrix.

      Where a question would be appropriate to a given respondent but no answer is recorded, the missing values may be referred to as ‘item non-response’. Most researchers are inclined simply to accept that there will be item non-response on some of the variables and to exclude the affected cases from the analysis. This is fine when the number of cases entered into the data matrix is large, or at least sufficient for the kinds of analyses that are required. There is always the danger, however, that this approach may reduce the number of cases used in a particular analysis to the point where meaningful analysis is no longer possible. A bewildering array of techniques for dealing with this situation has been suggested in the literature. Most of them fill the gaps caused by missing values with an actual replacement value. The process is sometimes called ‘explicit imputation’, and the idea is to select a replacement value that is as similar as possible to the value that is missing. Where variables are metric, for example, one remedial technique is to substitute the mean of the observed values for the missing value. For categorical variables, one technique that is sometimes used is to give the questionnaire with the missing value the same value as the questionnaire immediately preceding it.
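The two remedies just mentioned can be sketched as follows. The sketch assumes that a missing value is held as Python's `None`. The second function is a simple form of what is often called sequential hot-deck imputation. The function names and data are hypothetical, and neither function addresses whether the values are missing at random.

```python
import statistics

def impute_mean(values):
    """Metric variable: replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]

def impute_previous(values):
    """Categorical variable: give a missing case the value of the case immediately preceding it.

    A run of missing cases all take the last observed value; a missing
    first case has no predecessor and stays None.
    """
    filled = []
    previous = None
    for v in values:
        if v is None:
            v = previous
        filled.append(v)
        previous = v
    return filled

print(impute_mean([20, None, 40, 30]))
print(impute_previous(["A", None, "B", None, None]))  # ['A', 'A', 'B', 'B', 'B']
```

Mean substitution leaves the mean of the variable unchanged. It does, however, shrink the variable's variance, because every imputed case sits exactly at the mean.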

      Most of the techniques assume, however, that items are left unanswered at random. This can be quite difficult to establish. Furthermore, when the amount of item non-response is small – less than about 5 per cent – applying any of the methods is unlikely to make a significant difference to the interpretation of the data. Ideally, of course, researchers should report the nature and amount of item non-response in the dataset when presenting their findings, and describe the procedures used to remedy or cope with it. How missing values are handled in SPSS is explained in Box 2.6.
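Before deciding whether any remedy is worthwhile, the amount of item non-response can be checked variable by variable. The minimal sketch below makes three assumptions. `None` marks item non-response, not a question that was inapplicable to the respondent. The 5 per cent figure above serves as a rough screening threshold. The variables and values are invented.

```python
def non_response_rates(data):
    """Percentage of cases with no recorded answer (None), for each variable."""
    return {
        variable: 100 * sum(1 for v in values if v is None) / len(values)
        for variable, values in data.items()
    }

# Hypothetical data for ten respondents
survey = {
    "age":    [34, 21, 58, 47, 60, 29, 38, 52, 44, 25],
    "income": [None, 18, None, 32, None, 21, 40, None, 35, 19],
}
rates = non_response_rates(survey)
# Variables at or above about 5 per cent non-response may merit closer attention
flagged = [variable for variable, pct in rates.items() if pct >= 5]
print(rates)    # {'age': 0.0, 'income': 40.0}
print(flagged)  # ['income']
```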

      It would be a sensible policy to reserve system missing values for questions that are not applicable to the respondent concerned, and to give a special code to responses that are missing for other reasons. The combination of system missing and user-defined missing values can mean that, for some tables or calculations, the number of cases used is considerably smaller than the number of cases entered into the data matrix. It also means that the number of cases included will vary from one table or statistical analysis to another. If the number of cases in the data matrix is quite small to begin with, this can have serious implications for the analysis.
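A small sketch illustrates how the number of cases shifts from one analysis to another. It assumes that `None` plays the role of a system missing value (question not applicable) and that a hypothetical code of 9 stands for a user-defined missing value. The variables and data are invented.

```python
NO_ANSWER = 9  # hypothetical user-defined missing code: applicable but not answered

def is_valid(value):
    # None plays the role of a system missing value (question not applicable)
    return value is not None and value != NO_ANSWER

def valid_n(data, *variables):
    """Number of cases with a valid value on every listed variable."""
    n_cases = len(next(iter(data.values())))
    return sum(
        1 for i in range(n_cases)
        if all(is_valid(data[var][i]) for var in variables)
    )

# Hypothetical data for eight respondents (bought_paper: 1 = yes, 2 = no)
survey = {
    "bought_paper": [1, 2, 1, 9, 2, 1, 2, 1],
    "paper_type":   [3, None, 1, None, None, 2, None, 9],  # asked only of buyers
    "age_group":    [1, 2, 2, 1, 9, 2, 1, 2],
}
print(valid_n(survey, "bought_paper"))               # 7
print(valid_n(survey, "bought_paper", "age_group"))  # 6
print(valid_n(survey, "paper_type", "age_group"))    # 3
```

Eight cases were entered, yet these three analyses would rest on seven, six and three of them respectively.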

      ‘Don’t know’ answers are one type of non-committal reply that a respondent may give, along with undecided or neutral responses to a balanced rating scale with a midpoint. These responses may be built into the design of the questionnaire through explicit options for a non-committal response. In Figure 2.9, for example, there are separate categories for ‘Neither important nor unimportant’ and for ‘Don’t know’. For the importance of choosing well-known brands for chocolate or sweets, 285 respondents (over 30 per cent) gave a neutral response. For cigarettes, 154 (over 16 per cent) answered ‘Don’t know’. In some questionnaires the ‘Don’t know’ answers are included in the ‘Neither’ category.

      An understanding of the pattern of such replies is important for formulating research methodology (particularly questionnaire design, item phrasing or the sampling plan) and for interpreting the results when there are many ‘Don’t know’ responses. Researchers have interpreted non-committal replies very differently. These interpretations fit into two broad patterns:

       ‘Don’t know’ responses are a valid indicator of the absence of attitudes, beliefs, opinions or knowledge.

       ‘Don’t know’ replies are inaccurate reflections of existing cognitive states.

      The first interpretation provides a rationale for including explicit non-committal response categories in the questionnaire. It also implies that such responses should be excluded from the analysis, by treating them as user-defined missing values in SPSS (see Box 2.6), even if this reduces the number of cases on which the analysis is based. If many respondents fall into this category, the question may not have been well thought through. In that case there may well be an argument for excluding the question from the analysis altogether.
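Treating ‘Don’t know’ as a user-defined missing value amounts to dropping those responses before summarising, while keeping a note of how many there were. The sketch below assumes a hypothetical 1 to 5 importance scale on which ‘Don’t know’ is coded 6. The 20 per cent trigger for reviewing the question is an arbitrary illustration and does not come from this text. The median is used because the scale is an ordered category measure.

```python
import statistics

DONT_KNOW = 6  # hypothetical code for 'Don't know' on a 1-5 importance scale

def summarise_rating(responses, review_share=0.2):
    """Exclude 'Don't know' replies, as a user-defined missing value would be,
    and report how large a share of all responses they formed."""
    committed = [r for r in responses if r != DONT_KNOW]
    dk_share = (len(responses) - len(committed)) / len(responses)
    return {
        "n_valid": len(committed),
        "dk_share": dk_share,
        "median": statistics.median(committed) if committed else None,
        # Arbitrary illustrative threshold: a large share of 'Don't know'
        # replies may suggest the question was not well thought through
        "review_question": dk_share > review_share,
    }

ratings = [4, 5, 6, 3, 6, 4, 2, 6, 5, 4]
print(summarise_rating(ratings))
```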

      The second interpretation has been used to justify various efforts to minimize ‘Don’t know’ responses, on the grounds that only committed responses reflect a respondent’s true mental state. Such efforts include providing ordered category measures that have no middle position or non-committal option, or having interviewers probe each non-committal reply until a committal response is obtained.

      If there are relatively few non-committal responses then leaving them out of the analysis may well be the best course of action to take, particularly if the number of remaining cases is still adequate for the statistical analyses being proposed. There will certainly be a case, however, for including them in any preliminary analysis of the variables. A decision can then be taken about whether they are to be excluded from subsequent analyses.

      Survey research findings are certainly sensitive to decisions about what to do with non-committal responses. Treating such responses as randomly distributed missing data may introduce bias when some of them in fact reflect genuine ambivalence or uncertainty. The same would be true if such responses were included as neutral positions when they in fact indicate no opinion or a refusal to answer. A first step in any analysis would be to investigate the extent to which non-committal responses are a function of demographic, behavioural or cognitive variables. Some studies, for example, have reported an inverse relation between education and non-committal responses (the better educated are less prone to give them), but it has to be said that other research has found exactly the reverse. Durand and Lambert (1988) found that such responses vary systematically with socio-demographic characteristics and with involvement with the topic area.
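A first, purely descriptive check on whether non-committal replies vary with a background variable is to compare their rate across that variable's categories. The sketch below uses invented data. The category labels are assumptions, as is the choice to count both ‘Don’t know’ and ‘Neither’ as non-committal. A real analysis would go on to ask whether any difference is statistically significant.

```python
from collections import defaultdict

def noncommittal_rate_by_group(groups, responses, noncommittal):
    """Share of non-committal replies within each category of a background variable."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, response in zip(groups, responses):
        totals[group] += 1
        if response in noncommittal:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}

# Hypothetical data for eight respondents
education = ["degree", "school", "school", "degree", "school", "degree", "school", "degree"]
answers   = ["agree", "dont_know", "neither", "disagree", "agree", "dont_know", "dont_know", "agree"]
print(noncommittal_rate_by_group(education, answers, {"dont_know", "neither"}))
# {'degree': 0.25, 'school': 0.75}
```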

      Box 2.6 Missing values in SPSS

      SPSS makes a distinction between two kinds of missing value: system missing values and user-defined missing values. System missing values arise when the person entering the data has no value to enter for a particular variable for a particular case, for whatever reason. In this situation the person entering the data simply skips the cell, and SPSS enters a full stop in that cell to indicate that no value has been recorded. For most non-graphical output, SPSS lists the number of valid and the number of missing cases in a separate Case Processing Summary. In some tables, as in Figure 2.5 in an earlier section, valid and missing cases are shown in the output table itself. Percentages are then calculated on two bases: the total number of cases entered into the data matrix (the Percent column), and the number of non-missing cases for that variable – what SPSS calls the Valid Percent.
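The two sets of percentages can be reproduced in a few lines of Python. The sketch assumes that `None` stands in for the system missing full stop. User-defined missing codes would have to be filtered out in the same way. The data are invented.

```python
from collections import Counter

def frequency_table(values):
    """Frequency, Percent (of all cases) and Valid Percent (of non-missing cases)."""
    total = len(values)
    # None stands in for SPSS's system missing value
    valid = [v for v in values if v is not None]
    rows = {}
    for category, count in sorted(Counter(valid).items()):
        rows[category] = {
            "frequency": count,
            "percent": 100 * count / total,
            "valid_percent": 100 * count / len(valid),
        }
    return rows, total - len(valid)

# Hypothetical answers from eight respondents
purchased = ["yes", "no", None, "yes", "yes", None, "no", "yes"]
table, n_missing = frequency_table(purchased)
for category, row in table.items():
    print(category, row["frequency"], row["percent"], round(row["valid_percent"], 1))
# no 2 25.0 33.3
# yes 4 50.0 66.7
print(n_missing)  # 2
```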