at the University of Stirling for permission to use the alcohol marketing dataset, and John Hall, now retired, formerly of the Social Science Research Council Survey Unit, who has given permission to use the datasets on his very helpful website.
1 Data structure
Learning objectives
In this chapter you will learn that:
data are constructed rather than collected and result from a process of systematic record-keeping;
records are created in a social, economic and political context and for purposes specific to individuals or groups within organizations;
qualitative data consist of words, phrases, narrative, text and visual images, while quantitative data arise as numbers that result from the systematic capture of classified, ordered, ranked, counted or calibrated characteristics of a specified set of cases;
all quantitative data have a structure that consists of cases, properties and values;
the construction of data of any kind is likely to give rise to errors from various sources;
the dataset used throughout this text consists of 61 properties for 920 cases, but the cases do not constitute a random sample and there are many potential sources of error in the data.
Introduction
All research involves analysing data at some point – but what do we mean by ‘data’? What kinds of data are there? How are they constructed? How are quantitative data structured? This chapter provides an overview of the nature and characteristics of data in general and shows how quantitative data in particular are constructed and structured.
The procedures used by researchers to structure and analyse a dataset are illustrated throughout this text with a study carried out by the Institute of Social Marketing at the University of Stirling, which studies the impact of alcohol marketing on the drinking behaviour of young people aged between 12 and 14. The findings are based on a survey that involves an interviewer-administered questionnaire measuring awareness of and involvement in alcohol marketing and a self-completion questionnaire measuring alcohol drinking and associated behaviours. The homes of all second-year pupils attending schools in three local authority areas in the west of Scotland were contacted, generating a sample of 920 respondents (Gordon et al., 2010a).
The key research hypotheses are that the more aware of and involved in alcohol marketing young people are, the more likely they are to have consumed alcohol, and the more likely they are to think that they will drink alcohol in the next year. To measure awareness, respondents are asked if they have seen any adverts for alcohol in any of 15 channels, for example television, cinema, newspapers, websites or sponsorships. Responses are recorded as ‘Yes’, ‘No’ or ‘Don’t know’. To measure involvement in alcohol marketing, pupils are asked whether they have, for example, received free samples of alcohol products, free gifts showing alcohol brand logos or promotional mail or email.
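As a rough illustration of how such awareness responses might be recorded, the Python sketch below codes one respondent's answers for five of the channels and counts the ‘Yes’ responses. The channel names come from the examples above. The function name `count_aware`, the invented answers and the idea of summing ‘Yes’ responses into a single score are assumptions made for illustration, not the coding scheme documented for the study.

```python
from collections import Counter

# Permitted values for each awareness item, as described in the text.
RESPONSES = ("Yes", "No", "Don't know")


def count_aware(answers):
    """Return the number of channels for which the respondent answered 'Yes'.

    `answers` maps a channel name to one of RESPONSES. This scoring rule is
    an illustrative assumption, not the study's documented procedure.
    """
    for channel, value in answers.items():
        if value not in RESPONSES:
            raise ValueError(f"unexpected response {value!r} for {channel}")
    return Counter(answers.values())["Yes"]


# One hypothetical respondent (invented values, five of the 15 channels).
respondent = {
    "television": "Yes",
    "cinema": "No",
    "newspapers": "Don't know",
    "websites": "Yes",
    "sponsorships": "No",
}

print(count_aware(respondent))  # 2
```

Checking each answer against the permitted values before counting is one simple way of catching recording errors at the point of data entry.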
Drinking behaviour is measured in four main ways. Drinking status is assessed by asking whether pupils have ever had a proper alcoholic drink, not just a sip. Future drinking intention is assessed by asking about the likelihood that they would drink alcohol during the next year – ‘Definitely not’, ‘Probably not’, ‘Probably yes’ and ‘Definitely yes’. They are also given a ‘Not sure’ option. Initiation is measured by asking how old they were when they took their first drink. The fourth measure is the number of alcoholic units last consumed. The study uses a range of control variables suggested in the literature, for example parental attitudes towards drinking and alcohol consumption, perceived parental drinking approval, sibling and peer drinking behaviour, liking of school and rating of school work. Demographic controls include gender, social grade (based on the occupation of the head of household), ethnicity and religion. The data from the alcohol marketing research are available online from the Sage website, https://study.sagepub.com/kent.
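The drinking measures described above can be pictured as a small data matrix in which each row is a case (a pupil), each column is a property and each cell holds a value. The Python sketch below is hypothetical: the property names (`ever_drunk`, `intention`, `age_first_drink`) and the three records are invented for illustration and are not taken from the actual dataset. Treating ‘Not sure’ as falling outside the ordered scale is likewise an assumption.

```python
# Ordered categories for future drinking intention, as listed in the text.
INTENTION_SCALE = ["Definitely not", "Probably not", "Probably yes", "Definitely yes"]

# Each dict is one case; the keys are properties; the entries are values.
# All three records are invented for illustration.
cases = [
    {"id": 1, "ever_drunk": True, "intention": "Probably yes", "age_first_drink": 12},
    {"id": 2, "ever_drunk": False, "intention": "Definitely not", "age_first_drink": None},
    {"id": 3, "ever_drunk": False, "intention": "Not sure", "age_first_drink": None},
]


def intention_rank(value):
    """Return the position of `value` on the ordered scale.

    'Not sure' is returned as None, on the assumption that it does not
    belong on the ordered scale.
    """
    if value == "Not sure":
        return None
    return INTENTION_SCALE.index(value)


ranks = [intention_rank(case["intention"]) for case in cases]
print(ranks)  # [2, 0, None]
```

Here `age_first_drink` is left empty (`None`) for pupils who have never had a drink, which shows that a value can be legitimately absent for some cases.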
Data and their construction
Data are often thought of as ‘the facts’ – things that are known to be true. The dictionary tells us that the word is a plural noun (although commonly treated as singular) and derives from the Latin word that translates literally as ‘things given’. Data are thus portrayed as a form of knowledge – sheer, plain, unvarnished, untainted by social values or ideology and, for the most part, unchallengeable. The assumption is that they exist independently of our research activities and that we can simply go out and discover or ‘collect’ them like so many tadpoles in a pond.
In reality, however, data are not collected or discovered, but constructed. They are generated as a result of the human activity of systematic record-keeping, for example in registers of births, marriages and deaths, hospital records, invoices, questionnaires, electronic meters, and audio or video recordings. Record-keepers, furthermore, construct data for their own purposes. They have their own agendas and personal circumstances; they have careers to pursue, their own fears and hopes; they have bosses to impress or subordinates to guide or deploy. Data construction is also a process that takes place in a social, moral, economic, political and historical context. There are, for example, colleagues or academic peers to consider, respondents or subjects to bear in mind, and consumers, clients, funding or sponsoring agencies to take into account.
All this is not to say that data are just concocted – meaningless artefacts, subject to manipulation, doctoring or media spin. They are, however, constructed in a particular context for specific purposes. It has been argued that everyday reality (Berger and Luckmann, 1966), scientific facts (Latour and Woolgar, 1979) and many other things like gender, homosexual culture or ideas about illness are socially constructed. By being specific about what is being socially constructed, there is the implicit admission that not everything (like material objects) is a social construction and that there may be degrees of construction involved (Hacking, 1999). Social reality does, however, both constrain and facilitate data construction; so do the agreed (or disputed) practices and routines of scientific procedure.
Few data, furthermore, are perfect. Errors, to varying degrees, will almost certainly be made in the data construction process. Different researchers will often produce different results, apparently from researching the ‘same’ phenomena. Even government statistics are often based on questionnaire surveys, and there are many things that can go wrong with this process. Issues of error in data construction are taken up later in this chapter. Apart from the absence or presence of error, the quality of data will also vary in their comprehensiveness, the speed or timeliness with which they are delivered, and in the manner of their construction.
Data, in short, are not ‘the facts’ or ‘things given’; they are social products. The records created are not reality itself; rather they are a result of researchers’ attempts to observe or measure traces or evidence of phenomena situated within complex systems (Byrne, 2002). The records that researchers create come in very different forms. The historian likes to think of church registers, diaries of famous people, or transcripts of what was said by politicians as ‘data’. A sociologist with an audio recorder studying women’s emotional reactions to domestic violence, or participating in ‘street corner society’ and making notes of his or her experiences, likes to think that he or she is collecting ‘data’. An anthropologist looking at some unusual, remote tribe of people considers that he or she is generating ‘data’ by making records describing their culture. The archaeologist uses physical traces or remains as evidence or data on past events, conditions or social behaviour. The manager of a business organization may think more in terms of sales data or information on balance sheets and profit and loss statements. The market researcher is more likely to see the results of a questionnaire survey or the record of a focus group discussion as ‘data’.
Data may, in fact, consist of three rather different kinds of constructed record, for example: