never used the Internet and this is particularly evident amongst those aged 65 years and over and for those on lower incomes (ONS, 2013; Ofcom, 2010). Only a proportion of Internet users are regular Twitter users (there are 15 million Twitter users in the UK; see Curtis, 2013; Wang, 2013), and hence tweets must be used with great care if bias is to be avoided. Conversely, for a study of young people’s attitudes towards drug use, Twitter postings might provide a useful resource for framing a more conventional study involving interviews or a questionnaire survey. Looking forward, it is reported that nearly half of teenagers in the UK have a smartphone and this figure continues to increase (Ofcom, 2011), suggesting that social media usage might become more prevalent in the future.
In this context, orthodox forms of social science data will continue to be important for research questions that rely on statistical inference, for in-depth studies requiring intensive qualitative techniques, and for topics and populations that new types of data and data generation processes do not cover (see Chapter 4). It is important to understand that, even in the age of data, there are still gaps in the evidence base and there is still a need for purpose-specific data and bespoke research design including for hard-to-reach groups. It is also notable that in a recent consultation with over 300 social science researchers in the UK, nearly three quarters thought that methods such as surveys would not be used any less in the future (Elliot et al., 2013).
2.4.3 Data Ownership, Consent and Access
Taking a step back from the core methodological issues, the question of data ownership is another pressing challenge. There is a lack of clarity in the ownership and regulation of the use of different types of social media data. Facebook and Twitter postings occur in public but only certain aspects of the resultant data are available to the public.
Twitter claims that tweets are owned by the people who write them, but then treats them collectively as a saleable commodity. Consent is sought as part of the process of creating a Twitter account. The account holder is informed that:
You are responsible for your use of the Services, for any Content you post to the Services, and for any consequences thereof. The Content you submit, post, or display will be able to be viewed by other users of the Services and through third party services and websites (go to the account settings page to control who sees your Content). You should only provide Content that you are comfortable sharing with others under these Terms… You understand that through your use of the Services you consent to the collection and use (as set forth in the Privacy Policy) of this information, including the transfer of this information to the United States and/or other countries for storage, processing and use by Twitter… By submitting, posting or displaying Content on or through the Services, you grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed). (Twitter, 2013)
Despite these terms of service, it may not be entirely clear to the account holder how their tweets might be used for secondary purposes, including social science research. There is only limited research on how such terms of use compare with other data types and forms of data collection, such as intentional data collected via a survey or the UK Census.
The importance of participant consent and data use protocols is well established in existing social science methods and is integral to ethical approval processes (Bryman, 2013). In relation to new types of data, the individuals whom the data are about (social media users, for example) may not be aware that the data exist, that the data are public, or what this means in practice. Even if a person is aware that the data exist, they may not realize that the data are being used for secondary purposes (social research or otherwise) and have a commercial value.
Similarly, volunteer and crowdsourced data from observing events may include information on other people taking part in the event. As Gross (2011) argues, the existing frameworks of ethics, and particularly of informed consent, are limited and need to be overhauled if they are to cope with the scale, intensity and immediacy of the constantly evolving data environment. Such issues are crucial for the ethical development of social science research using new types of data, such as conditional and trace data, and for the effective regulation of uses of the data as the relationships between citizens, state and the commercial sector change (see Chapter 12 for a full discussion). For a discussion of what is termed agile ethics, see Neuhaus and Webmoor (2011) and AOIR (2015); much more work is needed in this area.
In terms of data access, social media data are held by commercial companies that may choose to sell the data, but are not required to make them freely available for social science research. As Savage and Burrows (2007) have argued, the commercialization of sociology, where the driver is the economic value, could pose a threat to social science research (see also Chapter 13). Specifically, verification, replication and review become much more problematic, as has been found in medical research. To ameliorate these problems, the case needs to be made, probably through government, for regulatory processes guaranteeing researchers’ access to new types of data, their provenance, value, validity and reliability, and the analyses and claims made of them. Of course, a key feature is that the data are generated by citizens, and it can be argued that citizens should have some say in their secondary use.
It may well be that we need to move from a milieu of regulated data protection to one of policing data abuse. Mandatory social science research access by approved researchers would be one mechanism for enacting such a regime. This would not necessarily jeopardize the commercial value of the data to businesses but could be part of a legal and ethical responsibility to the customer and their welfare. In a data abuse framework, one is less concerned about the control of data flows and processes and more with the consequences and specifically harms caused by the actions and choices of individual data processors.
There is an indication of a mobilization in this area, for example, in the work of the Web Science Trust, which is focused on sharing expertise and resources to enable research and understanding of the Web. We consider these developments in more detail in Chapter 3.
2.4.4 Competing Narratives
The new types of data reflect the new reality of the information society, in which we are all living interlinked real and virtual lives. The emergence of new data types provides opportunities for new insights and new understandings of what might otherwise be intractable social problems. This can be seen as an opportunity but also a challenge.
Social scientists may face increasing competition to have their findings heard. For example, findings from surveys of public attitudes on particular issues are now competing with the reports of Twitter or blog postings. Each data type has its strengths and weaknesses. However, analysis based on social media data may be produced more rapidly and secure higher profile media coverage than more conventional social science research (with its often lengthy process of design, data collection, preparation, analysis and peer review).
Arguably, social scientists have always been in competition with alternative sources of information about, and explanations of, social phenomena. These include journalists and politicians, as well as government department reports. A concern here is the robustness of the evidence base that is being used to inform the claims: it can be based on biased samples, small case studies or personal anecdotes. This might reflect the difference between what a representative survey suggests about what a particular population thinks and ‘what a taxi driver thinks’ (not an uncommon comment by policy makers) or even what people say to a taxi driver.
The new aspect of this is the scale of the ‘what the taxi driver thinks’ data now that this can be promulgated through millions of Twitter postings. Where access to social media data is