threats raised by the proliferation of social data. There is an emerging consensus that a new ethical framework for the conduct of social research is necessary in order to protect citizens from harm but, as yet, there is little agreement on what changes it should embody, and how it should be promulgated and enforced.
In this chapter, Jirotka and Anderson examine the ethical issues raised by e-Research methods and what steps the social research community might take to address them. They use three case studies to illustrate the issues. The first describes a flagship UK e-Science project, eDiaMoND, and the process of gaining ethical approval for its work. The second concerns a recent controversy, known as the ‘Harvard Meltdown’, over social science researchers’ use of Facebook data. The final case study is about developing prototype assistive technology for vulnerable people. Jirotka and Anderson draw several conclusions from these studies: managing ethics in large-scale, multi-disciplinary research projects is particularly difficult and some of the founding principles of research ethics, such as informed consent, can be burdensome; protecting the identity of sources using conventional techniques for anonymization is becoming progressively less reliable as more and more information about subjects and settings becomes openly available via the Web (identification is always possible given enough correlated data); consent to take part in research must be given in a principled way and, having consented, participants must have the power in practice – and not just in principle – to withdraw it; and finally, where a project involves interventions in people’s lives, researchers must consider what may happen once the project finishes.
They conclude with a discussion of the ethics of big social data. They underline the importance of the well-rehearsed arguments about threats to privacy and confidentiality. They ask what rules should apply to the use of social media in research: does publishing thoughts and opinions in public render informed consent irrelevant? However, their key insight goes further: it questions whether the lure of big social data is persuading researchers to relax their professional judgment about what conclusions are warrantable from the data. Jirotka and Anderson’s fundamental argument is that we need to bring ethical considerations into the heart of how we conduct research, from the point where decisions are being made about research goals, through to the collection and analysis of the data and the making sense of the findings.
Chapter 13: Sociology and the Digital Challenge
This final chapter examines the implications of massively increased computational and data resources for social research methods, including the impact on its established practices and the future of its disciplines. In it, Savage returns to themes that he and his co-author, Burrows, first raised in their subsequently much-cited paper, ‘On the coming crisis of empirical sociology’ (Savage and Burrows, 2007). His aim, in part, is to ground expectations of the changes in social research that may follow from digital innovations and, not least, to question their inevitability. As the contributions of the authors of the chapters in this volume convincingly demonstrate, the future of digital sociology is contested: they all agree that the discipline is undergoing a sustained period of innovation, but its future direction is unknown. Together, they make a powerful case for Savage’s assertion that the future of digital sociology is not a given, but lies in the hands of current and subsequent generations of practitioners.
1.4 Future directions
1.4.1 Technical Developments
The other chapters in this book, described above, confirm that e-Research has moved on from an early focus on grid computing to encompass a very diverse set of tools, some of which are enhancements of previous software and others that are entirely new. A factor that suggests that this diversity will persist and even grow is the lack of central co-ordination and oversight. In the UK, the National e-Science Centre, which was the hub for the core programme, ceased operating in 2011; the NCeSS Hub had already closed in 2010. Other national centres still exist, for example the New Zealand eScience Infrastructure (www.nesi.org.nz), as do several international initiatives, such as the Open Grid Forum (www.ogf.org) and the European Grid Infrastructure (www.egi.eu). The emphases of these centres and programmes, however, are largely high performance computing, providing cloud services and codifying grid standards, all areas of limited relevance to the social sciences. Outside these programmes, technical developments are mostly modest refinements to existing tools, updates to commercial packages driven by competition for market share, or the adoption and adaptation of whatever generic or specialized tools and services researchers find can smooth the path of their own research. The future path of technical developments is therefore impossible to predict, though the drive to harness computing power to enable better research is unlikely to abate.
1.4.2 The Data Deluge
As reiterated in most of the chapters in this volume, we live in an information age characterized by a deluge of digital data (Hey and Trefethen, 2004; Hey, Tansley and Tolle, 2009). The chapters set out many of the potential research benefits to be obtained by collecting and analysing artificially produced and naturally occurring big data of many kinds from numerous sources. However, these benefits will only be realized if the wealth of data is managed in ways that ensure that it is discoverable, accessible, usable and re-usable. Indeed, research data management was a cornerstone of the original e-Research vision.
Accordingly, national e-Research programmes to innovate research methods, tools and infrastructure have devoted significant effort to raising awareness among stakeholders that research data is a vital resource whose value needs to be preserved for future research by the data originators and by others. Achieving this requires that the data be systematically organized, securely stored, fully described, easily locatable, accessible on appropriate authority, shareable, archived and curated. Fulfilling all of these research data management tasks is a complex socio-technical challenge that stakeholders, whether they are research funders, higher education institutions (HEIs), publishers, researchers or regulators, are currently ill prepared to meet (Procter, Halfpenny and Voss, 2012). There are, as yet, no widely agreed, mature solutions that can be implemented across all the various platforms that researchers use. Moreover, given the combination of the data deluge and a world recession, the scale of the tasks is increasing while the financial, and therefore human, resources to undertake them are shrinking.
Ensuring the implementation and sustainability of data preservation will need to take on board the prospect of research becoming more collaborative and research teams being more widely distributed, as signalled in the e-Research vision of researchers world-wide addressing key challenges in new ways. The implications for data management services are summarized in a report from the Department for Business, Innovation and Skills (BIS) in the UK, which concluded, ‘A federated infrastructure will be essential to exploit existing and future investments [in data] effectively’ (Business, Innovation and Skills, 2010, 9). If such a federated infrastructure is to be achievable, then establishing effective inter-institutional service models will take on increasing importance. HEIs and other research organizations will need to develop strategies and infrastructure solutions that enable the federation of individual data repositories and the virtualization of data services. This will add a further layer of sustainability issues: the opportunities, costs and benefits of such collaborations will need to be carefully examined, and HEIs (both large and small) will need to develop competencies in managing services that span administrative and funding boundaries. In the current competitive environment, with universities locked in a zero-sum struggle for resources, there is little incentive to put effort into the inter-institutional cooperation required.
The term big social data serves to draw attention to three salient dimensions that define new forms of social data: volume, variety and velocity, the last reflecting its often real-time and rapidly changing character. Developments linked to the emergence of big social data are happening continually and we cannot be certain what impact such data will have on research processes. It is possible that it will promote the use of new computational social science methods in place of more traditional quantitative and qualitative research methods. It might also influence thinking and re-orientate social research around new objects,