Simon Lindgren

Data Theory



in anything from social science to biology, who works with large amounts of data, and must grapple with computational problems posed by the structure, size, messiness, and the complexity and nature of the data, while simultaneously solving a real-world problem.

      Social scientists should ideally play an important role in data science, as many of the problems that data science works with – friending, connections, linking, sharing, talking – are ‘social science-y problems’ (Schutt and O’Neil, 2013, p. 9). As put by new media theorist Lev Manovich (2012, p. 461):

      But even if we sometimes have actual, real-life, well-motivated questions to pose to the data, data science notoriously runs the risk of becoming too data-driven. Indeed, data science is sometimes referred to as ‘data-driven science’, as its main aim is to extract knowledge from data. It is mostly not about testing hypotheses or theories in the traditional scholarly way. Instead, the work that is done with the data is driven by the data itself – in terms of the possibilities for gathering it, and the available tools for probing it.

      A related concept is data mining. As the word ‘mining’ hints, this approach is about working to discover interesting patterns in large amounts of data, for example from the internet and social media. This approach marks a break with the established view of the research process – at least within the more objectivist types of science – where a problem or research question is formulated beforehand. This problem, formulated following a particular need for a certain type of knowledge about a specific issue, then guides the researcher in sampling data, devising the research methods, and choosing the theoretical perspectives – or even in formulating strict hypotheses to verify or falsify. Such a process is by no means axiomatic when it comes to data science, which makes no secret about often being highly explorative, and going fishing with a very wide net. In many cases a so-called data piñata approach is employed. As defined by the online resource Urban Dictionary:

      data piñata: Big Data method that consists of whacking data with a stick and hopefully some insights will come out. [Example:] The Big Data Scientist made a Twitter data piñata and found that Saturdays are the weekdays with the most tweets linking to kitty pictures.

      (Urban Dictionary, 2018)

      Census and survey researcher Kingsley Purdam and his data scientist colleague Mark Elliot aptly point out that today data is, to a lesser and lesser degree, ‘something we have’. Rather, ‘the reality and scale of the data transformation is that data is now something we are becoming immersed and embedded in’ (Purdam and Elliot, 2015, p. 26). Their notion of a data environment underlines that people today are both generators of, and generated by, this new environment. ‘Instead of people being researched’, Purdam and Elliot (2015, p. 26) write, ‘they are the research’. Their point is that new data types have emerged – and are constantly emerging – that demand new, flexible approaches. Doing digital social research, therefore, often entails discovering and experimenting with the challenges and possibilities of ever-new types and combinations of information. Among these are not only social media data, but also data traces that are left, often unknowingly, through digital encounters. Manovich gives an explanation that is so to the point that it is worth citing at length:

      (Manovich, 2012, pp. 461–3)

      Going back to 1978 and Glaser’s book Theoretical Sensitivity, we can find some useful pointers on how to see the research process – beyond ‘quantitative’ and ‘qualitative’. The first step, for Glaser (1978, p. 3), is ‘to enter the research setting with as few predetermined ideas as possible’, to ‘remain open to what is actually happening’. The goal is then to alternate between having an open mind – working inductively, allowing an understanding of the research object to emerge gradually – and testing the emerging ideas as one goes along – working deductively, trying to verify or falsify the developing interpretations. So, we can, quite mindlessly, beat on the piñata for a little while to see what jumps out. Then we try to make sense of the things that emerged, and then beat some more to see what the new stuff that pops out adds to or removes from our present analysis.

      My point here is that being data-driven, as is often the case when working with big data, is not (only) a new ill, caused by the datafication of society and the fascination with huge datasets. Used in the right way, a data-driven approach – a data piñata – can be truly useful in getting to know more about what goes on, and what social and cultural processes may be at work, in contexts and behaviours that are still largely unknown to us. From that perspective, not really knowing what we are looking for, and why, can be a means of treading new ground, veering off the well-trodden paths, and getting lost in order to find our way. If we don’t even know what is going on, maybe beating that piñata with a stick isn’t such a bad idea? The new data science opportunities and tools, in combination with social theory, have huge potential to help decode the deeper meanings of society and sociality today.

      Finding