The Ethics of Big Data

This article, by Angela Daly from EFA’s Policy & Research Standing Committee, was originally published on the St James Ethics Centre website.

By Camelia.boban [CC-BY-SA-3.0], via Wikimedia Commons
Big Data is one of the most hyped and talked-about current technology trends, along with the Internet of Things, wearable devices and 3D printing. But what precisely is this ‘big data’? The term is used in different ways by different people, but it can be taken to mean any collection of very large and complex datasets which would be difficult to process and analyse using traditional methods. It is also increasingly applied to any very large amount of data, whether controlled by the State (via Centrelink, e-health services, tax and so on), by large information corporations such as Google and Apple, or even by ‘old’ industries such as banks and supermarkets. Big Data has also got politicians all excited, with European Commission Vice President Neelie Kroes urging governments to ‘embrace big data’.

There are some obvious positive aspects to Big Data – certain analyses that would not previously have been possible due to technical limitations as well as restrictions of scope and scale can now be performed, and this can reveal certain new information about ourselves and the world we live in. Indeed, it can reveal new information beyond ourselves and our world too – my colleagues in the Swinburne Centre for Astrophysics and Supercomputing would not be able to do their job without the accumulation of Big Data about our universe. Aside from possibly some issues about ownership and intellectual property, this kind of Big Data gathering does not pose too many ethical questions. However, when Big Data gets personal – collecting and analysing information about human beings, or data made by them – that’s when the problems start.

A good (or bad) example of some of the ethical questions that are raised when there is Big Data about and by people is the recent Facebook ‘emotional contagion’ study, which involved an in-house Facebook researcher and some academics from Cornell University in the US. During one week in January 2012, over 600,000 Facebook users unwittingly had their News Feeds manipulated to include either ‘positive’ or ‘negative’ stories in order to determine whether this exposure led to similar expressions of their own. The lack of informed consent from the participants specifically for this experiment proved highly controversial once the research was made public. Data protection authorities in the UK and Ireland are investigating the extent to which the study complies with EU data protection law, and a complaint has been made to the US Federal Trade Commission that the research may have been conducted illegally.

The Facebook experiment shows one of the ‘dark sides’ of Big Data – the use of people’s information without their consent or control. While it will be interesting to see the results of any investigation into the legality of what Facebook did, the privacy laws that are in place to deal with personal data are not particularly strong, not well adapted to changes in technology and the proliferation of data, and not always well enforced. Another problem is posed by the fact that a lot of data may be stored in the ‘cloud’, that is to say in a location which is not the equipment of the person giving or receiving the data, a location somewhere ‘out there’, including somewhere overseas. This means that foreign laws might govern the data (and might provide even less protection of privacy than Australian law, for example), that foreign law enforcement agencies might be able to access the data (leading to surveillance concerns), and that it can be difficult if not impossible to ensure that the data is being stored securely. Although this applies to any data stored in the cloud, the ‘bigness’ of the data just intensifies the problem.

Big Data involving accumulations of personal information – or ‘profiling’ – can also build very detailed and intrusive pictures about private individuals. Indeed, the information does not necessarily have to be ‘personal’ to be revealing. A study by a Stanford graduate on telephone ‘metadata’ (such as the phone numbers the user called and the numbers of received calls) showed that this information could reveal a person’s political and religious affiliation among other intimate details about that person’s life. This is a particularly important finding for Australians given the government’s current plans to introduce the mandatory retention of all communications metadata.

Further ethical questions arise regarding the uses of Big Data and the conclusions drawn from it. Kate Crawford has warned of ‘data fundamentalism’ – ‘the notion that correlation always indicates causation, and that massive data sets and predictive analytics always reflect objective truth’. Given there is an element of human design behind the gathering and processing of the data, there can accordingly be hidden biases in the data, and so Big Data might be best used alongside traditional qualitative methods rather than instead of them. However, if technodystopian Evgeny Morozov is to be believed, then we are moving towards the opposite situation in practice: ‘smart’ devices and Big Data are aiding policy interventions in the US, making some initial steps towards ‘algorithmic regulation’ by which social objectives are achieved through data and technology. Aside from the problems of bias in the data and its failure to give a full picture of what is happening in reality, in practice ‘algorithmic regulation’ is unlikely to address the causes of social problems and will instead deal only with their effects – so inequalities are likely to persist.

As ever with new technologies, Big Data is neither good nor bad nor neutral. Design, implementation and use will determine whether Big Data is ethical or not, though its limitations in revealing more about our society should be borne in mind when using the data to make sweeping statements about the state of the world. While it seems to be a useful tool for research, it’s worth cutting through the hype to realise it’s not the only one, and the old ways can still be good ways.