Ethical issues can arise from the collection and analysis of extremely large data-sets, some containing many petabytes of data, often from multiple sources.
These huge data-sets can contain information gleaned from all sorts of devices and services, for example:
- Internet use including web searches
- Online shopping and social media
- Mobile devices including geo-location data
- CCTV systems
In addition, data may be collected from devices that fall into the category of the ‘internet of things’. These include smart utility meters, car engine management systems, and the control systems for office heating and ventilation. The use of data mining, data fusion, and predictive analytics on these huge data-sets has the potential to yield novel and useful information with applications in many different areas, from effectively targeted online advertising through to efficient energy use and the distribution of healthcare resources. It also has a clear, if more controversial, application in policing and security.
Although big data collection and analysis have the potential to deliver great benefits, they also raise a number of ethical problems – for instance:
Consent and autonomy:
Data is often collected without the express consent of the person who originated it, or its collection is authorised by a set of ‘terms and conditions’ that the originator is unlikely to have fully read or understood. That data may also be used for purposes that were not envisaged at the time of collection, possessed by others to whom it has been transferred, and stored for an unknown length of time.
Privacy and surveillance:
Big data analysis can generate a great deal of personal information about individuals, some of which the individuals concerned may regard as private. Moreover, although data is often collected anonymously or subsequently anonymised, data fusion techniques have the potential to re-identify the people connected with it. These concerns about privacy are more acute when, for example, the agencies of the state are authorised to collect data of a more sensitive nature, such as personal emails or email metadata.
Massive data-sets are very hard to manage and may contain very sensitive information. For that reason, they make attractive targets for hackers and have great potential for misuse.
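As a rough illustration of how combining sources can undo anonymisation, the sketch below links an "anonymised" data-set to a named one using shared attributes. Everything in it is an assumption made for the example: the records, the names, the choice of postcode, birth year and sex as linking fields, and the `reidentify` helper are all invented, and real data fusion is considerably more sophisticated.

```python
# All records below are invented for illustration; none describe real people.
anonymised_health = [
    {"postcode": "AB1 2CD", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "AB1 2CD", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "EF3 4GH", "birth_year": 1980, "sex": "F", "diagnosis": "migraine"},
]
public_register = [
    {"name": "A. Example", "postcode": "AB1 2CD", "birth_year": 1980, "sex": "F"},
    {"name": "B. Sample", "postcode": "EF3 4GH", "birth_year": 1990, "sex": "M"},
]

# Assumed linking fields: attributes that appear in both data-sets.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Build a comparison key from the shared attributes of a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymised, named):
    """Return (name, diagnosis) pairs where exactly one named person matches."""
    names_by_key = {}
    for person in named:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = []
    for row in anonymised:
        candidates = names_by_key.get(key(row), [])
        if len(candidates) == 1:  # a unique match re-identifies the row
            matches.append((candidates[0], row["diagnosis"]))
    return matches


matches = reidentify(anonymised_health, public_register)
print(matches)
```

In this toy case one of the three nameless health records is tied back to a named individual, even though neither data-set on its own links a name to a diagnosis.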
Reliability, false positives and profiling:
The great promise of big data analysis lies in its ability to spot patterns and make novel predictions, but the usefulness of these techniques depends on the accuracy and reliability of the resulting predictions. Where big data is used to identify people for police or security investigations, the possibility of false positives is highly significant and the costs of getting it wrong can be very serious. The use of big data analysis also has the potential to unjustly disadvantage individuals by virtue of their membership of a group that has been picked out as having statistically significant properties.
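The scale of the false positive problem can be seen with some simple arithmetic. The sketch below uses entirely hypothetical figures (a population of one million, 100 genuine persons of interest, a system that catches 99% of them and wrongly flags 1% of everyone else); the numbers and the `flagged_breakdown` function are assumptions chosen for illustration, not measurements of any real system.

```python
def flagged_breakdown(population, true_targets, sensitivity, false_positive_rate):
    """Split the people a screening system flags into correct and incorrect flags.

    sensitivity: fraction of genuine targets that are flagged.
    false_positive_rate: fraction of everyone else who is wrongly flagged.
    """
    true_positives = sensitivity * true_targets
    false_positives = false_positive_rate * (population - true_targets)
    # Share of all flagged people who really are targets.
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# Hypothetical figures, chosen only to illustrate the effect.
tp, fp, precision = flagged_breakdown(
    population=1_000_000,
    true_targets=100,
    sensitivity=0.99,
    false_positive_rate=0.01,
)
print(f"Correctly flagged: {tp:.0f}")
print(f"Wrongly flagged:   {fp:.0f}")
print(f"Share of flagged people who are genuine targets: {precision:.1%}")
```

Under these assumptions roughly 99 people are flagged correctly and roughly 10,000 wrongly, so only about 1 in 100 of those flagged is a genuine target, even though the system sounds "99% accurate".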
Videos of the speaker presentations given at the 'Big data - Social Data' event, held on 10th December 2015 by Warwick's Q-Step Centre, are available here.
Picture courtesy of: http://www.businessnewsdaily.com/images/i/000/004/493/original/bigdata.jpg?1380302987