A SECURE FUTURE FOR RESEARCH DATA
Written by Yvonne Budden, E-Repositories Manager, University of Warwick Library
The curation, access and availability of data have always been contentious topics, whether the data in question is personal details or research study data. This is amplified by the data age that we live in, with all types of data easily accessible to a large number of people. But what does the future hold for research data? Will encouraging data sharing help security?
The collection, access and curation of research data is an increasingly urgent question, particularly as some research data documents unique events or circumstances that cannot be easily repeated. Historically, research data could be presented in the same journal article or book as the analysis of the data. However, the kinds of datasets being produced by modern science are growing in both size and complexity. A modern journal article may only be able to feature a fraction of the data produced to support it, yet it is often a condition of funding that researchers manage and retain their data for a set period of time after a project ends.
There are many reasons why properly curated and preserved data is advantageous to researchers and the world at large. Sharing research data can be instrumental to the advancement of science, as it allows other researchers to ‘ask new questions’ of the data. In this way, a single dataset can be used to answer more than one research question. Research in Astronomy and Climate Science can rely on synoptic data, and sharing data allows scientists more freedom to combine datasets, either within a single discipline or across multiple related disciplines. In this way the results of many separate experiments can be combined to demonstrate broad or long-term changes. There is also a growing trend of data use in disciplines not traditionally seen as data intensive.
New technology and techniques such as data mining have also expanded the range of questions that can be asked. Researchers may find that in sharing their data they gain access to others’ data in return. There is a range of other benefits too; sharing data can allow interested researchers to test the results of others. In 2004 the UK government signed up to the OECD’s ‘Declaration on Access to Research Data from Public Funding’, a commitment to start to build the infrastructure necessary to facilitate access to data. Open access to data has proved advantageous to scientists in ‘crowd science’ projects such as Galaxy Zoo, where results from the Sloan Digital Sky Survey were classified by members of the public, greatly reducing the time taken to analyse the large datasets.
Despite these arguments, there are also a number of legitimate concerns over the sharing, reuse and retention of research data. A great deal of work is often involved in producing the data, and it is expected that researchers should have a given period of exclusivity in which to disseminate their results. This concern is exacerbated by the perception that sharing data will lead to the data being “misinterpreted, misused or misappropriated without credit” 1. This is especially important for researchers who are funded not publicly but by commercial companies. There are also a number of legal issues surrounding the release of data, which return to the question of data ownership.
Changes to copyright law can also impede legal access to research data 2. Concerns over data management and Freedom of Information requests cannot be ignored, especially in the wake of the ‘Climategate’ scandal. Data security is an ongoing concern, both in terms of ethical restrictions and concerns about the privacy of respondents. The secure transfer of data between geographically separate groups of researchers also needs to be considered. Any data curation service hoping to enter this field will have to be prepared to meet all of these concerns as well as others.
Data curation in the United Kingdom is mainly handled by individual researchers within institutions. These researchers are supported by the UK’s network of Data Centres, including the UK Data Archive, which handles social science datasets, the British Atmospheric Data Centre and others. The Digital Curation Centre’s mission is to support all parties concerned with the curation of data, and it has produced a number of resources to guide data management. The previous government, under Gordon Brown, considered access to government data a priority, which led to the inception of the data.gov.uk archive. This followed the US Government’s data.gov model. The British Library’s DataCite project aims to develop a DOI system for datasets so that they can be reliably referenced and cited, providing an extra incentive for researchers to share their data.
Curation and sharing of data is a developing issue that research-intensive institutions need to begin to think seriously about. Most of the UK Research Councils have some form of policy on the management of data, and an announcement about further provisions is expected in 2011. Any solution that is devised must take into account the positions of all competing interests before a firm strategy is decided.
1. Christine L. Borgman. 2010. "Research Data: Who will share what, with whom, when, and why?" China-North America Library Conference, Beijing. Accessed 18/02/2011 16:05, p. 9.
2. Ibid, p. 12.
Yvonne Budden is E-Repositories Manager at the University Library.