Plan your data

At the start of a new project, consider the data you will create, gather and use during the project, and decide how you will manage it.

What is Research Data Management?

Data can outlive the research project that creates or collects it. You may continue to work on your data after funding has ceased, follow-up projects may analyse or add to the data, and other researchers may re-use it. Managing your data properly throughout its whole lifecycle is therefore increasingly important.

Many funders now ask you to set out how you will manage your data as part of their application process. Considering your data management options at an early stage helps you make the right decisions at the right time about creating, storing and sharing your data.

Make sure you know about your funders' expectations.

Data planning

Data planning involves deciding at the outset of your research:

Which software and file formats to use
How to organise, store and manage your data
What to include in the consent agreements you negotiate

These will all affect what it’s possible to do with your data in the future. Data planning is best done by writing a Data Management Plan.

Research data

Research data is the information that you are using to answer your research questions. Research data is often arranged, formatted or designed in such a way as to facilitate communication, interpretation and further processing.

The Digital Curation Centre defines research data as "a reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing."

Types of research data

Much research data is created new for a specific project because it answers a novel question. It may also be data from a previous project that has been transformed, adjusted or reinterpreted to fit the needs of the new project. Five commonly used data types are:

Observational: data captured in real time that is usually unique and irreplaceable. For example, remote sensing data, survey data, field recordings, sample data
Experimental: data captured from lab equipment that is often reproducible. For example, gene sequences, chromatograms, magnetic field data
Models or simulation: data generated from test models where model and metadata may be more important than output data from the model. For example, climate models, economic models
Derived or compiled: resulting from processing or combining ‘raw’ data. For example, text and data mining, compiled databases, 3D models
Reference or canonical: a static or organic conglomeration or collection of datasets, probably published and curated. For example, gene sequence databanks, collection of letters or archive of historical images

Examples of research data

Research data can be electronic or hard copy (e.g. paper) and may include the following:

Documents (text, Word, PDF), spreadsheets
Laboratory notebooks, field notebooks, diaries
Questionnaire responses, transcripts, codebooks
Audiotapes, videotapes, photographs, films
Slides, artefacts, specimens, samples
Collections of digital objects acquired and generated during the process of research (including digitised archive material)
Database contents (video, audio, text, images)
Models, algorithms, scripts
Contents of an application (input, output, logfiles for analysis software, simulation software, schemas)
Methodologies, workflows and protocols

The UK Data Service offers detailed information about research data management practices across academic disciplines.