Organising research data

Organising your data is the most basic of all of the research data management functions. With very little planning or effort you can make your data files easy to store, find and use. Failing to organise your data can make it unusable even by the people who created the data!

Data storage

The best place to store and secure your research data, and to ensure its long-term life while you gather and use it, is one of the storage options available from IT Services. These options are designed to be flexible and to fit a range of use cases and needs. Get in touch with the storage team if you want to explore them further.

Data formats

When you’re planning your research project it is essential that you consider the file formats in which you will store your data. Your choice of file format will affect the usability and long-term accessibility of your files and data. As technology changes, you should also plan for both hardware and software obsolescence.

File formats more likely to be accessible in the future have the following characteristics:

  • Non-proprietary
  • Open, documented standard
  • Common usage by research community
  • Standard representation (ASCII, Unicode)
  • Unencrypted
  • Uncompressed

Examples of preferred file format choices include:

  • ODF, RTF or TXT, not Word (.doc or .docx)
  • ASCII, not Excel (.xls or .xlsx)
  • MPEG-4, not Quicktime
  • TIFF, PNG or JPEG2000, not GIF or JPG
  • XML or RDF, not RDBMS

If you are using proprietary software consider migrating your data into a format with the above characteristics, in addition to keeping a copy in the original software format.

  • Cornell University has a great guide on common image formats and when to use which
  • The UK Data Service also has a guide to recommended and acceptable file formats
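As a rough illustration of the preferred-format examples above, the following Python sketch flags files whose extensions suggest a less durable format and proposes the alternative listed in this guide. The mapping and the function names are assumptions made for this example, and treating .mov as the QuickTime extension is also an assumption; adapt the table to the formats your own project actually uses.

```python
from pathlib import PurePath

# Hypothetical mapping built from the examples in this guide:
# extension to avoid -> alternative formats suggested above.
# The ".mov" entry assumes QuickTime files use that extension.
SUGGESTED_ALTERNATIVES = {
    ".doc": "ODF, RTF or TXT",
    ".docx": "ODF, RTF or TXT",
    ".xls": "ASCII text",
    ".xlsx": "ASCII text",
    ".mov": "MPEG-4",
    ".gif": "TIFF, PNG or JPEG2000",
    ".jpg": "TIFF, PNG or JPEG2000",
}


def suggest_format(filename):
    """Return the suggested alternative format, or None if the file is not flagged."""
    suffix = PurePath(filename).suffix.lower()
    return SUGGESTED_ALTERNATIVES.get(suffix)


def review_files(filenames):
    """Map each flagged file name to its suggested alternative format."""
    flagged = {}
    for name in filenames:
        alternative = suggest_format(name)
        if alternative is not None:
            flagged[name] = alternative
    return flagged


if __name__ == "__main__":
    for name, alternative in review_files(["survey.xlsx", "notes.txt"]).items():
        print(f"{name}: consider also keeping a copy as {alternative}")
```

A check like this only looks at file names. It does not convert anything, so the original files stay untouched.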

File naming and folders

How many times have you looked for a document and then found that you can’t remember which folder you stored it in? Imagine you needed to find a file among those of a research partner: where would you start? Starting a project with a strategy for the consistent naming of both files and folders helps keep research data organised. Creating appropriate file and folder structures will save time, avoid loss of data, allow re-use of the data, and assist in accurate location of data in the future.

To a certain extent it doesn’t matter what system you choose to use, as long as everyone creating data for the project agrees on the system and you are all consistent in using it! Consider also whether you will need to include version information in the file name.

  • Jisc Digital Media has a guide on choosing file names
  • Guide to renaming files and file extensions from Geek Girl's Plain English Computing
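One way to keep a naming system consistent is to generate and check file names in code rather than typing them by hand. The sketch below assumes a hypothetical convention of project_description_YYYYMMDD_vNN.extension; the convention, the function names and the example values are all illustrative, and any scheme your project agrees on would work equally well.

```python
import re
from datetime import date

# Hypothetical convention: project_description_YYYYMMDD_vNN.extension
PATTERN = re.compile(r"[a-z0-9-]+_[a-z0-9-]+_\d{8}_v\d{2}\.[a-z0-9]+")


def slug(text):
    """Lower-case the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project, description, created, version, extension):
    """Build a file name that follows the assumed convention."""
    stamp = created.strftime("%Y%m%d")
    ext = extension.lstrip(".").lower()
    return f"{slug(project)}_{slug(description)}_{stamp}_v{version:02d}.{ext}"


def follows_convention(filename):
    """Return True if the file name matches the assumed convention."""
    return PATTERN.fullmatch(filename) is not None


if __name__ == "__main__":
    name = build_filename("Soil Survey", "Interview notes", date(2015, 3, 9), 2, ".txt")
    print(name, follows_convention(name))
```

Putting the date in year-month-day order means that an alphabetical sort of the folder is also a chronological one, and the zero-padded version number keeps v02 ahead of v10.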

Documentation and metadata

Good documentation for your data is like creating a ‘user’s guide’ to the data and helps make data understandable, verifiable and reusable. Just making the data available does not make it useful. If you or others come back to your data at a later time, they will need information on when, why and by whom the data was created, what methods were used, and an explanation of any acronyms or jargon used.

Research funders demand that researchers make, at the very least, the metadata about their data openly available to facilitate the location and reuse of datasets. Documentation and metadata about a dataset are often mentioned together but can be very different things:

Metadata

This is more structured data about the dataset and will include the following key pieces of information:

  • Title: A name or title by which a resource is known
  • Unique resource identifier: For your working data this could be a project ID or a departmental identifier. Once you publish your data the unique resource identifier will be a persistent URL or DOI (Digital Object Identifier), depending on where you publish your data
  • Description: Description of the data set, like an abstract for a paper
  • Subject: Subject or classification code describing the resource, chosen from one or more authoritative sources
  • Creator(s): The main researchers involved in producing the data, in priority order
  • Funder: Sources of financial support for the development of the resource, e.g. ESRC or Wellcome Trust
  • Resource Language: Default will be set to 'eng' (English)
  • Publication date: The date when the data was or will be made publicly available
  • Publisher: The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. For your working data this will be the University of Warwick
  • Contact email address: Person or service with knowledge of how to access, troubleshoot, or otherwise field issues or correspondence related to the data set

Taken from the DataCite Metadata schema.

  • Jisc Digital Media has a list of subject-specific controlled vocabularies which may be useful in describing your data
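To show what a structured record of this kind might look like in practice, here is a minimal Python sketch that stores the fields described above in a dictionary, reports any that are still empty, and writes the record out as JSON. The key names, the example values and the helper function are assumptions for illustration only; they are not the official DataCite property names.

```python
import json

# Illustrative key names for the fields described above.
# These are NOT the official DataCite property names.
KEY_FIELDS = [
    "title", "identifier", "description", "subject", "creators",
    "funder", "language", "publication_date", "publisher", "contact_email",
]


def missing_fields(record):
    """Return the key fields that are absent or empty in the record."""
    return [name for name in KEY_FIELDS if not record.get(name)]


# Placeholder values throughout; replace them with your project's details.
example_record = {
    "title": "Example dataset title",
    "identifier": "PROJECT-0001",
    "description": "Short abstract-style summary of the dataset.",
    "subject": "",  # left empty on purpose to show the check below
    "creators": ["A. Researcher", "B. Researcher"],
    "funder": "ESRC",
    "language": "eng",
    "publication_date": "2016-01-01",
    "publisher": "University of Warwick",
    "contact_email": "data-contact@example.org",
}

if __name__ == "__main__":
    print("Still to complete:", missing_fields(example_record))
    print(json.dumps(example_record, indent=2))
```

Keeping the record in a plain-text format such as JSON means it can be stored alongside the data files and read without specialist software.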

Documentation

Documentation can be considered to be a more detailed equivalent of a ‘read me’ file for your data. Like the methodology for your publications this will include the following information and more:

  • What hardware and software were used to create the data?
  • What methodologies were used to create the data?
  • What assumptions were made in your experiments?
  • Why are there anomalies in your data?

Much of what you should include here will be found in project-level documentation and is likely to have already been included in the project application. Documentation content such as the aims and objectives of the project, any hypotheses, and the methodologies used can be created even before the project has begun, so replicating it for publication with the dataset need not be very time consuming.

Backup and security

It is essential during your project that you have plans in place to ensure the safe storage of your data as well as a strategy for regular backups.

The level of risk and thus the level of care you should take with your data will in part depend on the ‘classification’ of the data. The University’s Information Security team have resources and training available about the classification of data and what actions you should take depending on the classification you’ve agreed on. This advice includes information on encryption software available from IT Services if this is necessary for your project.

If you are storing your data in the University’s storage options then it is automatically included in the main IT Services backup processes, so this can be an easy way to cover all your backup requirements.

Top tip! Do test your backups to make sure they open as you expect them to!
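Opening a few restored files by hand is the most direct test. As a complement, a checksum comparison can confirm that every file in a backup is present and byte-for-byte identical to the original. The Python sketch below is one possible approach and not a description of how the IT Services backups work; the function names and directory layout are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_dir, backup_dir):
    """Return relative paths that are missing from the backup or differ from the original."""
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file() or sha256_of(copy) != sha256_of(source):
            problems.append(relative.as_posix())
    return problems
```

Calling verify_backup with the working folder and the backup folder returns an empty list when every file matches. Matching checksums show that the copy is intact, not that the file can still be opened, so it is worth opening a sample of restored files as well.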

Password managers

A password manager is a tool that remembers your passwords for you and in some cases can create more secure passwords for you to use. The idea is that you only need to remember the password for the password manager itself, and can then copy and paste all of the rest from the manager.

A couple of examples:

  • KeePass is open source, lightweight and quite flexible but doesn’t have a stable release for Mac users yet
  • LastPass is commercial software with a free version and a premium version and works for Macs as well as PCs. The premium version has a mobile option as well