Organising data

During research, it is worth thinking about how you structure your data. This section first covers filing and versioning, then making backups.

Filing and versioning[1]

Paper documents
  • Make sure there is enough space
  • Store the documents safely
  • Keep the filing system simple (alphabetical, numerical, thematic, type)
  • Make sure you will also be able to understand the system in the future
  • Make an index file and code the documents
Digital documents
  • Take security measures to protect documents, including those that are privacy-sensitive
  • Anonymise[2] or pseudonymise[3] privacy-sensitive documents
  • When you encrypt a file or folder, store the key separately from your documents
  • Make logical filing categories
  • Use folders to structure your files and to keep file names short
  • Do not go deeper than 3 or 4 levels
  • Separate ongoing and completed work
  • Use a systematic naming convention that uniquely identifies files
  • Use short and meaningful file names
  • Decide how many versions of a file to keep, which versions to keep, for how long and how to organise them
  • Identify milestone versions that must not be altered or deleted
  • Include version numbers and/or dates in the file name
  • For dates in file names, use the order year, month, day (YYYYMMDD), such as 20140523, so that files sort chronologically
  • Record the changes that are made in a new version by using a version log
  • Versions can also be managed with the version-control facilities of the software you are using, or with dedicated version-control software
  • When working with others on data, maintain a master file
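
The naming and version-log advice above can be illustrated with a short Python sketch. It is only one possible convention, not a prescribed one: the pattern project_description_YYYYMMDD_vNN.ext, the function names build_file_name and append_version_log, and the log columns are all assumptions made for this example.

```python
import csv
import re
from datetime import date
from pathlib import Path


def build_file_name(project: str, description: str, version: int,
                    day: date, ext: str) -> str:
    """Return a name such as 'survey_interviews_20140523_v02.csv'.

    The pattern is a hypothetical convention chosen for illustration.
    """
    def clean(part: str) -> str:
        # Lowercase the part and replace anything that is not a letter
        # or digit with a hyphen, so names stay short and portable.
        return re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")

    # %Y%m%d gives year, month, day, so names sort chronologically.
    return f"{clean(project)}_{clean(description)}_{day:%Y%m%d}_v{version:02d}.{ext}"


def append_version_log(log_path: Path, file_name: str,
                       change: str, day: date) -> None:
    """Append one row to a simple CSV version log (hypothetical layout)."""
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            # Write the header only when the log file is first created.
            writer.writerow(["date", "file", "change"])
        writer.writerow([day.isoformat(), file_name, change])


names = [
    build_file_name("Survey", "Interviews", 1, date(2014, 5, 23), "csv"),
    build_file_name("Survey", "Interviews", 2, date(2014, 6, 2), "csv"),
    build_file_name("Survey", "Interviews", 3, date(2013, 12, 30), "csv"),
]
# Plain alphabetical sorting now also orders the files by date.
sorted_names = sorted(names)
```

Because the date is written as year, month, day, an ordinary alphabetical sort puts the file from December 2013 before the two files from 2014, which is the sorting benefit the date notation is meant to provide.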

Backups

Paper documents
  • Make copies of paper documents and store these separately from the original documents
  • Digitise important paper documents
  • During research, archive both originals and copies
  • Store backups safely
Digital documents
  • Back up regularly, preferably at a fixed time
  • Store backups separately from the originals
  • Consider what to back up: files and folders - but also, perhaps, software applications
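
As a rough illustration of the digital backup advice above, the following Python sketch copies a folder into a new, dated folder at a separate location. The function name back_up_folder and the folder-naming pattern are assumptions made for this example, and it is not a substitute for a proper backup tool or an institutional backup service.

```python
import shutil
from datetime import datetime
from pathlib import Path


def back_up_folder(source: Path, backup_root: Path, moment: datetime) -> Path:
    """Copy `source` into a new dated folder under `backup_root`.

    `backup_root` should be on a different disk or location than `source`,
    so that backups are stored separately from the originals.
    """
    # Hypothetical naming pattern: <folder name>_YYYYMMDD_HHMMSS
    target = backup_root / f"{source.name}_{moment:%Y%m%d_%H%M%S}"
    # copytree raises FileExistsError if the target already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(source, target)
    return target
```

A call such as back_up_folder(Path("project"), Path("/mnt/external/backups"), datetime.now()) would create one dated copy per run; both paths here are placeholders. Running such a script from a scheduler at a fixed time is one way to make backups regular.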

[1] The parts on filing and versioning and on documentation are based on information in Managing and sharing data from the UK Data Archive.

[2] Anonymisation is a type of information sanitisation whose intent is to protect privacy. It is the process of either encrypting or removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous (source: Wikipedia).

[3] Pseudonymisation is a procedure in which identifying fields in a data record are replaced by artificial identifiers (pseudonyms). There can be a single pseudonym for a collection of replaced fields or a pseudonym per replaced field. The purpose is to make it harder to identify individuals from the data record and thus to lower respondent or patient objections to its use. Data in this form are suitable for extensive analytics and processing (source: Wikipedia).
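
The procedure described in note [3] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a complete privacy solution: the function name pseudonymise, the field names, and the pseudonym format (P0001, P0002, ...) are invented for the example. It uses a single pseudonym for the whole collection of replaced fields.

```python
def pseudonymise(records, identifying_fields):
    """Replace the identifying fields of each record with one pseudonym.

    Returns the pseudonymised records and the key table that links
    original identities to pseudonyms. The key table should be stored
    separately from the data, like an encryption key.
    """
    key_table = {}  # maps a tuple of identifying values to a pseudonym
    result = []
    for record in records:
        identity = tuple(record[field] for field in identifying_fields)
        if identity not in key_table:
            # Hypothetical format: sequential artificial identifiers.
            key_table[identity] = f"P{len(key_table) + 1:04d}"
        # Keep every non-identifying field and add the pseudonym.
        cleaned = {k: v for k, v in record.items()
                   if k not in identifying_fields}
        cleaned["pseudonym"] = key_table[identity]
        result.append(cleaned)
    return result, key_table


# Fictional example data.
records = [
    {"name": "A. Jansen", "email": "a.jansen@example.org", "score": 3},
    {"name": "B. de Vries", "email": "b.devries@example.org", "score": 4},
    {"name": "A. Jansen", "email": "a.jansen@example.org", "score": 5},
]
pseudonymised, key_table = pseudonymise(records, ["name", "email"])
```

The same person receives the same pseudonym in every record, so the data remain usable for analysis. Note that the remaining fields may still allow identification in combination, so replacing names alone does not guarantee anonymity.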