Organising data

During research, it is worth thinking about how you structure your data. Below we discuss filing and versioning, followed by making backups.

Filing and versioning^[1]

Paper documents

Make sure there is enough space
Store the documents safely
Keep the filing system simple (alphabetical, numerical, thematic, type)
Make sure you will also be able to understand the system in the future
Make an index file and code the documents

Digital documents

Take security measures to protect documents, including those that are privacy-sensitive
Anonymise^[2] or pseudonymise^[3] privacy-sensitive documents
When you encrypt a file or folder, store the key separately from your documents
Make logical filing categories
Use folders to add structure and to keep file names short
Do not go deeper than 3 or 4 levels
Separate ongoing and completed work
Use a systematic naming convention that uniquely identifies files
Use short and meaningful file names
Decide how many versions of a file to keep, which versions to keep, for how long and how to organise them
Identify milestone versions which cannot be altered or deleted
Include version numbers and/or dates in the file name
Write dates in file names as year, month, day (for example 20140523), so that files sort chronologically
Record the changes that are made in a new version by using a version log
Version control can also be handled by the version-control features of the software you are using or by dedicated versioning software
When working with others on data, maintain a master file
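The naming advice above (a short meaningful name, a date written as year month day, and a version number) can be sketched in Python. This is only an illustration: the function names `versioned_name` and `log_entry`, the underscore separators, the two-digit version number and the tab-separated log line are all assumptions, not a prescribed convention.

```python
from datetime import date


def versioned_name(stem: str, version: int, day: date, ext: str) -> str:
    """Build a file name such as 'interviews_20140523_v02.csv' (hypothetical pattern)."""
    # %Y%m%d writes year, month, day, so alphabetical order is also chronological order.
    return f"{stem}_{day.strftime('%Y%m%d')}_v{version:02d}{ext}"


def log_entry(file_name: str, change: str) -> str:
    """Format one line for a plain-text version log (hypothetical layout)."""
    return f"{file_name}\t{change}"


names = [
    versioned_name("interviews", 2, date(2014, 5, 23), ".csv"),
    versioned_name("interviews", 1, date(2013, 11, 2), ".csv"),
]
# Sorting the names as plain text puts the 2013 file before the 2014 file.
print(sorted(names))
print(log_entry(names[0], "removed duplicate respondents"))
```

Padding the version number with a leading zero keeps v02 ahead of v10 when names are sorted as text.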

Backups

Paper documents

Make copies of paper documents and store these separately from the original documents
Digitise important paper documents
During research, archive both originals and copies
Store backups safely

Digital documents

Back up regularly, preferably at a fixed time
Store backups separately from originals
Consider what to back up: files and folders - but also, perhaps, software applications
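As a rough sketch of the digital backup advice above, the Python below copies a folder to a separate location under a dated name and checks that each copy matches its original. The function names `back_up` and `sha256`, the dated folder pattern and the checksum verification step are assumptions made for illustration; an institutional backup service or dedicated backup software may be the better choice in practice.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 checksum of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def back_up(source: Path, backup_root: Path, day: date) -> Path:
    """Copy `source` into a dated folder under `backup_root` and verify the copy."""
    target = backup_root / f"{source.name}_{day.strftime('%Y%m%d')}"
    # copytree raises FileExistsError if this dated backup already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(source, target)
    for original in source.rglob("*"):
        if original.is_file():
            copy = target / original.relative_to(source)
            if sha256(original) != sha256(copy):
                raise IOError(f"Backup differs from original: {copy}")
    return target
```

Here `backup_root` is assumed to sit on a different drive or server from `source`, in line with storing backups separately from originals.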

^[1] The parts on filing and versioning and documentation are based on information in Managing and sharing data from the UK Data Archive.

^[2] Anonymisation is a type of information sanitisation whose intent is to protect privacy. It is the process of either encrypting or removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous (source: Wikipedia).

^[3] Pseudonymisation is a procedure in which identifying fields in a data record are replaced by artificial identifiers (pseudonyms). There can be a single pseudonym for a collection of replaced fields or a pseudonym per replaced field. The purpose is to make it harder to identify individuals from the data record and thus to lower respondent or patient objections to its use. Data in this form are suitable for extensive analytics and processing (source: Wikipedia).
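To make the idea of pseudonymisation concrete, here is a minimal Python sketch that replaces one identifying field with an artificial identifier and returns the lookup key separately. The function name `pseudonymise`, the `P001`-style identifiers and the sample records are invented for illustration. Sequential identifiers reveal the order in which people first appear, so a real project may prefer random ones.

```python
def pseudonymise(records, field, prefix="P"):
    """Replace `field` in each record with a pseudonym; return (records, key)."""
    key = {}      # maps the real value to its pseudonym; store this separately
    output = []
    for record in records:
        identity = record[field]
        if identity not in key:
            # The same person always receives the same pseudonym.
            key[identity] = f"{prefix}{len(key) + 1:03d}"
        # Copy the record so the original data are left unchanged.
        output.append({**record, field: key[identity]})
    return output, key


# Invented sample data, for illustration only.
records = [
    {"name": "Alice", "score": 7},
    {"name": "Bob", "score": 5},
    {"name": "Alice", "score": 9},
]
pseudonymised, key = pseudonymise(records, "name")
print(pseudonymised)
```

The returned `key` is what allows re-identification, so it should be kept apart from the pseudonymised records, much like an encryption key.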
