Anonymisation, pseudonymisation and personal data

Anonymise your data[1]

To protect your data, anonymise personal data and make sure that re-identification by combining the anonymised data with other population data is not possible. Statistical packages may have tooling for anonymisation, or you may use ARX, an open-source data anonymisation tool.

Pseudonymise your data[2]

In some cases it is necessary not to fully anonymise your data, for example when data subjects have the right to withdraw their data from the study. The researcher then has to be able to identify the data of a specific subject in order to delete it from the dataset. In such cases, pseudonymisation is an option.

Pseudonymisation means processing personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information. This additional information is usually a key file, in which each pseudonym is linked to the personal data it replaces. Keep the key to the pseudonymised data on the RU file folders, without access from outside. Use a password manager (such as KeePass) for your keys.
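
As an illustration of the key-file idea, here is a minimal Python sketch. It assumes records held as dictionaries with a hypothetical "name" field; the function names pseudonymise and withdraw, the "P" prefix and the sample records are all invented for this example. The key it returns plays the role of the key file and would have to be stored separately from the data.

```python
import secrets

def pseudonymise(records, id_field="name"):
    """Replace id_field by a random pseudonym; return (data, key)."""
    key = {}  # identifier -> pseudonym; this is the content of the key file
    pseudonymised = []
    for record in records:
        identifier = record[id_field]
        if identifier not in key:
            # Random token, so the pseudonym cannot be derived from the name.
            pseudonym = "P" + secrets.token_hex(4)
            while pseudonym in key.values():  # retry on the rare collision
                pseudonym = "P" + secrets.token_hex(4)
            key[identifier] = pseudonym
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["pseudonym"] = key[identifier]
        pseudonymised.append(cleaned)
    return pseudonymised, key

def withdraw(pseudonymised, key, identifier):
    """Delete all records of one subject, using the key to find them."""
    pseudonym = key.pop(identifier)
    return [r for r in pseudonymised if r["pseudonym"] != pseudonym]

# Hypothetical sample data.
records = [
    {"name": "Alice Example", "score": 12},
    {"name": "Bob Example", "score": 15},
    {"name": "Alice Example", "score": 14},
]
data, key = pseudonymise(records)
data = withdraw(data, key, "Alice Example")  # only Bob's record remains
```

Without the key, a withdrawal request could not be matched to a record, which is why the key has to be kept for as long as withdrawal must remain possible.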

Personal data

Make sure you always protect the privacy of respondents. Sharing directly identifiable data is only allowed if the participant gave explicit consent for (public) sharing in an informed consent form. Data files which contain personal data need to be anonymised before they can be shared with others or archived in a public archive.

Directly identifiable data such as audio or video files are hard to anonymise (without losing their scientific value) and should in general not be published in open access.

Anonymisation must be done according to the following guidelines:

  • remove all direct identifiers
  • remove indirect identifiers that are not essential for reusing the data
  • remove indirect identifiers with a high disclosure risk, such as unusual characteristics or unusual findings
  • reduce the level of detail of the remaining indirect identifiers (for instance by aggregation)

A combination of indirect identifiers may also lead to identification of a respondent; for instance, research about deaf people in a specific village. Consequently, in certain cases it is advised to choose a higher aggregation level, such as the province instead of the exact village or town.
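
One way to spot risky combinations is to count how many records share each combination of indirect identifiers. The sketch below is only an illustration under assumptions: the field names, the sample records, the function name risky_combinations and the threshold min_group_size are hypothetical. The idea is similar in spirit to a k-anonymity check, a technique the text above does not prescribe.

```python
from collections import Counter

def risky_combinations(records, indirect_identifiers, min_group_size=2):
    """Return the combinations shared by fewer than min_group_size records."""
    counts = Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < min_group_size}

# Hypothetical sample data: four respondents in one province.
records = [
    {"village": "Village A", "province": "Province X", "hearing": "deaf"},
    {"village": "Village B", "province": "Province X", "hearing": "deaf"},
    {"village": "Village B", "province": "Province X", "hearing": "hearing"},
    {"village": "Village C", "province": "Province X", "hearing": "hearing"},
]

# At village level every combination occurs only once.
at_village_level = risky_combinations(records, ["village", "hearing"])
# At province level each combination is shared by two respondents.
at_province_level = risky_combinations(records, ["province", "hearing"])
```

In this toy dataset, aggregating from village to province removes all unique combinations. What counts as a sufficient group size depends on the dataset and is not something this sketch can decide.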

Another example is the combination of age in days and test date, which together reveal the exact date of birth of the respondent. In research concerning school classes, participating children may thus be identified. In this case, either the test date should be reduced to the year, or the age should be rounded to months or years.
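
The following Python sketch illustrates both the risk and the remedy. The function names and the sample values are hypothetical, and the conversion to years uses an approximate year length of 365.25 days.

```python
from datetime import date, timedelta

def date_of_birth(age_in_days, test_date):
    """Show the risk: the two exact values give the date of birth."""
    return test_date - timedelta(days=age_in_days)

def coarsen(age_in_days, test_date):
    """Reduce the detail: age in whole years and test year only."""
    age_years = int(age_in_days // 365.25)  # approximate year length
    return {"age_years": age_years, "test_year": test_date.year}

# Hypothetical respondent: 3000 days old on the day of the test.
test_date = date(2020, 3, 15)
exact = date_of_birth(3000, test_date)  # an exact calendar date
shared = coarsen(3000, test_date)       # {'age_years': 8, 'test_year': 2020}
```

Coarsening either of the two values is enough to prevent the subtraction above; coarsening both gives a wider margin.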

Furthermore, datasets that include exact occupations may result in identification of the respondents. 'Nurse' or 'teacher' may not be very revealing, but 'director of [company X]' or 'leader of [religious community Y]' are. Exact occupations may be adjusted to occupational groups using the ISCO classification (International Standard Classification of Occupations).
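
ISCO-08 codes are hierarchical: a four-digit unit group code starts with the digits of its minor, sub-major and major group. Assuming the occupations have already been coded as four-digit ISCO-08 strings (an assumption, since the text above does not say how the coding is done), a sketch of the adjustment could look as follows. The function name and the example codes are illustrative and should be checked against the official classification.

```python
def to_occupational_group(isco_code, digits=1):
    """Truncate a 4-digit ISCO-08 code to a broader group.

    digits=1 gives the major group, 2 the sub-major group,
    3 the minor group and 4 the original unit group.
    """
    code = str(isco_code).strip()
    if len(code) != 4 or not code.isdigit():
        raise ValueError("expected a 4-digit ISCO-08 code as a string")
    if not 1 <= digits <= 4:
        raise ValueError("digits must be between 1 and 4")
    return code[:digits]

# Illustrative codes, passed as strings so that leading zeros are kept.
major_group = to_occupational_group("2221")          # "2"
minor_group = to_occupational_group("2221", digits=3)  # "222"
```

How far to truncate is a judgement call: fewer digits lower the disclosure risk but also reduce the value of the variable for reuse.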

Make sure you do not share the following direct identifiers with others or archive them in a public archive:

  • Name and/or initials
  • Date of birth
  • Addresses (instead: reduce the postal code to its digits)
  • Telephone number, e-mail address and other contact info
  • Unique ID numbers, e.g. BSN, bank account number
  • Video, photo or audio data (voice)
  • Data containing a participant's facial features
  • Dates that could be identifying, e.g. hospital visit, school test date
  • Pseudonyms (only allowed if the key file has been disposed of)
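
As a rough illustration of the list above, the Python sketch below drops a set of direct identifiers from a record and reduces the postal code to its digits. The column names in DIRECT_IDENTIFIERS, the "postal_code" field and the sample record are hypothetical; a real dataset will need its own list.

```python
import re

# Hypothetical column names; adapt this set to your own dataset.
DIRECT_IDENTIFIERS = {"name", "initials", "date_of_birth", "phone", "email", "bsn"}

def strip_direct_identifiers(record):
    """Return a copy of the record without the direct identifiers."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "postal_code" in cleaned:
        # Keep only the digits of the postal code, dropping letters and spaces.
        cleaned["postal_code"] = re.sub(r"\D", "", cleaned["postal_code"])
    return cleaned

# Hypothetical sample record.
record = {
    "name": "Alice Example",
    "email": "alice@example.org",
    "postal_code": "1234 AB",
    "score": 12,
}
cleaned = strip_direct_identifiers(record)
```

A column filter like this only covers identifiers stored in named fields. Identifying details in free text, dates, or audio and video material need separate treatment.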

Make sure you limit or reduce the level of detail when sharing the following indirect identifiers with others or archiving them in a public archive:

  • Gender
  • Year of birth
  • Place of birth
  • Body measures (weight, height)
  • Socio-economic data (income, education)
  • Information about individuals' mental/physical well-being
  • Profession/occupation (if potentially identifiable, adjust to a standard classification such as the Dutch SBC or the international ISCO)
  • Geographic information
  • Sensitive information (ethnicity, race, sexual orientation and risky behaviour)
  • Information that may stigmatise a community (e.g. membership of political or religious organisation)

Radboud University provides a website with information on how to deal with personal data according to the Data Protection Act.

[1] Anonymisation is a type of information sanitisation whose intent is to protect privacy. It is the process of either encrypting or removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous (source: Wikipedia).

[2] Pseudonymisation is a procedure in which identifying fields in a data record are replaced by artificial identifiers (pseudonyms). There can be a single pseudonym for a collection of replaced fields or a pseudonym per replaced field. The purpose is to make it harder to identify individuals from the data record and thus to lower respondent or patient objections to its use. Data in this form are suitable for extensive analytics and processing (source: Wikipedia).