Pseudonymisation — Data Saves Lives

Pseudonymous Data

A simple way to protect the privacy of the people who take part in research is to anonymise the data. This means taking away all identifying elements from the data set, such as name, address and ID number. The law (GDPR) states that for data to be classed as anonymous, it must be stripped of all identifiers so that the data set cannot be linked back to the data subject. If this can be done, then the researcher can use that data without concern about privacy, because privacy law does not apply to anonymous data. It is, however, not always possible or appropriate to anonymise data. The law requires that when data cannot be anonymised, efforts should still be made to protect privacy. In many cases this may be done by pseudonymising the identity of the data subject. This is a halfway house between identifiable data and anonymous data. It is achieved by replacing obvious identifiers, such as name, date of birth, or address, with a code.

Anonymous data is not always suitable for healthcare research. An example is a study of the long-term outcomes of a treatment, where data will be collected over an extended period of time and released to the research team at regular intervals. In this type of research the research team may not need identifiable data, but anonymous data will not be good enough, as it will not allow them to link data points that relate to the same person but were collected at different points in time. Good data governance and respect for privacy demand that in this situation data are pseudonymised, as data should never be kept in an identifiable format if that is not absolutely necessary. To do this, each patient’s record is given a false identifier, a pseudonym. This is often a random or jumbled number or pattern of letters and numbers that is created for each patient. The pseudonym is used to link new data to previous data for the same patient.
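The linking idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any particular hospital does it: the class name PseudonymRegistry, the field names (patient_id, name, date_of_birth, address, pseudo_id) and the sample records are all hypothetical, and a random 16-character code stands in for the "random or jumbled" pseudonym.

```python
import secrets

# Hypothetical list of fields treated as direct identifiers in this sketch.
IDENTIFIERS = {"patient_id", "name", "date_of_birth", "address"}


class PseudonymRegistry:
    """Lookup table kept inside the healthcare organisation (illustrative only)."""

    def __init__(self):
        self._by_patient = {}  # real patient ID -> pseudonym

    def pseudonym_for(self, patient_id):
        # Reuse an existing pseudonym so that later releases stay linkable.
        if patient_id not in self._by_patient:
            self._by_patient[patient_id] = secrets.token_hex(8)
        return self._by_patient[patient_id]


def release(record, registry):
    """Return a copy of the record with identifiers replaced by a pseudonym."""
    released = {k: v for k, v in record.items() if k not in IDENTIFIERS}
    released["pseudo_id"] = registry.pseudonym_for(record["patient_id"])
    return released


registry = PseudonymRegistry()

# Made-up records for the same patient, collected two years apart.
visit_2021 = {"patient_id": "H-001", "name": "Example Patient",
              "date_of_birth": "1970-01-01", "address": "1 Example Street",
              "year": 2021, "blood_pressure": "140/90"}
visit_2023 = {"patient_id": "H-001", "name": "Example Patient",
              "date_of_birth": "1970-01-01", "address": "1 Example Street",
              "year": 2023, "blood_pressure": "125/80"}

first = release(visit_2021, registry)
second = release(visit_2023, registry)

# The research team can tell the two records belong to the same person
# without ever seeing a name, date of birth or address.
print(first["pseudo_id"] == second["pseudo_id"])
```

Because the pseudonym is random rather than derived from the patient's details, it reveals nothing by itself. The link back to the real person exists only in the registry, which in this sketch never leaves the healthcare organisation.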

The law demands that pseudonymisation is very hard to undo. In healthcare research this means that only the healthcare organisation which created the pseudonyms can look up who each patient is and re-identify the data. Good data governance demands that the organisation is only permitted to do this when generating new research data, or if there is a need to contact some patients. This may be because additional contact with the patient is needed for the research, or because the research has revealed something that a particular patient should be informed about. This is often referred to as an incidental finding: a finding concerning an individual research participant that has potential health importance for that participant and is discovered in the course of conducting research or diagnostic procedures, but is beyond the aims of the study. In such cases only the healthcare organisation contacts the relevant patients; the research team remains unaware of the full identity of the patient.
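As a rough sketch of that restriction, the Python below models a key table that only the healthcare organisation holds, and which refuses to reverse a pseudonym unless one of the two purposes named above is given. Everything here is an assumption made for illustration: the class PseudonymKeyTable, the reason labels "new_research_data" and "patient_contact", and the patient ID are hypothetical. In practice, access would be controlled by organisational procedures and audit rather than by a single check in code.

```python
import secrets

# Hypothetical labels for the two purposes described in the text.
ALLOWED_REASONS = {"new_research_data", "patient_contact"}


class PseudonymKeyTable:
    """Key table kept inside the healthcare organisation (illustrative only)."""

    def __init__(self):
        self._pseudonym_by_patient = {}
        self._patient_by_pseudonym = {}

    def assign(self, patient_id):
        # Create a pseudonym once, and record the mapping in both directions.
        if patient_id not in self._pseudonym_by_patient:
            pseudonym = secrets.token_hex(8)
            self._pseudonym_by_patient[patient_id] = pseudonym
            self._patient_by_pseudonym[pseudonym] = patient_id
        return self._pseudonym_by_patient[patient_id]

    def re_identify(self, pseudonym, reason):
        # Refuse to undo the pseudonym unless a permitted purpose is stated.
        if reason not in ALLOWED_REASONS:
            raise PermissionError(f"re-identification not permitted for: {reason}")
        return self._patient_by_pseudonym[pseudonym]


key_table = PseudonymKeyTable()
pseudo_id = key_table.assign("H-001")  # made-up patient ID

# The research team reports an incidental finding using only the pseudonym.
# The healthcare organisation then looks up the patient in order to contact them.
patient_to_contact = key_table.re_identify(pseudo_id, reason="patient_contact")
print(patient_to_contact)
```

The research team only ever handles pseudo_id. The reverse lookup runs inside the healthcare organisation, so the team never learns who the patient is.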

An example of pseudonymisation is shown below. The table on the left shows data within a hospital that is used to create pseudonyms for its patients (the column called Pseudo-ID). The table on the right shows an example of what is released to a research team. It is impossible to know who any patient is using only the table on the right, especially if there is no information about which healthcare organisation the data comes from. This means that the data are pseudonymised in legal terms, and the joint objectives of making data usable for research and protecting privacy are both achieved.