Introduction to anonymisation
At a glance
- In data protection law, anonymous information is data that does not relate to an identified or identifiable person (ie data that is not personal data). Data protection law does not apply to anonymous information.
- To understand anonymisation, you must first understand what personal data is.
- Anonymisation is the process of turning personal data into anonymous information so that a person is no longer identifiable.
In detail
- What is personal data?
- What is anonymous information?
- What is anonymisation?
- Is anonymisation always necessary?
- Is anonymisation always possible?
- What are the benefits of anonymisation?
- If we anonymise personal data, does this count as processing?
- What is the difference between anonymisation and pseudonymisation?
- What about ‘de-identified’ personal data?
What is personal data?
Data protection law regulates the processing of personal data.
The UK GDPR defines personal data as:
“any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”
This definition applies for the purposes of Part 2 of the DPA 2018.
Section 3(2) of the DPA 2018 says that personal data is:
“any information relating to an identified or identifiable living individual”
Section 3(3) defines an “identifiable living individual” as:
“a living individual who can be identified, directly or indirectly, in particular by reference to—
- an identifier such as a name, an identification number, location data or an online identifier, or
- one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of the individual.”
Essentially, the same definition of personal data applies to the UK data protection framework as a whole.
As personal data has to be about living people, data protection law does not apply to information about the deceased. However, you should note that this data may still be protected by confidentiality or other laws or rules.
Further reading
Read our guidance on ‘What is personal information?’ in the Guide to the UK GDPR.
What is anonymous information?
Data protection law does not explicitly define ‘anonymous information’, but Recital 26 of the UK GDPR says this is:
“…information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.”
This means that anonymous information does not relate to an identifiable person (either in isolation or when combined with information from other sources). Data protection law does not apply to anonymous information.
If you process personal data (as opposed to anonymous information), you must comply with the data protection principles and be able to demonstrate how you do so.
Other laws may still apply to anonymous information. For example, certain aspects of the Privacy and Electronic Communications Regulations 2003 (PECR) apply to ‘information’, not just personal data (such as the provisions about using storage and access technologies).
Further reading
See the Guide to PECR’s sections on traffic data, location data, and storage and access technologies.
What is anonymisation?
Anonymisation is the way in which you turn personal data into anonymous information, so that it then falls outside the scope of data protection law. You can consider data to be effectively anonymised if people are not (or are no longer) identifiable.
We use the broad term ‘anonymisation’ to cover the techniques and approaches you can use to prevent the people the data relates to from being identified.
We use the term ‘effective anonymisation’ to mean the technical and organisational measures you need to ensure that the status of the data meets the legal threshold for anonymisation under the UK GDPR. Any information that does not meet this threshold is not anonymous information, and you must treat it as personal data.
Anonymisation issues may be more complex if you have large datasets that contain a wide range of personal data. You may therefore need specialist expertise and input beyond this guidance.
Further reading – ICO guidance
Read the section ‘How do we ensure anonymisation is effective?’ for further guidance on the factors you should consider when you assess identifiability.
Is anonymisation always necessary?
No. Data protection law provides a framework to enable the fair, lawful and transparent use of personal data. However, if you don’t need to use personal data to achieve your objectives, then you should assess whether you can use anonymous information instead.
Is anonymisation always possible?
In some instances effective anonymisation may not be possible due to:
- the nature of the data;
- the purpose(s) you collect, use or retain it for; or
- the context of the processing.
Example
A health authority considers anonymising a dataset containing information derived from people’s medical records.
Even if they remove all identifying information, such as names and addresses, there may still be enough details about the people to potentially re-identify them. This may include details such as:
- the approximate dates and location of their treatments;
- the type of treatments;
- the approximate ages of the people; or
- other distinguishing characteristics.
This means it may still be possible to link the information with other sources, or to deduce people’s identities from it.
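To illustrate why removing names and addresses may not be enough, the sketch below counts how many records share each combination of the remaining details. A record whose combination is unique can potentially be singled out by anyone who knows those details from another source. The field names, values and helper functions are hypothetical, and the ‘k’ measure (the size of the smallest group, as used in k-anonymity) is a commonly used check that this guidance does not itself prescribe. Passing such a check does not on its own make data anonymous.

```python
from collections import Counter

# Hypothetical records with names and addresses already removed.
# Field names and values are illustrative only.
records = [
    {"age_band": "30-39", "treatment": "knee surgery", "area": "Leeds", "month": "2023-03"},
    {"age_band": "30-39", "treatment": "knee surgery", "area": "Leeds", "month": "2023-03"},
    {"age_band": "70-79", "treatment": "heart bypass", "area": "Whitby", "month": "2023-05"},
]

# Details someone could plausibly match against other information sources.
QUASI_IDENTIFIERS = ("age_band", "treatment", "area", "month")


def class_key(row, keys):
    """The combination of quasi-identifier values for one record."""
    return tuple(row[k] for k in keys)


def class_sizes(rows, keys):
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(class_key(row, keys) for row in rows)


def singled_out(rows, keys):
    """Records whose combination of details is unique in the dataset."""
    sizes = class_sizes(rows, keys)
    return [row for row in rows if sizes[class_key(row, keys)] == 1]


def smallest_class(rows, keys):
    """The 'k' in k-anonymity: the size of the smallest group of lookalikes."""
    return min(class_sizes(rows, keys).values())


if __name__ == "__main__":
    for row in singled_out(records, QUASI_IDENTIFIERS):
        print("unique combination:", row)
    print("k =", smallest_class(records, QUASI_IDENTIFIERS))
```

Here the third record is the only one with its combination of age band, treatment, area and month, so k is 1 and that person is at risk of re-identification.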
What are the benefits of anonymisation?
Anonymisation limits the risks to people and can allow you to make information available to other organisations or to the public.
It makes it easier to use and share the information, as fewer legal restrictions apply.
Anonymising personal data can help you to:
- improve your risk reduction and management processes;
- adopt a data protection by design approach;
- protect people’s identities;
- reduce reputational risks or reduce questions, complaints or disputes caused by inappropriate or insecure disclosure or publication of personal data;
- have alternatives to deletion. For example, if you intend to keep data in anonymised form, rather than delete it, once the retention period you’ve set for the personal data has ended, you must inform people that you will anonymise their personal data at the end of that period;
- publish information; and
- comply effectively with other legal obligations, such as those on public authorities responding to FOI or EIR requests, or requests for re-use, that involve personal data.
Wider benefits of anonymisation include:
- developing greater public trust and confidence that organisations are using data for the public good, while protecting privacy;
- greater transparency as a result of organisations being able to make anonymous information available to the public;
- incentivising researchers by increasing the availability of information;
- economic and societal benefits deriving from the availability of otherwise non-disclosable information;
- improved public authority accountability through increased availability of information about service outcomes and improvements; and
- making rich data resources available while protecting people’s privacy.
Effective anonymisation of personal data is possible and desirable.
Further reading – ICO guidance
Visit our data sharing information hub for more information about the data sharing code.
If we anonymise personal data, does this count as processing?
Yes. For the purposes of data protection law, applying anonymisation techniques to turn personal data into anonymous information counts as processing personal data. The end result (the anonymous information) is not subject to data protection law, but the procedure (anonymisation) is.
For example, when you create aggregate statistical information from personal data, that data is ‘adapted’ or ‘altered’. The law defines activities like these as ‘processing operations’.
This means that you must comply with data protection requirements for your anonymisation process. This includes ensuring you have a lawful basis for it, clearly defining your purpose(s) and providing people with information about it.
In general, applying anonymisation techniques to the personal data you hold is likely to be fair and lawful, except where those techniques could be reverse-engineered to allow people to be identified. However, you must still clearly define your purpose and detail the technical and organisational measures you intend to implement to achieve it.
What is the difference between anonymisation and pseudonymisation?
Data protection law also uses the term ‘pseudonymisation’. This is not the same as anonymisation. It's important to understand the difference, what pseudonymisation means in data protection law, and how this may differ from what the word means in other contexts.
Pseudonymisation is a technique that replaces information that directly identifies people, or de-couples that information from the resulting dataset. For example, it may involve replacing names or other identifiers (which are easily attributed to people) with a reference number. This is similar to how the term ‘de-identified’ is used in other contexts, for example to describe removing or masking direct identifiers within a dataset.
This guidance uses the term ‘pseudonymous data’ to describe personal data that has undergone pseudonymisation in line with the legal definition. It is information about people who can't be identified from that information alone, but who can be identified using additional information held separately.
Pseudonymous information is still personal data and the law applies to it. Pseudonymisation reduces the links between people and the personal data that relates to them, but does not remove them entirely. Anonymisation prevents there being a link between the information and the person concerned.
It is common to refer to datasets as ‘anonymised’ when in fact they still contain personal data, just in pseudonymised form. This poses a clear risk that you might fail to comply with data protection law. For example, you may mistakenly believe that the information is anonymous and the law doesn't apply.
Remember that if you can still identify people using additional information you hold separately, the data is not anonymised. It is pseudonymised and is still personal data. You must still comply with data protection law.
Ultimately, you should think of anonymisation as a way of reducing the amount of personal data you hold, and pseudonymisation as a way of reducing the risks associated with the personal data you hold.
Example: anonymisation and pseudonymisation compared
A retail company collects customer transaction data to analyse shopping patterns to help improve their marketing strategies. They want to share this data with external consultants while ensuring no customer can be directly or indirectly identified from the data.
The company removes all direct identifiers (eg customer names, email addresses, and phone numbers) from the dataset. It then aggregates the remaining data, such as purchase frequency, product categories, and spending amounts, and adds noise to the aggregates using differential privacy.
The resulting dataset contains statistical information on customers’ shopping patterns without any link to specific people. The company can safely share the anonymised dataset with external consultants for analysis.
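The aggregation and noise step in this example could look something like the sketch below. The example does not say which differential privacy mechanism the company uses; this sketch assumes the Laplace mechanism applied to counts per product category. It also assumes each customer contributes at most one entry to the input (so one person changes any count by at most 1) and that the list of categories is fixed and public. The function names and the epsilon value are hypothetical. A real deployment would normally rely on a vetted differential privacy library, since hand-rolled floating-point noise sampling has known weaknesses.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_category_counts(purchases, categories, epsilon, rng):
    """Count purchases per category and add Laplace noise to each count.

    Assumes each customer contributes at most one entry to `purchases`,
    so one person changes the counts by at most 1 (sensitivity 1) and
    the noise scale is 1 / epsilon. `categories` is a fixed, public list,
    so which keys are released does not depend on anyone's data.
    """
    counts = Counter(purchases)  # missing categories count as 0
    scale = 1.0 / epsilon
    return {c: counts[c] + laplace_noise(scale, rng) for c in categories}


if __name__ == "__main__":
    rng = random.Random(2024)  # seeded only so the sketch is repeatable
    categories = ["groceries", "clothing", "electronics"]
    purchases = ["groceries", "groceries", "clothing"]  # one per customer
    print(noisy_category_counts(purchases, categories, epsilon=1.0, rng=rng))
```

A smaller epsilon adds more noise and gives stronger protection at the cost of accuracy. Releasing counts for a fixed list of categories, including empty ones, avoids revealing that a rarely bought category exists only because one customer bought it.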
The same retail company needs to store customer data securely within their own system, while maintaining the identity of customers for the purposes of tracking customers’ purchase history for a loyalty card scheme.
To reduce the risk to those customers, they decide to implement pseudonymisation using tokenisation. Instead of using direct identifiers, they assign each customer a unique pseudonym. They store direct and indirect identifiers separately in a secure database.
The pseudonymised dataset includes information such as total spend, product preferences, and loyalty points. But this data cannot be re-identified without using the pseudonym to link to the separately held identity data. The company uses the pseudonymised data to understand shopping patterns (eg which products are popular and peak shopping times) without needing to directly identify people.
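The tokenisation approach in this example can be sketched as follows. The sketch assumes random tokens generated with Python's `secrets` module, and an in-memory dictionary standing in for the separately held, secure database; the class, function and field names are hypothetical. It shows why the result is pseudonymous rather than anonymous: anyone with access to the vault can map a pseudonym back to the customer.

```python
import secrets

# Hypothetical: fields treated as identifying and moved into the vault.
# In practice this would also cover indirect identifiers.
IDENTIFYING_FIELDS = ("customer_id", "name", "email", "phone")


class TokenVault:
    """Hypothetical stand-in for the separately held, secured identity store."""

    def __init__(self):
        self._token_by_customer = {}
        self._identity_by_token = {}

    def token_for(self, identity):
        # Reuse an existing pseudonym so purchase history stays linked.
        customer_id = identity["customer_id"]
        if customer_id not in self._token_by_customer:
            token = secrets.token_hex(16)  # random, not derived from identity
            self._token_by_customer[customer_id] = token
            self._identity_by_token[token] = identity
        return self._token_by_customer[customer_id]

    def reidentify(self, token):
        # Only possible with access to the vault, which is why the
        # pseudonymised records remain personal data.
        return self._identity_by_token[token]


def pseudonymise(row, vault):
    """Split a record into vault-held identity and a pseudonymised record."""
    identity = {k: row[k] for k in IDENTIFYING_FIELDS}
    rest = {k: v for k, v in row.items() if k not in IDENTIFYING_FIELDS}
    return {"pseudonym": vault.token_for(identity), **rest}


if __name__ == "__main__":
    vault = TokenVault()
    row = {"customer_id": "C-1001", "name": "A Customer", "email": "a@example.com",
           "phone": "01234 567890", "total_spend": 42.5, "loyalty_points": 425}
    print(pseudonymise(row, vault))
```

Because the token is random rather than calculated from the customer's details, it cannot be reversed without the vault. The same customer always receives the same pseudonym, which keeps the loyalty scheme's purchase history linked.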
Further reading – ICO guidance
What is personal information? – “Identifiers and related factors”
See the section of this guidance on “How do we ensure anonymisation is effective?” for more information on identifiability.
The guidance on identifiers and related factors also discusses the considerations you should take into account when disclosing data to other organisations, including the status it may have once in their hands.
See the section of this guidance on pseudonymisation for more information, including guidance on how you should approach pseudonymisation.
What about ‘de-identified’ personal data?
While the term ‘de-identified’ is widely used, we do not encourage it as a synonym for anonymous information or pseudonymous data. This is because UK data protection law doesn’t define the term, so using it can lead to confusion.
Also, its meaning may differ depending on the circumstances. For the purposes of data protection law, we use this term only in connection with Section 171 of the DPA 2018, which states that the re-identification of “de-identified personal data” is a criminal offence.
In this context, ‘de-identified’ personal data is pseudonymised data or data that was considered anonymised but can be re-identified considering all means that are reasonably likely to be used.
The explanatory notes to the DPA 2018 are not part of the law, but they are intended to help people understand it.
Further reading – ICO guidance