In brief: Our position on purpose limitation in the context of generative AI remains the same. See the original call for evidence for the full analysis.

Respondents

In February 2024, we published the second chapter of our consultation series, setting out our policy position on the interpretation of the purpose limitation principle in the generative AI lifecycle. This built on our existing 2020 position, as set out in our core guidance on AI and data protection.40

We received 46 responses, with just one respondent identifying as a member of the public rather than an organisation. 16 responses came via our survey, and the remaining 30 were received directly via email. The sectors most represented were:

  • creative industries (14);
  • trade or membership bodies (eight); and
  • the research sector (six).

A general consensus emerged among respondents on the purpose limitation consultation: 12 of the 16 survey respondents (75%) agreed with our overall analysis.

Original call for evidence

In our original call for evidence,41 we set out our positions on the interpretation of the purpose limitation principle in the generative AI lifecycle. To recap, the key positions were as follows:

Firstly, we determined that the different purposes of training and deploying a generative AI model must be explicit and specific. This is so that controllers, processors and data subjects have a clear understanding of why and how personal data is processed.  

Secondly, we explained that developers who are reusing personal data for training generative AI must consider whether the purpose of training a model is compatible with their original purpose of collecting that data. This is called a compatibility assessment.

Thirdly, we explained that developing a generative AI model and developing an application based on such a model (fine-tuned or not) constitute different purposes. These purposes are in addition to the initial separate purpose that an organisation may pursue when collating repositories of web-scraped data. 

Key points from the responses

The following key points arose in the consultation responses on defining explicit and specified purposes:

  • Respondents, including generative AI developers and some law firms, argued that the ‘open-ended’ downstream uses of generative AI make it challenging for developers to be explicit about the purpose of the processing at the development stage.  
  • On the other hand, many respondents, including civil society and the creative industries, argued that “developing a model” is too broad a purpose unless the intended uses of the model are made clearer. Some civil society respondents argued that it would be impossible to define an explicit and specific purpose at the time of processing data to train a model.
  • Most respondents agreed that the purpose of using data to fine-tune a model was more likely to meet the requirements of the purpose limitation principle. This is because the purpose of the processing is more explicit and specific, and fine-tuning involves less data processing.
  • Some downstream AI deployers in the financial sector raised concerns that their ability to choose use cases would be restricted because purposes are determined, and limited, at the development stage. They argued this could inhibit innovation and beneficial uses.

Building on this, respondents highlighted the importance of transparency in complying with the purpose limitation principle:

  • Respondents, particularly from the creative industries, argued that clear documentation of the purpose of processing personal data was necessary, alongside documenting the source and context of data collection and the lawful basis, completing a DPIA, and demonstrating how they fulfil articles 13 and 14 of the UK GDPR.
  • Deployers of generative AI also outlined the need for developers to provide guidance on the foreseeable downstream applications of their models, including their capabilities and limitations. This was linked to questions about how developers could monitor the use of ‘off-the-shelf’ or ‘open-access’ products and services in practice.
  • Trade and membership bodies raised concerns about trade secrets and called for a balanced approach to transparency. For example, disclosing too much detail about the purpose of model training could expose product information to a competitor.
  • Respondents also suggested the wider use of contracts and terms of use (ToU) to ensure that the purpose of processing personal data, at each stage of the lifecycle, is made clear to the people involved.

The following key points arose on the reuse of personal data to train generative AI:

  • Developers favoured a broad interpretation of data protection’s purpose limitation principle for the reuse of data to train generative AI, citing greater innovation as a justification.
  • In contrast, creative industry respondents consistently argued that generative AI developers do not understand the importance of articulating an explicit purpose for the reuse of personal data. They also argued that developers do not appreciate that personal data is often embedded in copyright-protected works, which creative professionals rely on for their remuneration.
  • Numerous respondents, particularly from the creative industries, argued that companies are not meeting people’s reasonable expectations when they take and use content containing personal data without permission. This is because people do not expect companies to use this information without payment or acknowledgement.
  • Technology trade and membership bodies argued that reusing personal data to train generative AI is crucial to innovation.

The following key points arose on our position that initially developing a generative AI model, and then developing an application based on such a model (fine-tuned or not), constitute different purposes:42

  • Most respondents felt that having separate purposes for development and deployment is necessary to ensure data minimisation and to check that the appropriate lawful basis is in place at each stage. 
  • A broad range of respondents highlighted that separating purposes supports a risk-based approach. This is because organisations can apply a proportionate level of review, assurance and oversight to the potential harms at each stage of the lifecycle.
  • Technology trade and membership bodies took the view that a rigid distinction between development and deployment could stifle innovation, as in practice development and deployment are cyclical (for example, when a model is engaged in continuous learning).

Our response

We welcome the support we received for our initial position on how the purpose limitation principle should be interpreted in the generative AI lifecycle. We are reassured by the positive response to our view that developing a generative AI model and developing an application based on such a model constitute different purposes under data protection law.

We recognise that respondents have different views on what qualifies as an explicit and specified purpose under article 5(1)(b) for training generative AI. A common request across respondents was for the ICO to develop guidance that includes examples of how developers could demonstrate a sufficiently detailed and specific purpose when training generative AI. We will consider this request for the next iteration of our core guidance on AI, while maintaining our consultation position.

We have consistently said that data protection law recognises the need to protect intellectual property and trade secrets. For example, our guidance on explainability says that providing people with a meaningful explanation about the processing doesn’t mean including source code or proprietary algorithms.43

Respondents suggested that developers could use contracts or ToU to make the purposes clear to the people whose data they use to train a model, the people whose data is used during deployment, and the ICO. Contracts or ToU could detail the purpose or purposes of the processing and set out expectations on who communicates what to which group. Where parties rely on contracts or ToU to ensure that the purpose or purposes of processing are communicated to individuals and others, they will need to ensure that the requirements in these contracts or ToU are effective. We note that these parties will continue to process data using the legitimate interests lawful basis, even where they have a contract or ToU in place.


40 See: How do we ensure lawfulness in AI?

41 Generative AI second call for evidence: Purpose limitation in the generative AI lifecycle

42 Our 2020 guidance outlines that this is the case: How do we ensure lawfulness in AI?

43 The basics of explaining AI: Benefits and risks