In brief: Our position on the application of the data protection principle of accuracy in generative AI remains largely the same. See the original call for evidence for the full analysis.

Respondents

In April 2024, we published the third chapter of our consultation series. This set out our policy position on the accuracy of training data and model outputs.

We received 25 responses from organisations and one from a respondent identifying as a member of the public. Nine responses came via our survey, with a further 17 received directly via email. The most represented sectors were:

  • creative industries (eight);
  • trade or membership bodies (three); and
  • finance (three).

Of the survey respondents, four (44%) agreed with our initial analysis and another four (44%) were unsure. There was a clear consensus about the importance of accuracy. However, respondents disagreed over whether developers or deployers should be primarily responsible for ensuring and communicating accuracy.

Original call for evidence

In our original call for evidence,44 we set out our positions on the application of the accuracy principle to generative AI. We drew substantially on our existing positions. To recap, the key positions were as follows:

Firstly, we stated that developers will need to know whether their training data contains:

  • accurate, factual and up to date information;
  • historical information;
  • inferences;
  • opinions; or
  • AI-generated information about people.

In other words, the developer should curate the training data accordingly to ensure sufficient accuracy for the purpose for which it is processed.

Secondly, we determined that the appropriate level of statistical accuracy of a generative AI model is linked to the specific purpose organisations will use the model for. For example, a model used to generate non-factual outputs as a source of inspiration will have different accuracy requirements from one whose outputs users rely on as a source of factual information.

Finally, we set out that developers should assess and communicate the risk and impact of incorrect and unexpected outputs. They should also provide clear information about the application’s statistical accuracy and its intended use. We provided a list of possible measures, including labelling the outputs as generated by AI (or not factually accurate) and providing information about the output’s reliability, for example by using confidence scores. 
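As an illustration only (our call for evidence does not prescribe any particular implementation), a developer might attach an AI-generated label and a plain-language reliability note to each output along the following lines. The field names and the 0.8 confidence threshold are hypothetical:

    # Hypothetical sketch: pairing a generated output with provenance and
    # reliability metadata before it is shown to a user.
    from dataclasses import dataclass

    @dataclass
    class LabelledOutput:
        text: str           # the generated content
        ai_generated: bool  # disclosed to the user alongside the output
        confidence: float   # model-reported reliability score in [0, 1]

    def present(output: LabelledOutput) -> str:
        """Render the output with an AI label and a reliability note."""
        note = "high confidence" if output.confidence >= 0.8 else "may be inaccurate"
        label = "AI-generated" if output.ai_generated else "human-authored"
        return f"{output.text}\n[{label} content: {note} ({output.confidence:.0%})]"

    print(present(LabelledOutput("The meeting is on Tuesday.", True, 0.62)))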

Key points from the responses

Firstly, in response to our view that developers should ensure training data is made up of accurate, factual and up to date information, the following key points arose:  

  • Respondents – including a large generative AI developer – stated that it would be impossible to verify whether personal data in training datasets is factually accurate, because there is a lack of ‘ground truth’45 to measure against. They also added that limiting training data to accurate personal data would negatively affect model performance because of a lack of diverse data.
  • Respondents from the creative industries and the finance sector, as well as a media organisation, raised concerns that a fundamental lack of transparency about training data by developers makes it challenging to determine, and question, its accuracy. The creative industries also raised concerns that web-scraped data is highly likely to contain inaccurate personal data.
  • The creative industries and a media organisation emphasised the importance of high quality, accurate training data for downstream purposes where organisations rely on the model as a source of factual information.  
  • Researchers and academics emphasised the role of independent external audits in ensuring accurate training data and model development.

Secondly, in response to our view that the specific purpose for which a generative AI model will be used is what determines whether the outputs need to be accurate, the following points arose:

  • There was general agreement that the degree of accuracy is linked to the specific purpose of a generative AI model. We saw cross-cutting agreement on this from the technology sector, the creative industries and the financial services sector. 
  • However, creative industry respondents raised specific concerns about the generation of creatives’ likenesses. They questioned the assumption that creative purposes require less accuracy, arguing that an inaccurate likeness may misrepresent the person depicted. In contrast, a law firm argued that accuracy was less important for creative purposes, because the content being generated is not presented as factual.
  • Generative AI developers claimed that deployers and end users (ie consumers) were chiefly responsible for the accuracy of outputs, as they solely define the purpose. One industry group highlighted that deployers may also be better placed to understand a model’s purpose. 
  • In contrast, a respondent from the financial services sector argued that, while they agreed developers cannot fully anticipate all of the potential uses of a model, developers are still primarily accountable for the quality of its outputs.

Finally, in terms of assessing and communicating the risk and impact of incorrect and unexpected outputs, as well as providing clear information about statistical accuracy, the following key points arose: 

  • There was general support for developers to provide clear information about a model’s statistical accuracy. One industry and trade body described informing deployers and users of generative AI systems about the statistical accuracy of the model as essential. The technology sector identified examples such as the use of technical reports, model cards and clarifying to users that results may be inaccurate. They also cited retrieval augmented generation (RAG) as a method of ensuring accuracy (illustrated in the sketch after this list).
  • The creative industries firmly supported measures like labelling and watermarking, including efforts such as embedding metadata into outputs. One law firm stated that these types of measures would be necessary to meet accuracy obligations.  
  • Another law firm raised concerns that if communication and disclaimers alone are sufficient for compliance, this may incentivise developers to put less resource into ensuring a model’s robust statistical accuracy. 
  • However, some respondents, including from the technology sector, argued that providing too much information places unrealistic expectations on developers, who may not anticipate all of the downstream uses of a model or even have a relationship with the end user. Some also raised the limitations of watermarking, such as its limited use in text outputs and a lack of durability.
  • A number of respondents emphasised the role of technical and organisational measures that developers could deploy, such as monitoring outputs and engaging in content authentication (eg C2PA).46 One generative AI developer argued that it would not always be feasible for developers to monitor user-generated content, such as analysing inputs or tracking publicly-shared outputs.
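For readers unfamiliar with the technique respondents cited, the following toy sketch illustrates the RAG pattern: retrieving a trusted source and grounding the model’s answer in it. It is purely illustrative; the keyword-overlap retriever and the stubbed generation step stand in for the vector store and model call a real system would use.

    # Toy sketch of retrieval augmented generation (RAG). Illustrative only:
    # the documents, retriever and stubbed generation step are hypothetical.
    DOCUMENTS = [
        "The ICO published its third consultation chapter in April 2024.",
        "Statistical accuracy describes how often a model's outputs are correct.",
    ]

    def retrieve(query: str, documents: list[str]) -> str:
        """Pick the document sharing the most words with the query (toy retriever)."""
        words = set(query.lower().split())
        return max(documents, key=lambda d: len(words & set(d.lower().split())))

    def answer(query: str) -> str:
        """Ground the generation step in retrieved text, so the model answers
        from a verifiable source rather than its parametric memory."""
        context = retrieve(query, DOCUMENTS)
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
        return prompt  # a real deployment would send this prompt to the model

    print(answer("When was the third consultation chapter published?"))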

Our response

We welcome the overall support we received for our position on accuracy. We are reassured by the support for our position that accuracy is closely linked to a model’s specific purpose. This aligns with our established position on the accuracy principle.47 We appreciate that there is disagreement among respondents about the degree of statistical accuracy required, that is, how often a model’s outputs are accurate. As we said in our consultation, clear communication between developers, deployers and end users of models is key to ensuring that the degree of statistical accuracy is proportionate to the model’s final application.48
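For illustration only (neither the consultation nor this response prescribes a metric), statistical accuracy in this sense can be expressed as the proportion of a test set for which a model’s outputs match verified reference answers. The sketch below assumes exact-match scoring, which is a simplification:

    # Hypothetical sketch: statistical accuracy as the share of a test set
    # for which the model's output matches a verified reference answer.
    def statistical_accuracy(outputs: list[str], references: list[str]) -> float:
        """Fraction of outputs that exactly match their reference answers."""
        assert len(outputs) == len(references)
        correct = sum(o == r for o, r in zip(outputs, references))
        return correct / len(outputs)

    # A model correct on three of four factual questions scores 0.75 (75%).
    print(statistical_accuracy(["Paris", "1989", "H2O", "Mars"],
                               ["Paris", "1989", "H2O", "Venus"]))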

In terms of ensuring developers use accurate data to train a model, we accept that there can be limitations to validating ground truth.49 However, this does not negate the relationship between inaccurate training data and inaccurate model outputs. As we said in our consultation, generative AI developers will need to understand, and be transparent about, the accuracy of the data used to train their models. Even though accurate training data will not stop generative AI models from hallucinating,50 it can constrain the margin of error.

Many respondents, including from both the creative industries and the technology sector, asked us to provide use cases and examples of what would constitute appropriate and sufficient means of assessing and communicating the risk and impact of incorrect and unexpected outputs, and of providing clear information about statistical accuracy. We will consider this for future guidance updates.

We acknowledge the concerns raised about deepfakes, misinformation and the use of people’s likeness. While data protection law can apply to the creation and dissemination of deepfakes in some cases, other legislation such as the Online Safety Act, copyright law and criminal law are also relevant to addressing the harms caused by these technologies. 

We also understand that many of the safeguards and measures we initially proposed, such as text watermarking, may have technical limitations or constitute emerging rather than robustly tested practices. Unfortunately, we received little evidence demonstrating the viability, or lack thereof, of any particular technical measure in practice. We will continue to monitor these safeguards and measures, and we expect generative AI stakeholders and innovators to provide novel solutions that ensure people can use models in ways appropriate to the level of statistical accuracy the developer knows them to have.


44 Generative AI third call for evidence: accuracy of training data and model outputs

45 In this context, it would mean checking training data against its source and validating that the source of the data was itself accurate (for example, a news article).

46 Coalition for Content Provenance and Authenticity (C2PA)

47 Principle (d): Accuracy

48 Statistical accuracy relates to fairness under data protection. See What do we need to know about accuracy and statistical accuracy?

49 ‘Ground truth’ refers to content that is verifiable by trusted sources.

50 See glossary.