Risks from user input to output

Generative AI

Diagram: Risks from user input to output

User input

Using a generative AI solution requires you to input data, in the form of prompts and any other information needed to produce the desired output.

Issue: This raises questions around permission rights and reuse of user data by the solution provider. You need to ensure you have the right permissions to feed input data into generative AI solutions and that doing so would not breach confidentiality obligations or third-party rights. Typically, the solution provider will want to reuse the input data to improve the model.

Model processing

Generative AI solutions are not perfected models. The nature of generative AI solutions makes it difficult to explain clearly how the solution works.

Issue: Explainability is a key principle running through existing legislation (for example, the GDPR) and the pending EU AI Act. Even where solution providers explain the basics of how generative AI models work, transparency will continue to be an issue, given the level of information regulators want in order to assess how the solution actually works.

Training data

Generative AI models need to be powered by large data sets, which are typically acquired by collating and processing large amounts of publicly available data. For example, ChatGPT utilises training data sets which are based on scraping billions of words from WebText datasets. The fact that data is publicly available does not mean it can lawfully be used without restriction. If you were able to use proprietary data sets, or had comfort that the data sets were appropriately licensed, many of the headaches around the training data phase would fall away.

Issue: Whilst the collection of the training data is a developer activity, whether or not the training data has been lawfully collected is a concern for both developers and users. Likewise, relying on publicly available data raises questions as to the accuracy and completeness of the training data.

Output

In theory, the output of the generative AI solution triggered by the prompts and user inputs can create new content capable of being owned by the user. However, there is a complex interplay between IP law and the contractual terms which apply to the use of the solution.

Issue: It should not be assumed that the output generated is necessarily owned by the user or can be used without restrictions. New content may be similar enough to existing content to raise questions of IP infringement, creating potential liability exposure for the user. The terms and conditions for use of the solution may also contradict the expected position of user ownership. The risk of use is also likely to fall contractually on the user.