As generative artificial-intelligence tools radically reshape how content is created and consumed, questions of copyright, intellectual property and data ownership are moving to the fore.
In a ForkLog column, Tatiana Kontareva, a lawyer at Aurum, analyses the key regulatory aspects of using and distributing AI-created content and outlines strategies to help Web3 projects ensure legal safety and build transparent relationships with users.
Who owns AI-generated content?
Despite rapid progress in AI, the question remains open and hotly debated in legal and technology circles. The creation process typically involves several actors: model developers, trainers, users and the AI models themselves. Yet when it comes to ownership of the final output, there is still no single, definitive answer as to who—if anyone—has a lawful claim.
If you build or use generative-AI tools, this bears directly on your business model, monetisation and exposure to legal risks. When technology outpaces rule-making, legal strategy is not merely important; it can prove decisive for sustainable growth and the project’s legal safety.
In essence, projects integrating LLMs and other AI tools should keep two points in mind:
- no universal approach. There is no single, widely accepted legal position on who owns rights to AI-generated content. Countries regulate the issue differently. In most jurisdictions, copyright protection turns on two conditions. First, the author must be a human; second, the content must be sufficiently original. If content is generated entirely by AI, these criteria may not be met. The result may then fall outside copyright protection and be treated as part of the public domain—free to use. Relying on that, however, is a risky strategy. The legal landscape is shifting quickly; what is permissible today may be infringement tomorrow. This is especially true where a human contributes creatively—writes prompts, edits the output or adapts content for specific aims. Results of such a hybrid approach may attract copyright protection;
- a proactive stance. When the law is silent or unclear, the best course is to build your own framework. Explicitly fix ownership, liability and use of AI-generated content in legal documents such as terms of use and policies, licence agreements and other instruments governing user interaction. These provisions must be carefully drafted and compliant with applicable law to be enforceable. This ensures transparency with users, sets realistic expectations about rights and duties, and creates a legal foundation for responsible AI deployment.
From our observations and experience, most companies integrating LLMs and other AI tools—including industry leaders such as OpenAI and Google (Gemini)—generally do not claim ownership of outputs generated by their models. However, while title to final materials may not be a priority, projects typically seek to retain the right to use user prompts and AI-generated materials to further train models. At this stage, intellectual property becomes especially significant—not only legally, but also for user trust and the ethical handling of data in the development and deployment of AI technologies.
Consider two main categories of data most often used to train models:
- user content and prompts. In most cases, projects rely on users’ assurances that they either own the content supplied when interacting with AI tools or have the right to use it. This approach carries risks, discussed below. To use user content for training on a legitimate basis, users must grant the project the relevant rights, whether by licensing the content or assigning ownership. For such provisions to be valid and enforceable, licensing and assignment terms must be clear and compliant with applicable law;
- AI-generated content. First, define clearly who owns the materials. This is the key question that determines the lawfulness of any subsequent use of such data. If the user owns the rights, the project must obtain separate permission to use the content for training. If the rights remain with the project, further use generally requires no additional consent. Each case, however, warrants individual assessment, especially where user content materially shaped the final result or became an integral part of the AI-generated materials. If rights to such data were not, or could not be, transferred, using the relevant fragments for training or other purposes may require separate authorisation from the person who holds them.
Training data: what can and cannot be used
Most copyright disputes arise from using protected materials to train AI models without proper permission. To work effectively and realise their full potential—especially LLMs—models need large and diverse datasets. This creates a tension between the need for breadth and the constraints of copyright law.
First and foremost, it is crucial to know where your model obtains its training data. As a rule, AI-based platforms draw on at least two main sources:
- publicly available content. A common misconception is that any content in the open is free to use for any purpose, including AI training. In practice this is not so: even if publicly accessible, data may be protected by copyright. A telling example is the lawsuit filed by The New York Times against Microsoft and OpenAI, in which the paper accuses the companies of unlawfully using its content to train AI models. Moreover, even content distributed under open or other permissive licences (for example, Creative Commons) may contain restrictions that preclude its use for AI training. It is therefore essential to review the terms of such licences carefully before including the data in training corpora (a minimal filtering sketch follows this list);
- user-generated content. Another common source for training AI models is user prompts and materials. Their use, however, entails serious legal risks that are often underestimated. Many projects try to protect themselves by requiring users, in the terms of service, to confirm they hold rights to the content and consent to its use for AI training. In practice, this is not sufficient: verifying whether a user truly holds those rights is almost impossible. As a result, even with formal user consent, the risk of infringing third-party rights remains. Relying solely on user agreements, without additional legal mechanisms, therefore invites risk.
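As an illustration only, here is a minimal sketch of what a licence review might look like once it is pushed down into the data pipeline: records are admitted to the training corpus only when their licence metadata is on an allow-list and, for user content, the grant of rights is actually on record. The licence labels, the TrainingRecord structure and the default-deny rule are assumptions made for the example, not a statement of what any particular project or provider does.

```python
from dataclasses import dataclass

# Hypothetical licence labels; a real corpus should track the exact terms
# attached to each source, not just a normalised tag.
ALLOWED_LICENSES = {
    "cc0",           # public-domain dedication
    "cc-by-4.0",     # attribution required, no stated AI-training restriction
    "user-granted",  # user explicitly licensed the content for training
}

@dataclass
class TrainingRecord:
    source_id: str
    text: str
    license: str            # normalised licence label from ingestion metadata
    rights_confirmed: bool  # e.g. the user accepted training-use terms

def is_usable_for_training(record: TrainingRecord) -> bool:
    """Conservative check: include a record only when its licence is on the
    allow-list and, for user content, the grant of rights is documented."""
    if record.license not in ALLOWED_LICENSES:
        return False  # default-deny anything unreviewed or restricted
    if record.license == "user-granted" and not record.rights_confirmed:
        return False
    return True

def build_training_corpus(records: list[TrainingRecord]) -> list[TrainingRecord]:
    usable = [r for r in records if is_usable_for_training(r)]
    print(f"kept {len(usable)} of {len(records)} records on licence grounds")
    return usable
```

The design choice that matters here is default-deny: anything whose licence or provenance cannot be established is excluded until it has been reviewed, rather than included simply because it happened to be publicly accessible.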
Does “fair use” shield against copyright infringement?
In some instances, the law does allow the use of copyright-protected content without a licence, but only if certain conditions are met. One of the best-known exceptions is the “fair use” doctrine. In The New York Times v OpenAI, OpenAI argued that training models on publicly available content falls under this doctrine. The doctrine is not absolute, however, and its application requires careful legal assessment case by case.
Courts typically weigh four factors when deciding whether use is fair:
- purpose and character of the use. Whether the use is commercial in nature or undertaken for non-commercial, educational purposes;
- nature of the materials. Whether the source material is factual or creatively expressive. Fair use is more often applied to factual (informational) works than to artistic ones;
- amount and substantiality. How much of the original work is used and how important that portion is. Even a small fragment can weigh against fair use if it is the work’s “heart”. The more transformative the use and the less original material is taken, the higher the chances of a fair-use finding;
- market effect. If the rightsholder can show that using its content to train AI models undermines its commercial value or reduces market demand, that will be a serious argument against fair use.
In sum, the risk of infringement rises markedly if potentially infringing content is repeatedly used in training or reproduced in generated outputs—especially where there are no effective mechanisms to monitor, identify and remove such content, even after a possible violation is flagged. Projects must therefore ensure that their training datasets and practices comply with applicable copyright law.
Practical legal strategies for deploying AI tools
Launching AI products, or products in which AI is a critical component, is not only a matter of technical innovation; it requires thorough legal and operational planning. Below are key points to reduce risk and make the product legally sound:
- legal strategy. In the absence of universal regulation that clearly addresses core AI issues—ownership, privacy, use of generated content for training and so on—involving qualified lawyers at the planning and launch stages is critical. Questions of ownership, data use and the risk of copyright infringement have no universal or simple solutions. What is needed is a comprehensive strategic approach that accounts for applicable law as well as the business model, architecture and technical specifics of the project;
- user documentation. Many projects underestimate the risks of poor user-facing documentation. Terms of use, policies on processing and distributing content, and limitation and exclusion of liability provisions are not mere formalities. They are critical legal tools that fix the parties’ rights and obligations and allocate risk between the project and its users;
- ownership of user and AI-generated content. From the outset, it is advisable to state clearly in the legal documentation who owns user content and AI-generated content, aligned with the project’s aims, intended use cases and applicable law. This ensures transparency, meets user expectations and provides a robust legal basis for responsible AI use;
- technical safeguards and related mechanisms. Where possible, implement technical measures to verify the provenance of user content; provide convenient, effective mechanisms to notify alleged copyright infringement; and promptly remove disputed materials from training datasets upon request. Such measures strengthen legal protection, demonstrate good faith and help minimise potential legal risks.
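To make the last point more concrete, the sketch below shows one way a takedown notice could be recorded and the disputed material excluded from future training runs. The registry file, the field names and the matching-by-identifier approach are assumptions made for the illustration; a production system would also need claimant verification, notice handling and audit logging.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical takedown registry: a JSON file listing content that must not
# appear in training datasets after an infringement notice is received.
REGISTRY_PATH = Path("takedown_registry.json")

def load_registry() -> dict:
    if REGISTRY_PATH.exists():
        return json.loads(REGISTRY_PATH.read_text())
    return {"blocked_ids": []}

def register_takedown(content_id: str, claimant: str, reason: str) -> None:
    """Record an infringement notice so the content is excluded from future runs."""
    registry = load_registry()
    registry["blocked_ids"].append({
        "content_id": content_id,
        "claimant": claimant,
        "reason": reason,
        "received_at": datetime.now(timezone.utc).isoformat(),
    })
    REGISTRY_PATH.write_text(json.dumps(registry, indent=2))

def filter_blocked(records: list[dict]) -> list[dict]:
    """Drop any training record whose identifier has an active takedown entry."""
    blocked = {entry["content_id"] for entry in load_registry()["blocked_ids"]}
    return [r for r in records if r["content_id"] not in blocked]
```

Keeping the exclusion step inside the data pipeline, rather than leaving it to a manual process, also helps demonstrate good faith: each training run can show that noticed material was in fact removed.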
Conclusions and key takeaways
As AI technologies advance at unprecedented speed, the legal, ethical and regulatory questions surrounding the training, commercialisation and deployment of AI systems grow more complex. Key takeaways for projects focused on integrating and using models in their products:
- there is no universal rule determining ownership of AI-generated content. Clarity must be provided by well-drafted terms of use and related user documentation;
- using publicly available or user data for AI training carries real copyright risks. Relying solely on user agreements is not enough;
- protection under the fair use doctrine is limited and fact-specific. It cannot serve as a reliable legal basis for large-scale model training;
- a proactive legal strategy is key to resilience. This includes clear licensing, comprehensive legal documentation and content-governance systems across the product’s lifecycle.
For founders, developers and business leaders working with AI and Web3 technologies, complying with applicable law is not just a box to tick; it is integral to a successful strategy. A competent legal approach can protect the business, build user trust, enhance model reliability and support the long-term viability of innovation.
