As artificial intelligence (AI) continues to advance, so do the legal complexities surrounding its use, especially regarding the training of AI models with copyrighted materials. A recent ruling by the Hamburg Regional Court (LG Hamburg, judgment of 27.09.2024 – 310 O 227/23) addresses the issue of whether incorporating a copyrighted image into an AI training dataset constitutes a copyright violation under German law.
The case is a pivotal moment for AI developers and content creators alike, because it clarifies how existing copyright law applies to AI training datasets. At the same time, broader questions about the compatibility of current copyright frameworks with rapidly evolving AI technologies remain unresolved.
The Case: Copyright Infringement and AI Training Data
The dispute before the Hamburg Regional Court centered on a dataset of 5.85 billion image-text pairs intended for training generative AI systems. The dataset, which was made freely available to the public, included images scraped from various websites. One of the images, sourced from a photo agency and marked with a watermark, was allegedly used without the photographer’s permission. The plaintiff, claiming to be the creator of the image, argued that downloading and storing the photo for inclusion in the AI training set amounted to an unauthorized reproduction that infringed their copyright.
Legal Issues
The court had to address two main legal questions:
- Violation of Reproduction Rights (Section 16 of the German Copyright Act – UrhG)
The plaintiff asserted that downloading the photo and incorporating it into the AI dataset was an unauthorized reproduction and therefore infringed the reproduction right under Section 16 UrhG.
- Applicability of Copyright Exceptions
The defendant countered by invoking several exceptions under the UrhG, specifically Sections 44a, 44b, and 60d. These provisions cover temporary reproductions, text and data mining, and text and data mining for scientific research, respectively.
The Court’s Ruling
The Hamburg Regional Court ultimately ruled in favor of the defendant, determining that no copyright infringement had occurred. It examined each of the three exceptions in turn, and its reasoning was shaped by the rapid technological developments in AI.
- Temporary Reproductions (§ 44a UrhG)
The defendant argued that downloading the image was merely a temporary reproduction under § 44a UrhG. This section permits temporary reproductions that are transient or incidental, form an integral part of a technical process, and have no independent economic significance. The court did not accept this defense: the image was downloaded deliberately so that it could be analyzed, and it was not deleted automatically as a by-product of the process, so the copy was neither transient nor incidental.
- Text and Data Mining (§ 44b UrhG)
The court accepted that downloading images in order to compare them with their text descriptions is, in principle, text and data mining within the meaning of § 44b UrhG, which permits reproductions of lawfully accessible works for automated analysis. Whether the defendant could actually rely on this provision was left open, because rightholders may reserve their rights under § 44b(3) UrhG and the photo agency’s website contained a reservation against automated use. The court expressed doubts on this point but did not need to decide it.
- Scientific Research (§ 60d UrhG)
The decision ultimately rested on § 60d UrhG, which permits text and data mining for purposes of scientific research. The court held that creating a dataset that can serve as the basis for training AI systems counts as scientific research, and that the defendant pursued non-commercial purposes because it made the dataset freely available. That commercial companies may also use the dataset did not change this assessment. The court concluded that the reproduction fell under this provision and dismissed the plaintiff’s claim.
Broader Legal Context: The Challenge of Training AI Models with Copyrighted Works
While the Hamburg court’s ruling offers a degree of clarity, the case also highlights broader unresolved legal questions about the interaction between AI and copyright law in Germany. As noted in a legal analysis by Dornis and Stober, the training of generative AI models such as ChatGPT, DALL-E, and Stable Diffusion often involves substantial use of copyrighted data. These models, capable of generating creative content based on user prompts, rely heavily on large datasets for their training, many of which contain copyrighted works.
The legal issues surrounding AI training datasets can be broken down into several key phases:
- Collection and Preparation of Training Data
The act of gathering copyrighted works for inclusion in AI training datasets can itself be seen as a form of reproduction. During the process of collecting data (often through scraping), copyrighted materials are copied and stored, which may violate reproduction rights under § 16 UrhG if not covered by a legal exception.
- Training of AI Models
When AI models undergo training, they effectively “memorize” parts of the data they have been exposed to. This internalization of data during processes like pre-training and fine-tuning could also be considered a form of reproduction under German copyright law, as the data is used to build the model’s functionality.
- Use and Public Availability of AI Models
Once the AI models are made available to the public, either as standalone software or through user interfaces, there may be further copyright implications. If the model produces outputs that resemble or derive from copyrighted works used in its training, questions arise about whether this constitutes a new act of reproduction or distribution of the original works.
The Limits of Existing Copyright Exceptions in Germany
Current German copyright law, while containing some exceptions, may not be fully equipped to address the complexities of AI training:
- Temporary Reproductions (§ 44a UrhG) applies only to fleeting copies made during technical processes. This is often insufficient for AI models, which store data more permanently.
- Text and Data Mining (§ 44b UrhG) allows the use of copyrighted works for large-scale automated analysis, but may not cover the full scope of AI training, where data is memorized and used creatively by the model. The Hamburg court treated the creation of a training dataset as data mining, but this interpretation may not extend to every stage of AI training.
- Scientific Research (§ 60d UrhG) is tied to research purposes and may not always be relevant for AI models developed and used primarily for commercial purposes.
The DSM Directive (Directive (EU) 2019/790), whose Articles 3 and 4 underlie §§ 60d and 44b UrhG, further complicates matters. It was not designed with the capabilities of generative AI models in mind and provides limited guidance on how copyright should apply to them.
Conclusion: A Call for Legal Clarity
The Hamburg Regional Court’s decision marks an important step in clarifying how existing copyright law in Germany applies to AI training datasets. However, as AI technology continues to evolve, the limitations of current legal frameworks become more apparent. While exceptions such as those found in §§ 44a, 44b, and 60d UrhG offer some flexibility, they do not fully address the needs of either AI developers or copyright holders.
This ruling underscores the need for clearer legislative guidance to balance the protection of intellectual property rights with the promotion of AI innovation. As generative AI becomes more integral to industries around the world, lawmakers in Germany and beyond must confront the growing tensions between copyright law and technological progress.