Copyright Group Shuts Down Dutch Language AI Dataset

Introduction: Action Against Unlawful Data Use

On Tuesday, Dutch copyright enforcement group BREIN announced the removal of a large Dutch-language dataset that was being offered for AI model training. The dataset contained material gathered from various sources without authorization, raising concerns over copyright violations.

Unauthorized Content in the Dataset

BREIN revealed that the dataset included material collected without consent from tens of thousands of books, news websites, and Dutch language subtitles from numerous films and TV series. This extensive collection of copyrighted content led to the group’s swift action to prevent further misuse.

Challenges in Tracking Dataset Usage

Bastiaan van Ramshorst, Director of BREIN, expressed uncertainty about the extent to which AI companies may have already utilized the dataset. “It’s very difficult to know, but we are trying to be on time,” he told Reuters, emphasizing the importance of proactive measures to avoid future legal challenges.

Implications of the EU AI Act

Van Ramshorst also highlighted the forthcoming European Union AI Act, which will require AI companies to disclose the datasets used in training their models. This regulation aims to enhance transparency and accountability in the AI industry.

Global Context: Similar Cases in the U.S. and Denmark

The issue of copyright infringement in AI training is not limited to the Netherlands. In the United States, Microsoft-backed OpenAI is facing several lawsuits, including one from The New York Times, for allegedly using copyrighted material without permission. In Denmark, the Danish Rights Alliance forced the removal of a large dataset known as “Books3” last year.

Resolution: Cease and Desist Compliance

The individual responsible for offering the Dutch dataset agreed to comply with a cease-and-desist order and promptly removed the content from the website where it was available for download. BREIN did not disclose the individual’s identity, citing Dutch privacy laws.

Conclusion: A Warning to the AI Industry

BREIN’s actions serve as a reminder to the AI industry about the importance of respecting copyright laws. As regulations tighten and enforcement actions increase, AI companies must ensure that the datasets they use for training are legally obtained and properly documented.
