
Apple Hit With Lawsuit Over Use of Books in AI Training

Apple was sued Friday in federal court in Northern California by authors who accuse the company of illegally using copyrighted books to train its “OpenELM” large language models. The proposed class action, filed by writers Grady Hendrix and Jennifer Roberson, claims Apple copied protected works without consent, credit, or compensation.

“Apple has not attempted to pay these authors for their contributions to this potentially lucrative venture,” the lawsuit alleges. Neither Apple nor the plaintiffs’ lawyers immediately commented.

The case adds Apple to a growing list of tech giants, including Microsoft, Meta, and OpenAI, that face litigation over whether training AI on copyrighted material constitutes infringement or fair use. On the same day, Anthropic agreed to a $1.5 billion settlement with authors who accused it of training its Claude chatbot on pirated books, a deal hailed as the largest copyright recovery in history.

According to the lawsuit, Apple’s models were trained on a known dataset of pirated books, allegedly including works by Hendrix and Roberson. The case seeks damages and legal recognition that Apple must compensate authors when their intellectual property is used to build AI systems.

The dispute underscores the escalating clash between AI developers and creators, as courts weigh how copyright law applies to massive datasets powering generative AI. With multiple cases now moving forward in U.S. courts, the outcome could reshape both the AI industry and protections for authors in the digital era.

Cloudflare Introduces Pay-Per-Crawl Tool to Help Websites Monetize AI Bot Access

Cloudflare has unveiled a new tool designed to give website owners greater control over the AI crawlers that access their content, allowing them to block unauthorized scraping or charge fees for access. The move aims to help publishers and content creators monetize the use of their material by artificial intelligence companies, which increasingly crawl websites to train AI models without sending traffic back or providing compensation.

The tool enables site owners to choose which AI crawlers can access their content and implement a “pay per crawl” pricing model, helping creators control how their work is used and ensure fair payment. This innovation comes amid declining referral traffic from search engines, which historically drove ad revenue to websites.
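
The per-crawler choice described above (allow, block, or charge) can be pictured as a small policy lookup. The following is a minimal sketch under stated assumptions, not Cloudflare's implementation: the crawler names, the prices, the `decide` function, and the use of HTTP status 402 (Payment Required) and 403 (Forbidden) to signal the outcome are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    CHARGE = "charge"


@dataclass(frozen=True)
class Rule:
    action: Action
    price_usd: Optional[float] = None  # only meaningful for CHARGE


# Hypothetical policy table: crawler names and prices are made up.
POLICY = {
    "ExampleSearchBot": Rule(Action.ALLOW),
    "ExampleTrainingBot": Rule(Action.CHARGE, price_usd=0.01),
}
DEFAULT_RULE = Rule(Action.BLOCK)  # unknown crawlers are blocked


def decide(crawler_name: str, offered_usd: float = 0.0) -> Tuple[int, str]:
    """Return an (HTTP status, message) pair for one crawl request."""
    rule = POLICY.get(crawler_name, DEFAULT_RULE)
    if rule.action is Action.ALLOW:
        return 200, "served"
    if rule.action is Action.CHARGE and rule.price_usd is not None:
        if offered_usd >= rule.price_usd:
            return 200, f"served, charged ${rule.price_usd:.2f}"
        # Assumption: signal the price with 402 Payment Required.
        return 402, f"payment required: ${rule.price_usd:.2f} per crawl"
    return 403, "blocked"


search_result = decide("ExampleSearchBot")
unpaid_result = decide("ExampleTrainingBot")
paid_result = decide("ExampleTrainingBot", offered_usd=0.01)
unknown_result = decide("UnlistedBot")
```

In this sketch a crawler that is not listed is blocked by default, which mirrors the article's framing of site owners choosing which crawlers get access.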

Publishers such as Condé Nast and the Associated Press, along with social platforms including Reddit and Pinterest, back the initiative. Cloudflare’s Chief Strategy Officer, Stephanie Cohen, explained that the tool is designed to establish a sustainable ecosystem for content creators and AI companies alike. She highlighted that rapid changes in traffic patterns demand new approaches, calling this tool “the beginning of a new model for the internet.”

Data from Cloudflare shows that Google’s ratio of crawls to visitor referrals has risen from 6:1 to 18:1 in six months, suggesting users increasingly get answers directly from Google search results or AI features rather than visiting original sites. Even so, Google’s crawl-to-referral ratio remains far lower than those of AI firms such as OpenAI, which have ratios of around 1,500:1, reflecting heavy content scraping with little referral traffic in return.
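
To put the quoted ratios in perspective, the short calculation below compares them directly. It is only an arithmetic sketch: the function and variable names are hypothetical, and the only inputs are the three ratios cited above (6:1, 18:1, and roughly 1,500:1).

```python
def crawl_to_referral_ratio(crawls: float, referrals: float) -> float:
    """Number of crawler fetches per visitor referred back to the site."""
    if referrals <= 0:
        raise ValueError("referrals must be positive")
    return crawls / referrals


# Ratios as quoted in the article.
google_before = crawl_to_referral_ratio(6, 1)
google_after = crawl_to_referral_ratio(18, 1)
openai = crawl_to_referral_ratio(1500, 1)

# Google now crawls three times as many pages per referral as before.
google_change = google_after / google_before

# OpenAI's quoted ratio relative to Google's current one.
openai_vs_google = openai / google_after

# Referrals a site could expect per 1,000 crawls at each ratio.
google_referrals_per_1000 = 1000 / google_after
openai_referrals_per_1000 = 1000 / openai

print(f"Google change over six months: {google_change:.1f}x")
print(f"OpenAI vs. Google today: {openai_vs_google:.1f}x")
print(f"Referrals per 1,000 crawls, Google: {google_referrals_per_1000:.1f}")
print(f"Referrals per 1,000 crawls, OpenAI: {openai_referrals_per_1000:.2f}")
```

On these figures, a site receives about 56 referrals per 1,000 Google crawls, compared with fewer than one per 1,000 crawls at a 1,500:1 ratio.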

For decades, traditional search engines indexed web content and drove users to publishers, rewarding them for their work. But AI crawlers disrupt this model by harvesting data without sending visitors back, aggregating content in chatbots like ChatGPT, and reducing creators’ revenue and recognition.

Many AI companies bypass common publisher tools used to block scraping and argue their data collection is legal and fair use. This has led some publishers, including the New York Times, to sue AI firms for copyright infringement. Others have negotiated licensing agreements to protect their content and monetize usage.
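
The article does not name the blocking tools in question. Assuming the most familiar one, a `robots.txt` file, the sketch below shows how such a rule is expressed and checked using Python's standard `urllib.robotparser` module. The crawler name `ExampleAIBot` and the URL are hypothetical. The check runs on the crawler's side, so the file states a request that a crawler can choose to ignore.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named AI crawler is disallowed everywhere,
# every other user agent is allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"
ai_allowed = parser.can_fetch("ExampleAIBot", url)
other_allowed = parser.can_fetch("SomeOtherAgent", url)

print("ExampleAIBot may fetch:", ai_allowed)
print("SomeOtherAgent may fetch:", other_allowed)
```

Nothing in the file itself enforces the rule, which is one reason publishers have turned to network-level controls and licensing deals.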

Reddit, notably, has sued AI startup Anthropic for scraping user comments but also signed a licensing deal with Google, illustrating the complex responses from content owners seeking to protect their assets in the AI era.

Reddit Sues AI Firm Anthropic for Alleged Unauthorized Use of Data

Reddit has filed a lawsuit against artificial intelligence startup Anthropic, accusing it of illegally using Reddit’s content to train its AI models without permission or a licensing agreement. The suit was filed Wednesday in San Francisco Superior Court, marking the latest legal clash over AI companies’ use of third-party online content.

In the complaint, Reddit alleges that Anthropic has scraped and exploited data from the platform more than 100,000 times, despite Anthropic’s public claim last year that it had blocked its bots from accessing Reddit. According to Reddit, Anthropic’s Claude chatbot even acknowledged it was trained on at least some Reddit data, but could not confirm whether deleted content had been included.

“Anthropic refuses to respect Reddit’s guardrails and enter into a license agreement,” the complaint says, contrasting the company’s stance with that of Google and OpenAI, both of which have entered licensing arrangements with Reddit.

Reddit claims Anthropic’s actions violate its user policies and have allowed the startup to enrich itself by “tens of billions of dollars.” The lawsuit seeks unspecified restitution, punitive damages, and an injunction to stop Anthropic from further using Reddit content for commercial purposes.

Anthropic Responds

An Anthropic spokesperson said the company disagrees with Reddit’s claims and intends to defend itself vigorously. The lawsuit adds further scrutiny to Anthropic, whose backers include tech giants Amazon and Alphabet (Google).

Anthropic launched its latest Claude models, Opus 4 and Sonnet 4, on May 22, and has reached $3 billion in annualized revenue, according to sources familiar with the matter.

Growing Legal Tensions Over AI Training Data

This legal dispute highlights a broader industry-wide debate over how AI companies source and utilize data to train large language models. Many websites and publishers argue that AI firms are profiting from content without compensating the creators, while AI companies contend that publicly available internet data falls under fair use.

In a statement, Reddit Chief Legal Officer Ben Lee emphasized the platform’s support for an open internet but said AI companies need “clear limitations” when it comes to scraping and monetizing content.

Both companies are headquartered in San Francisco, just a few blocks apart.

The case is Reddit Inc v Anthropic PBC, California Superior Court, San Francisco County, No. CGC-25-524892.