AI training data – Page 2

OpenAI Appeals Court Order on Data Preservation in NYT Copyright Lawsuit

June 15, 2025 / in Tech / by ayaksız

OpenAI has appealed a recent court order requiring it to indefinitely preserve ChatGPT output data in an ongoing copyright lawsuit filed by The New York Times (NYT). The company argues that the order conflicts with its obligations to protect user privacy.

Last month, the court ordered OpenAI to preserve and segregate all output log data after the NYT requested this as part of the discovery process. In response, OpenAI filed a motion on June 3 to vacate the data preservation order, according to a court filing.

OpenAI CEO Sam Altman publicly criticized the order on X, stating, “We will fight any demand that compromises our users’ privacy; this is a core principle.” He added that the NYT’s request was “inappropriate” and “sets a bad precedent.”

The lawsuit, originally filed in 2023, accuses OpenAI and its partner Microsoft of using millions of NYT articles without permission to train their language models, including the one powering ChatGPT. The Times alleges that this constitutes copyright infringement.

U.S. District Judge Sidney Stein previously ruled that the Times had made a plausible case that OpenAI and Microsoft may have induced users to infringe on its copyrights. In an earlier opinion, the judge allowed the case to proceed, citing numerous and widely publicized instances where ChatGPT reproduced substantial portions of Times content.

While the NYT declined to comment on OpenAI’s appeal, the case remains one of the highest-profile legal challenges facing generative AI companies over training data use and copyright infringement claims.

Reddit Sues AI Firm Anthropic for Alleged Unauthorized Use of Data

June 15, 2025 / in Tech / by ayaksız

Reddit has filed a lawsuit against artificial intelligence startup Anthropic, accusing it of illegally using Reddit’s content to train its AI models without permission or a licensing agreement. The suit was filed Wednesday in San Francisco Superior Court, marking the latest legal clash over AI companies’ use of third-party online content.

In the complaint, Reddit alleges that Anthropic has scraped and exploited data from the platform over 100,000 times, despite publicly claiming last year that it had blocked its bots from accessing Reddit. According to Reddit, Anthropic’s Claude chatbot even acknowledged it was trained on at least some Reddit data, but could not confirm whether deleted content had been included.

“Anthropic refuses to respect Reddit’s guardrails and enter into a license agreement,” the complaint says, contrasting the company’s stance with that of Google and OpenAI, both of which have entered licensing arrangements with Reddit.

Reddit claims Anthropic’s actions violate its user policies and have allowed the startup to enrich itself by “tens of billions of dollars.” The lawsuit seeks unspecified restitution, punitive damages, and an injunction to stop Anthropic from further using Reddit content for commercial purposes.

Anthropic Responds

An Anthropic spokesperson said the company disagrees with Reddit’s claims and intends to defend itself vigorously. The lawsuit adds further scrutiny to Anthropic, whose backers include tech giants Amazon and Alphabet (Google).

Anthropic launched its latest Claude models, Opus 4 and Sonnet 4, on May 22 and has reached $3 billion in annualized revenue, according to sources familiar with the matter.

Growing Legal Tensions Over AI Training Data

This legal dispute highlights a broader industry-wide debate over how AI companies source and utilize data to train large language models. Many websites and publishers argue that AI firms are profiting from content without compensating the creators, while AI companies contend that publicly available internet data falls under fair use.

In a statement, Reddit Chief Legal Officer Ben Lee emphasized the platform’s support for an open internet but said AI companies need “clear limitations” when it comes to scraping and monetizing content.

Both companies are headquartered in San Francisco, just a few blocks apart.

The case is Reddit Inc v Anthropic PBC, California Superior Court, San Francisco County, No. CGC-25-524892.

Jeff Bezos Leads $72M Investment in AI Data Firm Toloka to Fuel U.S. Expansion

May 11, 2025 / in Tech / by ayaksız

Jeff Bezos, through his personal firm Bezos Expeditions, is leading a $72 million funding round in Toloka, an AI data solutions company aiming to scale its global presence, particularly in the United States, Toloka told Reuters on Wednesday.

Toloka specializes in training and evaluating AI models using a global network of human experts and testers, providing high-quality data labeling and validation. The company is part of Nebius Group (NBIS.O), an AI infrastructure firm listed on Nasdaq and formerly affiliated with Russian tech giant Yandex.

The investment marks a significant milestone for CEO and founder Olga Megorskaya, who said the funding would accelerate product development by fostering collaboration between AI agents and human experts.

“There will always be the need for control, verification, and help from human experts to ensure that the result is actually of high quality,” she said.

Strategic Backing and Global Shift

The deal comes after Nebius split from Yandex in a $5.4 billion transaction, the largest corporate exit from Russia since the 2022 invasion of Ukraine. The restructuring allowed Nebius and Toloka to pursue Western capital without violating sanctions.

Other notable participants in the round include Mikhail Parakhin, CTO of Shopify, who will also serve as Toloka’s executive chairman. Parakhin emphasized the urgent global demand for trusted AI data solutions.

In late 2024, Nvidia took part in a $700 million private placement in Nebius, highlighting growing institutional interest in AI infrastructure and tools.

With this latest funding round:

Bezos Expeditions and other new investors gain equity
Nebius retains a majority economic stake, but gives up majority voting control, enabling Toloka to operate independently
A future funding round is anticipated, Megorskaya said

The investment underscores a broader trend of scaling AI companies focused on high-quality data pipelines, as companies such as Amazon, Microsoft, and Anthropic increasingly rely on curated training datasets for safe and effective AI model development.
