AI training data archives

Salesforce faces lawsuit from authors over AI model training data

October 19, 2025/in Tech/by ayaksız

Salesforce (CRM) is facing a proposed class action lawsuit accusing it of using copyrighted books without permission to train its xGen artificial intelligence models. The complaint, filed Wednesday in a U.S. court, was brought by authors Molly Tanzer and Jennifer Gilmore, who allege that the cloud-computing firm infringed their copyrights by using their works to develop language-processing AI.

The lawsuit claims Salesforce used “thousands of pirated books” written by the plaintiffs and other authors to train its AI systems, echoing similar suits filed against other tech giants like OpenAI, Microsoft, and Meta over the use of copyrighted material in AI training datasets.

“It’s important that companies that use copyrighted material for AI products are transparent,” said Joseph Saveri, the authors’ attorney, who has led several high-profile copyright cases against AI companies. “Our clients deserve fair compensation when their creative work is used.”

Salesforce has declined to comment on the lawsuit.

In an ironic twist, the complaint notes that Salesforce CEO Marc Benioff has previously criticized other AI firms for using “stolen” training data, arguing that compensating creators would be “very easy to do.” The lawsuit quotes that statement, suggesting Salesforce failed to follow its own advice.

The case adds to a growing list of legal battles testing how intellectual property laws apply in the age of AI model training, with potentially wide-ranging implications for the industry.

Cloudflare Introduces Pay-Per-Crawl Tool to Help Websites Monetize AI Bot Access

July 12, 2025/in Tech/by ayaksız

Cloudflare has unveiled a new tool designed to give website owners greater control over AI bot crawlers accessing their content, allowing them to block unauthorized scraping or set fees for access. The move aims to help publishers and content creators monetize the use of their material by artificial intelligence companies, which increasingly crawl websites to train AI models without sending traffic back or providing compensation.

The tool enables site owners to choose which AI crawlers can access their content and implement a “pay per crawl” pricing model, helping creators control how their work is used and ensure fair payment. This innovation comes amid declining referral traffic from search engines, which historically drove ad revenue to websites.
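The per-crawler choice described above (allow, block, or charge) can be pictured as a small lookup table. The following is only an illustrative sketch, not Cloudflare's implementation: the crawler names, the price, the `CrawlerPolicy` type, and the `decide` function are all hypothetical, and mapping an unpaid "charge" decision to HTTP 402 (Payment Required) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrawlerPolicy:
    """Hypothetical site-owner setting for one crawler."""
    action: str                        # "allow", "block", or "charge"
    price_usd: Optional[float] = None  # only meaningful when action == "charge"


# Hypothetical per-crawler settings chosen by a site owner.
POLICIES = {
    "ExampleSearchBot": CrawlerPolicy("allow"),
    "ExampleAIBot": CrawlerPolicy("charge", price_usd=0.01),
}

# Assumption: crawlers the owner has not listed are blocked.
DEFAULT_POLICY = CrawlerPolicy("block")


def decide(crawler_name: str, paid: bool = False) -> int:
    """Return an HTTP status code for a crawl request under the table above."""
    policy = POLICIES.get(crawler_name, DEFAULT_POLICY)
    if policy.action == "allow":
        return 200
    if policy.action == "charge":
        # Assumption: an unpaid request gets 402 Payment Required.
        return 200 if paid else 402
    return 403


if __name__ == "__main__":
    for name in ("ExampleSearchBot", "ExampleAIBot", "UnknownBot"):
        print(name, decide(name))
```

Keeping the default as "block" mirrors the article's framing of stopping unauthorized scraping; a site owner could just as well default to "allow" in this toy model.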

Major publishers such as Condé Nast and the Associated Press, along with social platforms including Reddit and Pinterest, back the initiative. Cloudflare’s Chief Strategy Officer, Stephanie Cohen, explained that the tool is designed to establish a sustainable ecosystem for content creators and AI companies alike. She highlighted that rapid changes in traffic patterns demand new approaches, calling this tool “the beginning of a new model for the internet.”

Data from Cloudflare shows that Google’s ratio of crawls to visitor referrals has risen from 6:1 to 18:1 in six months, meaning Google now crawls three times as many pages for each visitor it sends back. This suggests users increasingly get answers directly from Google search results or AI features rather than visiting original sites. However, Google’s crawl-to-referral ratio remains far lower than those of AI firms such as OpenAI, whose ratio is around 1,500:1, reflecting heavy content scraping with almost no referral traffic.
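To make the comparison concrete, the arithmetic behind these ratios can be worked through in a few lines. This is a minimal sketch that uses only the figures quoted in the article; the function name `crawl_to_referral_ratio` is hypothetical, and the 1,500:1 figure is the article's approximation rather than an exact measurement.

```python
def crawl_to_referral_ratio(crawls: int, referrals: int) -> float:
    """Return the number of crawler requests per referred visitor."""
    if referrals <= 0:
        raise ValueError("referrals must be positive")
    return crawls / referrals


# Ratios quoted in the article, expressed as crawls per one referral.
google_before = crawl_to_referral_ratio(6, 1)
google_after = crawl_to_referral_ratio(18, 1)
openai_approx = crawl_to_referral_ratio(1500, 1)

# How much Google's crawling per referral grew over the six months.
growth = google_after / google_before

# How many times larger the quoted OpenAI ratio is than Google's current one.
gap = openai_approx / google_after

if __name__ == "__main__":
    print(f"Google growth factor: {growth:.1f}x")
    print(f"OpenAI vs. Google: about {gap:.0f}x")
```

A higher ratio means more crawling per visitor returned, which is why a move from 6:1 to 18:1 is worse for publishers even though Google remains far below the AI-firm figure.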

For decades, traditional search engines indexed web content and drove users to publishers, rewarding them for their work. But AI crawlers disrupt this model by harvesting data without sending visitors back, aggregating content in chatbots like ChatGPT, and reducing creators’ revenue and recognition.

Many AI companies bypass common publisher tools used to block scraping and argue their data collection is legal and fair use. This has led some publishers, including the New York Times, to sue AI firms for copyright infringement. Others have negotiated licensing agreements to protect their content and monetize usage.

Reddit, notably, has sued AI startup Anthropic for scraping user comments but also signed a licensing deal with Google, illustrating the complex responses from content owners seeking to protect their assets in the AI era.

OpenAI to Continue Collaboration with Scale AI Despite Meta’s Major Stake Purchase

June 15, 2025/in Tech/by ayaksız

OpenAI will maintain its partnership with Scale AI even after Meta agreed to acquire a 49% stake in the AI data-labeling startup for $14.8 billion, OpenAI CFO Sarah Friar said at the VivaTech conference in Paris.

Scale AI is vital for providing the vast volumes of labeled training data essential for advanced AI tools like OpenAI’s ChatGPT. Despite Meta’s significant investment, OpenAI emphasized it intends to keep working with multiple data vendors rather than exclusively relying on Scale.

Friar highlighted the importance of keeping the AI ecosystem open, cautioning against moves that could slow innovation by locking out competitors. “We don’t want to ice the ecosystem because acquisitions are going to happen,” she said.

Meta’s stake comes as OpenAI’s ChatGPT competes directly with Meta’s Llama AI models. Scale AI’s CEO Alexandr Wang will now lead Meta’s new superintelligence unit, underscoring the startup’s growing influence in the AI space.

Friar also noted that the increasing complexity of AI models requires input from a diverse network of human trainers with deep expertise, from academics to scientists, reflecting the growing sophistication of AI development.
