Report: Tumblr and WordPress Allegedly Intend to Sell User Data to OpenAI and Midjourney for AI Training

Privacy Concerns Arise: Tumblr and WordPress Data Sharing Allegedly Includes Private Information

Recent revelations suggest that Tumblr and WordPress users may soon discover their data playing a role in training artificial intelligence (AI) models, according to a report circulating within the tech community. Allegedly, Automattic, the parent company of these popular blogging platforms, has entered into agreements with OpenAI and Midjourney to facilitate the sale of user-generated content for AI training purposes. Although the specifics of these agreements and data-sharing protocols remain shrouded in uncertainty, concerns regarding data privacy and the ethical implications of such partnerships have come to the forefront.

Internal communications obtained by 404 Media shed light on these purported deals, confirming that Automattic has been working with AI firms and offering insight into the nature of the arrangements. According to the report, an official announcement of Automattic’s partnership with OpenAI and Midjourney is imminent, and data collection for AI training appears to have already begun. Notably, an internal communication authored by product manager Cyle Gage describes a comprehensive compilation of public post content from Tumblr spanning 2014 to 2023, underscoring the volume of data potentially involved.

Of particular concern is the report’s claim that private and deleted user content was inadvertently included in the data compilation alongside publicly accessible information. Such an oversight raises critical questions about Automattic’s data security protocols and its safeguarding of user privacy. Given the potential exposure of private user data to third-party AI firms, scrutiny of the company’s ethical standards and data protection measures is warranted. The development is a stark reminder that companies need to prioritize transparent data practices and robust privacy safeguards.

On Tuesday, Automattic said in a statement, “AI is rapidly transforming nearly every aspect of our world, including the way we create and consume content. At Automattic, we’ve always believed in a free and open web and individual choice. Like other tech companies, we’re closely following these advancements, including how to work with AI companies in a way that respects our users’ preferences.”

The post detailed several measures the company is taking for its users, including blocking AI platform crawlers, offering a setting that discourages search engines from indexing a site on WordPress or Tumblr, and promising an opt-out setting for users who do not wish to share their data with third parties. “Currently, no law exists that requires crawlers to follow these preferences,” the post stated.

The mechanism for opting out of data sharing is also somewhat unclear. While the company stated in the post that the AI firms will respect the opt-out settings and even remove past content from users who have newly opted out, the report claims the reality is more complicated.

According to the report, the response was vague and did not confirm whether Automattic had an agreement with the AI firms on this point. Further, the entire line of reasoning appears to rest on the assumption that AI firms will not gain much by retaining the user data. It should be noted that third-party data sharing is not new, and most social media platforms hold rights to the public content users generate on them. However, making such deals without disclosing them to users could expose private information to companies that use the same data to train AI systems.