404 Media recently reported that Automattic, the company behind WordPress and Tumblr, is in the process of striking a deal to share data from its sites with OpenAI and Midjourney. The deal would give the AI companies access to the vast amount of content generated on these platforms for training their models. When approached for comment, an Automattic representative pointed to a public blog post addressing the matter.
The blog post stated that while Automattic’s sites currently block AI crawlers, the company plans to allow access to data for AI training in the future, with an option for users to opt out. However, 404 Media’s investigation uncovered internal messages among Automattic employees discussing the difficulty of compiling posts dating back to 2014. According to those messages, the compiled data mistakenly included content from deleted or suspended blogs, private posts on public platforms, and confidential responses from the “Ask” feature.
The report also noted Tumblr’s shifting stance on explicit content. The platform banned pornography and nudity in 2018, only to relax these restrictions in 2022. Because the compiled posts date back to 2014, they span periods governed by different content rules, which adds a layer of complexity to the data-sharing agreement with AI companies. 404 Media’s full coverage goes into more detail on Automattic’s response to these discrepancies.
Sharing user-generated content with AI companies is not unique to Automattic. Reddit, for instance, has a lucrative deal with Google, while Meta’s internal AI tools rely heavily on data from Facebook and Instagram. However, selling personal content for AI training can make users uncomfortable, especially those who view platforms like Tumblr as a sanctuary for personal expression, art, and creativity.
Even Business Insider, through its parent company, has a partnership with OpenAI that uses news coverage to train AI models. That arrangement differs somewhat from the others because the content is professionally produced, but the use of user-generated content for AI training remains contentious. As the boundaries between privacy, innovation, and commercial interests blur, users are left to navigate the evolving landscape of data sharing on their own.