Meta, the tech behemoth formerly known as Facebook, is eyeing an intriguing pivot: investing in higher-quality, more timely training data for its generative AI tools, potentially by partnering with the news industry. According to insider sources, teams within Meta are debating whether to strike new paid deals with news publishers to gain broader access to their news articles, photos, and videos. The goal? To make Meta’s suite of AI tools, such as Meta AI, not just functional but competitive in the increasingly crowded market for generative AI search tools and chatbots.
You might think Meta, with its vast resources and reach, already has all the data it needs. After all, CEO Mark Zuckerberg has boasted that Meta’s data, used to train its Llama large language models, surpasses even the massive Common Crawl dataset. But there’s a catch: the data landscape in AI is far more complex and competitive than it appears. Without fresh, consistent access to high-quality content, Meta’s AI tools could end up as the digital equivalent of day-old bread: still useful, but hardly fresh out of the oven.
Interestingly, this potential shift comes just a year after Meta slashed its News division budget by a whopping $2 billion. That cut marked a stark turn away from its earlier engagement with the news industry, when Meta paid publishers so it could feature links to their content on its platforms. If Meta moves forward with new licensing deals, it would mark a complete reversal, this time aimed specifically at boosting the capabilities of its generative AI tools.
The stakes are high. Major competitors such as Google and OpenAI have already inked deals with news publishers to secure more robust datasets for model training. Without similar moves, Meta could fall behind, particularly in delivering accurate, up-to-date answers to user prompts about current events. And because news outlets and other content providers are increasingly blocking the automated crawlers that Common Crawl and OpenAI use to scrape their content for free, securing its own data access has become all the more urgent for Meta.
The landscape is further complicated by evolving regulation: the US Copyright Office is considering new rules covering generative AI, which makes the search for legitimately sourced, high-quality data even more pressing. If Meta continues to rely solely on its own data, which is inevitably narrower in scope, the company risks its AI outputs becoming outdated or, worse, incorrect.
For news publishers, licensing deals with Meta are appealing, if somewhat reluctantly so. According to insiders, many publishers are open to these arrangements because, as the saying goes, “Something is better than nothing.” For AI developers, meanwhile, as the technology shapes up to be tech’s next frontier, a steady stream of high-quality, licensed data could make the difference between leading the pack and playing catch-up. For Meta, and indeed for the entire AI industry, the coming months will be crucial in determining who gets to sit on the data goldmine.