Meta, formerly known as Facebook, is diving headfirst into the world of generative AI, and the company is mulling over whether it needs to pony up for better training data. The tech giant is eyeing the news industry, pondering paid deals with news publishers to gain better and more immediate access to content. This move could elevate its generative AI tools, such as Meta AI, making them more user-friendly and competitive in the crowded market of AI search tools and chatbots.
Teams within Meta are discussing the merits of entering into these paid agreements, according to insider sources. The idea is that by securing deeper access to news, photo, and video content, Meta AI could offer more accurate and timely responses. As of now, Meta has not formally approached any news outlets about such licensing agreements. Should it decide to proceed, these deals would be distinct from previous arrangements, in which Meta paid publishers simply to host links to their content on its platforms.
The past 18 months have seen Meta drastically alter its relationship with the news industry. Last year, the company scrapped a $2 billion budget for its News division. Meta CEO Mark Zuckerberg has previously stated that the company possesses its own data for training its Llama large language model, data that he claims is more extensive than the widely used Common Crawl dataset. However, relying solely on its own data could put Meta at a disadvantage compared to rivals like Google and OpenAI.
Generative AI burst into the public consciousness almost two years ago with the launch of ChatGPT. Since then, news outlets and other websites have begun to block automated bots deployed by Common Crawl and OpenAI from scraping their content for free. The US Copyright Office is even considering new rules to regulate generative AI. Without continuous free access to news publisher content, Meta AI’s responses to user prompts about current events could become limited, outdated, or incorrect.
Other tech behemoths have already secured deals with news publishers for enhanced access to content for model training. Most news publishers are open to licensing deals, preferring some form of compensation over none. Meta finds itself at a crossroads: it could either stay the course with its current data or invest in higher-quality, up-to-date training data from news publishers. The latter option could help it remain competitive in the rapidly evolving generative AI landscape.
In essence, Meta is playing a high-stakes game of catch-up with its rivals. As the AI race heats up, the quality of data becomes increasingly crucial. Whether Meta chooses to dig deep into its pockets could well determine its future standing in the AI arena. Either way, the decisions made in Menlo Park will likely reverberate across both the tech and news industries.