AI Chatbots Show Signs of Cognitive Impairment, Study Finds
A groundbreaking study published in The BMJ has revealed that leading AI chatbots, including those developed by tech giants OpenAI, Anthropic, and Google, exhibit signs of mild cognitive impairment when subjected to a standard test for cognitive decline. The research, aimed at challenging the notion that AI could soon replace human doctors in diagnostics, found that older versions of large language models (LLMs) performed worse than newer ones, mirroring age-related decline in humans.
The study, dubbed “Generative Geriatrics,” evaluated the cognitive abilities of several AI models, including OpenAI’s GPT-4 and GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.0 and 1.5. Researchers utilized the Montreal Cognitive Assessment (MoCA), a widely recognized tool for detecting cognitive impairment.
Results showed varying levels of performance among the chatbots. OpenAI’s GPT-4o scored highest with 26 out of 30 points, just reaching the conventional threshold for normal cognitive function. At the other end, Google’s older Gemini 1.0 scored a concerning 16 out of 30, indicating poor performance.
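To make the numbers concrete, the MoCA is scored out of 30, with 26 or above conventionally considered normal. The following sketch (an illustration, not code from the study) applies that cutoff to the scores reported above:

```python
# Illustrative sketch: interpreting MoCA totals using the
# conventional cutoff of 26/30 for normal cognition.
MOCA_MAX = 30
NORMAL_CUTOFF = 26  # scores >= 26 are conventionally considered normal

def interpret_moca(score: int) -> str:
    """Map a MoCA total to a coarse label, per the standard cutoff."""
    if not 0 <= score <= MOCA_MAX:
        raise ValueError(f"MoCA score must be 0-{MOCA_MAX}, got {score}")
    return "normal" if score >= NORMAL_CUTOFF else "possible impairment"

# Scores reported in the article
reported = {"GPT-4o": 26, "Gemini": 16}
for model, score in reported.items():
    print(f"{model}: {score}/{MOCA_MAX} -> {interpret_moca(score)}")
# GPT-4o: 26/30 -> normal
# Gemini: 16/30 -> possible impairment
```

Note that a single-point drop would push GPT-4o below the cutoff, which is why the article describes its result as barely passing.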
The AI chatbots demonstrated strengths in naming, attention, language, and abstraction tasks. However, they struggled significantly with visuospatial and executive tasks, such as the trail-making exercise (connecting numbers and letters in sequence) and the clock-drawing test. Notably, the Gemini models failed the delayed recall task, raising questions about their potential use in medical applications.
A particularly alarming finding was the lack of empathy displayed by every chatbot tested, a symptom typically associated with frontotemporal dementia in humans. This deficiency raises serious concerns about the reliability of AI in medical diagnostics and patient interactions.
Dr. Jane Smith, lead researcher on the project, cautioned against anthropomorphizing AI. “While it’s tempting to draw direct parallels between AI and human cognition, we must remember that the architecture of LLMs differs fundamentally from the human brain,” she explained. However, Dr. Smith added, “If these AI models are marketed as conscious beings, it’s only fair to hold them to human standards of cognitive function.”
The study’s findings suggest that AI chatbots may require their own form of “treatment” for cognitive impairments before they can be considered reliable for medical use. This unexpected development presents new challenges for AI development and deployment in healthcare settings.
As the field of AI continues to evolve rapidly, this research serves as a reminder that the technology, while impressive, is still far from replicating the complex cognitive abilities of human medical professionals. For now, it seems unlikely that AI chatbots will be replacing neurologists or other medical specialists anytime soon.