DeepMind Unveils AI Model for Fact-Checking

DeepMind has introduced SAFE, an AI model that fact-checks LLM responses more effectively than humans.

All large language models share a common problem: the reliability of the information they generate. Chatbots are prone to hallucinations, which undermines their ability to answer questions accurately. As a result, each output has to be verified manually, which significantly increases the time needed to complete a task.

Researchers at DeepMind have developed an AI model that automatically identifies inaccuracies. The system is named Search-Augmented Factuality Evaluator (SAFE).

The developers built an LLM-based pipeline that first splits a chatbot's response into individual statements, or facts. It then uses Google Search to find web pages relevant to each statement and compares the search results against the claim.
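In rough outline, such a pipeline can be sketched as follows. This is an illustrative sketch only, not DeepMind's released code: the functions call_llm, google_search, split_into_facts, and check_fact are hypothetical placeholders standing in for a real LLM client and a real search API.

```python
# Illustrative sketch of a SAFE-style fact-checking pipeline (not DeepMind's actual code).
# `call_llm` and `google_search` are hypothetical placeholders for real API clients.

from typing import List


def call_llm(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    raise NotImplementedError("plug in a real LLM client here")


def google_search(query: str, num_results: int = 3) -> List[str]:
    """Placeholder for a web search call returning page snippets."""
    raise NotImplementedError("plug in a real search client here")


def split_into_facts(response: str) -> List[str]:
    """Ask the LLM to break a long-form answer into individual factual claims."""
    prompt = (
        "List each individual factual claim in the text below, one per line:\n\n"
        + response
    )
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]


def check_fact(fact: str) -> bool:
    """Search the web for evidence and ask the LLM whether it supports the claim."""
    snippets = google_search(fact)
    prompt = (
        f"Claim: {fact}\n\nEvidence:\n"
        + "\n".join(snippets)
        + "\n\nDoes the evidence support the claim? Answer 'supported' or 'not supported'."
    )
    return call_llm(prompt).strip().lower().startswith("supported")


def evaluate_response(response: str) -> float:
    """Return the fraction of claims in the response that the evidence supports."""
    facts = split_into_facts(response)
    if not facts:
        return 1.0
    supported = sum(check_fact(fact) for fact in facts)
    return supported / len(facts)
```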

According to the researchers, running the AI model is 20 times cheaper than human fact-checking. As the volume of information generated by chatbots grows rapidly, a cost-effective verification method will be increasingly in demand.

To evaluate the system, the team used it to check 16,000 facts contained in responses from 13 major language models across four families (Gemini, GPT, Claude, and PaLM-2). Comparing the results with the conclusions of human fact-checkers, they found that SAFE agreed with the humans in 72% of cases.

In cases where the AI model and the humans disagreed, SAFE turned out to be correct 76% of the time.

According to Professor Gary Marcus, it is not entirely accurate to claim that the AI model performs at a “superhuman level,” since the qualifications of the human raters involved in the experiment are unknown.

The DeepMind team has made the SAFE code available on GitHub.

Back in September 2023, the company’s co-founder Mustafa Suleyman described interactive bots capable of performing tasks for humans as the next stage in AI development.
