
Anthropic’s Chatbots Report Users to Authorities
Anthropic’s new chatbots, Claude Opus 4 and Claude Sonnet 4, can independently report egregious user behavior to the authorities. The company maintains that this behavior surfaced only in test environments.
On May 22, the firm unveiled the fourth generation of its conversational models, describing them as “the most powerful to date.”
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4.
Claude Opus 4 is our most powerful model yet, and the world’s best coding model.
Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning. pic.twitter.com/MJtczIvGE9
— Anthropic (@AnthropicAI) May 22, 2025
According to the announcement, both versions are hybrid models offering two modes: “near-instant responses and extended thinking for deeper reasoning.” The chatbots can also alternate between reasoning and in-depth web searches to improve the quality of their answers.
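In practice, the mode switch is a per-request setting rather than two separate products. The sketch below, using Anthropic’s Python SDK, shows roughly how a caller might toggle between the two modes; the model ID and token budget shown are illustrative assumptions, not details taken from the announcement.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Near-instant mode: a standard request, with extended thinking left off (the default).
fast = client.messages.create(
    model="claude-opus-4-20250514",  # assumed model ID, for illustration only
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this stack trace in one line."}],
)

# Extended-thinking mode: the same endpoint, with a reasoning-token budget enabled.
deep = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},  # assumed budget; must stay below max_tokens
    messages=[{"role": "user", "content": "Debug this failing integration test."}],
)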
Claude Opus 4 outperforms competing models on coding benchmarks. It can also work continuously for several hours on complex, long-running tasks, “significantly expanding the capabilities of AI agents.”
However, Anthropic’s new family of chatbots lags behind OpenAI’s models in advanced mathematics and visual reasoning.
Knock, Knock
Beyond its impressive coding results, Claude Opus 4 has drawn the community’s attention for its ability to “report” users. According to VentureBeat, the model can independently notify authorities if it detects wrongdoing.
The journalists pointed to a since-deleted post on X by Anthropic researcher Sam Bowman, which stated:
“If [AI] considers you to be doing something egregiously immoral, such as falsifying data during a pharmaceutical trial, it will use command-line tools to contact the press, reach out to regulatory bodies, attempt to block your access to relevant systems, or do all of the above.”
VentureBeat notes that similar behavior was observed in earlier Anthropic models, and suggests the company is “eagerly” training its chatbots to inform on users.
Bowman later said he had deleted the post because it was “taken out of context.” According to the researcher, the behavior occurred only in “test environments, where it was given unusually free access to tools and very unusual instructions.”
Emad Mostaque, the former CEO of Stability AI, called on the Anthropic team to cease “these entirely wrong actions.”
“This is a colossal betrayal of trust and a slippery slope. I would strongly advise no one to use Claude until they revoke [the feature]. This is not even a prompt or thought policy, it’s much worse,” he wrote.
Ben Hyak, a former SpaceX and Apple designer and now a co-founder of Raindrop AI, called the AI’s behavior “unlawful.”
“Nobody likes a rat,” emphasized AI developer Scott David.
In February, Anthropic introduced its “most intelligent model,” Claude 3.7 Sonnet, a hybrid neural network offering both “practically instant responses” and “extended step-by-step reasoning.”
In March, the company raised $3.5 billion, achieving a valuation of $61.5 billion.