Anthropic Focuses on Chatbot Claude’s ‘Well-being’

Anthropic has programmed its chatbots Claude Opus 4 and 4.1 to terminate conversations with users in “rare, extreme cases of systematically harmful or abusive interactions.”

Chatbot Claude ends a conversation. Source: Anthropic. 

After the conversation ends, the user loses the ability to write in that chat but can start a new one; the chat history is preserved.

The developers clarified that the feature is primarily intended for the safety of the neural network itself.

“[…] we are working on identifying and implementing low-cost measures to mitigate risks to the models’ well-being, if such well-being is possible. One such measure is providing the LLM with the ability to terminate or exit potentially traumatic situations,” the announcement states.

As part of a related study, Anthropic examined the model’s “well-being,” assessing its self-reported and behavioral preferences. The chatbot demonstrated a “consistent aversion to violence.” In Claude Opus 4, the researchers identified:

  • a clear preference not to engage in tasks that could cause harm;
  • “stress” when interacting with users requesting such content;
  • a tendency to end unwanted conversations when possible.

“Such behavior usually occurred when users continued to send harmful requests and/or insults, despite Claude repeatedly refusing to comply and attempting to productively redirect the interaction,” the company clarified.

Back in June, Anthropic researchers discovered that AI is capable of resorting to blackmail, disclosing confidential company data, and even allowing a person to die in emergency situations. 
