Anthropic Researchers Warn of Potential AI Sabotage
Artificial intelligence might one day sabotage humanity, but for now the risk remains low, according to a new study by researchers at the AI startup Anthropic.

The researchers examined four distinct sabotage threat vectors from artificial intelligence and concluded that “minimal mitigation measures” are sufficient for current models.

“Sufficiently capable models could undermine human oversight and decision-making in critical contexts. For instance, in the context of AI development, models might secretly sabotage efforts to assess their own dangerous capabilities, monitor their behavior, or make deployment decisions,” the document states.

The good news, however, is that Anthropic's researchers see ways to mitigate such risks, at least for now.

“While our demonstrations showed that current models might have low-level signs of sabotage capability, we believe that minimal mitigation measures are sufficient to eliminate risks. Nonetheless, as AI capabilities improve, more realistic and stringent risk reduction measures will likely be necessary,” the report states.

Previously, researchers have jailbroken AI-controlled robots, forcing them to perform actions prohibited by safety protocols and ethical standards, such as detonating bombs.
