Anthropic Warns of AI Self-Improvement Risks

ForkLog

5 minutes ago

Anthropic Warns of AI Self-Improvement Risks

Members of the Anthropic team are increasingly delegating a significant portion of new model development to AI systems. The company identified this as a sign of approaching recursive self-improvement.

According to internal data, more than 80% of the code for the company’s current products was written by Claude. In the second quarter, the amount of code per engineer increased eightfold compared to 2024.

Marina Favaro, head of the Anthropic Institute, and company co-founder Jack Clark wrote that with sufficient computing power, this trend could lead to a system capable of “designing and developing its successor entirely autonomously.”

“We have not yet reached the point of no return, and recursive self-improvement is not inevitable. But it may occur sooner than most institutions are prepared for,” the experts emphasized.

Benchmarks and Metrics

In April, Claude completed over 800 corrections — according to the supervising engineer, it would have taken a human four years to accomplish this.

In open tasks, the share of successful sessions by Claude increased to 76% in May 2026 — a 50 percentage point rise over six months.

Anthropic stated that the duration of tasks that AI can reliably perform independently doubles approximately every four months (compared to the previous seven).

In a task to accelerate the training of a small AI model, Claude Opus 4 in May 2025 provided an average speed increase of about three times, while Mythos Preview in April 2026 achieved approximately a 52-fold increase.

During internal tests, the Mythos Preview model demonstrated the ability to solve research tasks in AI safety. Over 800 hours, a group of agents closed 97% of the problem gap in an experiment, while two human researchers managed only 23% of the volume in a week.

New Bottlenecks

Despite successes in code writing, humans still hold an advantage in “research judgment” and setting strategic goals.

Anthropic believes that in the near future, the role of developers will shift from writing lines of code to deeply reviewing the results of neural network work. Human verification may become the main bottleneck in the speed of developing new models.

The company also suggested that it might be beneficial for the world to have the ability to slow down or temporarily halt the development of advanced AI systems, allowing societal institutions and alignment research to keep pace with progress.

Concurrently, startup representatives warned that unilateral slowing could backfire on those who decelerate — less cautious players could close the gap. Without a global coordination mechanism, safety decisions will have to be made under competitive and geopolitical pressure.

In May, Anthropic published its first report on Project Glasswing — a program for identifying vulnerabilities using the Claude Mythos model.

In the same month, the company released Claude Opus 4.8 and separately introduced a dynamic workflow feature for Claude Code.