
Claude Opus 4.6 Surpasses GPT-5.2 in Logic Tests and Introduces ‘Agent Teams’
Anthropic upgraded Claude Opus to 4.6, enhancing planning and task management.
AI startup Anthropic has upgraded its flagship model, Claude Opus, to version 4.6. The neural network has improved its ability to plan actions, handle long-term tasks, and work more efficiently with large codebases.
The context window has been expanded to 1 million tokens, allowing for the analysis of massive documents and extended dialogues without losing the logical thread.
The updated algorithms are tailored for professional tasks: conducting financial analysis, research, and the use and creation of documents, spreadsheets, and presentations.
Opus 4.6 received the highest score in the Terminal-Bench 2.0 programming test and outperformed competitors in the complex interdisciplinary benchmark for logical reasoning, Humanity’s Last Exam.

In GDPval-AA, which assesses reasoning and decision-making quality, the model surpassed OpenAI’s GPT-5.2. It also achieved better results in BrowseComp, which measures the ability to find hard-to-access information online.

Opus 4.6 efficiently extracts data from large documents. Thanks to the expanded context window, the model tracks and captures subtle hidden details.
Agent Teams
A key innovation is the ability to create groups of agents for collaborative work. In this mode, multiple AI assistants work in parallel and coordinate their tasks autonomously.
This tool is suitable for assignments that are divided into independent parts and require the analysis of large amounts of text.
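The general pattern described here, splitting a job into independent parts and handling them in parallel, can be illustrated with a minimal sketch. This is not Anthropic's implementation or API: `run_team` and `toy_agent` are hypothetical names, and the "agent" is a plain function standing in for a call to an AI assistant.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_team(parts: List[str], agent: Callable[[str], str],
             max_workers: int = 4) -> List[str]:
    """Run one agent per independent part in parallel.

    Hypothetical helper: results come back in the same order as `parts`,
    because Executor.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(agent, parts))


def toy_agent(part: str) -> str:
    # Stand-in for a real assistant call: just reports the word count.
    return f"{len(part.split())} words"


if __name__ == "__main__":
    chapters = ["first chapter text", "second chapter with more text"]
    print(run_team(chapters, toy_agent))
```

The sketch only works because the parts do not depend on each other; tasks whose steps must run in sequence would gain little from this arrangement.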
Closed Loop
Anthropic stated that they are “creating Claude with Claude.” Developers write code using their own AI model, and each new product undergoes testing on internal company tasks before release.
The team found that Opus 4.6 pays more attention to the most challenging parts of a task without additional instructions, quickly completes simple tasks, handles ambiguous problems better, and stays productive over long sessions.
“Opus 4.6 often thinks more deeply and thoroughly revises its reasoning before making a decision. This yields better results in solving complex cases but may add cost and latency in simpler ones,” the company noted.
Safety
An automated audit revealed that Opus 4.6 has a low propensity for undesirable behavior: deception, sycophancy, reinforcing user misconceptions, and assisting with misuse.

To evaluate the model, the company conducted its most comprehensive series of safety assessments to date, applying some testing methodologies for the first time and enhancing existing ones.
Availability and New Features
Claude Opus 4.6 is now available via web interface, API, and on major cloud platforms.
New features in the developer toolkit include:
- adaptive thinking — the neural network independently determines when to engage in deep reasoning mode;
- effort adjustment — four levels of work intensity are provided, from low to maximum;
- context compression — the tool automatically summarizes and replaces old context when the conversation approaches the token limit.
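The idea behind context compression can be shown with a minimal sketch. It is an illustration of the general technique only, not Anthropic's implementation: the function names, the 80% threshold, the word-based token count, and the placeholder summarizer are all assumptions made for the example.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def compact(messages: List[str], limit: int,
            summarize: Callable[[List[str]], str],
            keep_recent: int = 2, threshold: float = 0.8) -> List[str]:
    """Replace older messages with a summary once usage nears the limit.

    Hypothetical helper: `threshold` is the fraction of `limit` at which
    compaction starts; the last `keep_recent` messages are kept verbatim.
    """
    total = sum(count_tokens(m) for m in messages)
    split = len(messages) - keep_recent
    if total < threshold * limit or split <= 0:
        return messages  # plenty of room left, or nothing old to fold away
    old, recent = messages[:split], messages[split:]
    return [summarize(old)] + recent


def naive_summary(old: List[str]) -> str:
    # Placeholder: a real system would ask the model to write the summary.
    return f"[summary of {len(old)} earlier messages]"


if __name__ == "__main__":
    history = ["a b c d"] * 5  # 20 "tokens" in total
    print(compact(history, limit=20, summarize=naive_summary))
```

Keeping the most recent messages verbatim is a design choice in this sketch: the latest turns usually matter most for the next reply, while older ones can tolerate being condensed.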
Opus 4.6 works better with office tools like Excel and PowerPoint.
Back in January, Anthropic CEO Dario Amodei predicted the imminent emergence of AGI and resulting job losses.