{"id":94013,"date":"2026-02-06T14:49:44","date_gmt":"2026-02-06T11:49:44","guid":{"rendered":"https:\/\/forklog.com\/en\/?p=94013"},"modified":"2026-02-06T14:50:21","modified_gmt":"2026-02-06T11:50:21","slug":"claude-opus-4-6-surpasses-gpt-5-2-in-logic-tests-and-introduces-agent-teams","status":"publish","type":"post","link":"https:\/\/forklog.com\/en\/claude-opus-4-6-surpasses-gpt-5-2-in-logic-tests-and-introduces-agent-teams\/","title":{"rendered":"Claude Opus 4.6 Surpasses GPT-5.2 in Logic Tests and Introduces &#8216;Agent Teams&#8217;"},"content":{"rendered":"<p>AI startup Anthropic has upgraded its flagship model, Claude Opus, to version 4.6. The neural network has improved its ability to plan actions, handle long-term tasks, and work more efficiently with large codebases.<\/p>\n<p>The context window has been expanded to 1 million tokens, allowing for the analysis of massive documents and extended dialogues without losing the logical thread.<\/p>\n<p>The updated algorithms are tailored for professional tasks: conducting financial analysis, research, and the use and creation of documents, spreadsheets, and presentations.<\/p>\n<p>Opus 4.6 received the highest score in the Terminal-Bench 2.0 programming test and outperformed competitors in the complex interdisciplinary benchmark for logical reasoning, Humanity\u2019s Last Exam.<\/p>\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forklog.com\/wp-content\/uploads\/img-932daa7da416988e-5594547731962590.webp\" alt=\"image\" class=\"wp-image-274771\"\/><figcaption class=\"wp-element-caption\">Comparison of Opus 4.6 with competitors in various tests. Source: Anthropic.<\/figcaption><\/figure>\n<p>In GDPval-AA, which assesses reasoning and decision-making quality, the model surpassed OpenAI&#8217;s GPT-5.2. 
The <span data-descr=\"large language model\" class=\"old_tooltip\">LLM<\/span> also achieved better results in BrowseComp, which measures the ability to find hard-to-access information online.<\/p>\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forklog.com\/wp-content\/uploads\/img-f441ba16aee6d8e2-5594573085748398.webp\" alt=\"image\" class=\"wp-image-274772\"\/><figcaption class=\"wp-element-caption\">Source: Anthropic.<\/figcaption><\/figure>\n<p>Opus 4.6 efficiently extracts data from large documents. Thanks to the expanded context window, the model tracks and captures subtle hidden details.<\/p>\n<h2 class=\"wp-block-heading\">Agent Teams<\/h2>\n<p>A key innovation is the ability to create groups of agents for collaborative work. In this mode, multiple AI assistants work in parallel and coordinate their tasks autonomously.<\/p>\n<p>This tool is suited to assignments that can be divided into independent parts and require the analysis of large amounts of text.<\/p>\n<h2 class=\"wp-block-heading\">Closed Loop<\/h2>\n<p>Anthropic stated that it is &#8220;creating Claude with Claude.&#8221; The company&#8217;s developers write code using its own AI model, and each new product undergoes testing on internal company tasks before release.<\/p>\n<p>The team found that Opus 4.6 pays more attention to the most challenging parts of a task without additional instructions, quickly completes simple tasks, handles ambiguous problems better, and stays productive over long sessions.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cOpus 4.6 often thinks more deeply and thoroughly revises its reasoning before making a decision. 
This yields better results in solving complex cases but may increase costs in simpler ones,\u201d the company noted.<\/p>\n<\/blockquote>\n<h2 class=\"wp-block-heading\">Safety<\/h2>\n<p>An automated audit revealed that Opus 4.6 has a low propensity for undesirable behavior: deception, sycophancy, reinforcing user misconceptions, and aiding in improper actions.<\/p>\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/forklog.com\/wp-content\/uploads\/img-f36c66ef243f70af-5594642335525562-1024x576.png\" alt=\"image\" class=\"wp-image-274773\" srcset=\"https:\/\/forklog.com\/wp-content\/uploads\/img-f36c66ef243f70af-5594642335525562-1024x576.png 1024w, https:\/\/forklog.com\/wp-content\/uploads\/img-f36c66ef243f70af-5594642335525562-300x169.png 300w, https:\/\/forklog.com\/wp-content\/uploads\/img-f36c66ef243f70af-5594642335525562-768x432.png 768w, https:\/\/forklog.com\/wp-content\/uploads\/img-f36c66ef243f70af-5594642335525562-1536x864.png 1536w, https:\/\/forklog.com\/wp-content\/uploads\/img-f36c66ef243f70af-5594642335525562.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">The model demonstrates safety at the level of Opus 4.5. 
Source: Anthropic.<\/figcaption><\/figure>\n<p>To evaluate the model, the company conducted its most comprehensive series of assessments to date, applying some testing methodologies for the first time and enhancing existing ones.<\/p>\n<h2 class=\"wp-block-heading\">Availability and New Features<\/h2>\n<p>Claude Opus 4.6 is now available via the web interface, the <span data-descr=\"application programming interface\" class=\"old_tooltip\">API<\/span>, and major cloud platforms.<\/p>\n<p>New features in the developer toolkit include:<\/p>\n<ul class=\"wp-block-list\">\n<li>adaptive thinking \u2014 the neural network independently determines when to engage in deep reasoning mode;<\/li>\n<li>effort adjustment \u2014 four levels of work intensity are provided, from low to maximum;<\/li>\n<li>context compression \u2014 the tool automatically summarizes and replaces old context when the conversation approaches the token limit.<\/li>\n<\/ul>\n<p>Opus 4.6 works better with office tools like Excel and PowerPoint.<\/p>\n<p>Back in January, Anthropic CEO Dario Amodei <a href=\"https:\/\/forklog.com\/en\/news\/anthropic-ceo-foresees-imminent-arrival-of-agi-and-job-reductions\">predicted<\/a> the imminent emergence of <span data-descr=\"artificial general intelligence\" class=\"old_tooltip\">AGI<\/span> and job reductions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Anthropic has upgraded Claude Opus to version 4.6. 
The neural network has improved its ability to plan actions, handle long-term tasks, and work with codebases.<\/p>\n","protected":false},"author":1,"featured_media":94014,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"select":"1","news_style_id":"1","cryptorium_level":"","_short_excerpt_text":"Anthropic upgraded Claude Opus to 4.6, enhancing planning and task management.","creation_source":"","_metatest_mainpost_news_update":false,"footnotes":""},"categories":[3],"tags":[1434,438],"class_list":["post-94013","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news-and-analysis","tag-anthropic","tag-artificial-intelligence"],"aioseo_notices":[],"amp_enabled":true,"views":"376","promo_type":"1","layout_type":"1","short_excerpt":"Anthropic upgraded Claude Opus to 4.6, enhancing planning and task management.","is_update":"","_links":{"self":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/94013","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/comments?post=94013"}],"version-history":[{"count":1,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/94013\/revisions"}],"predecessor-version":[{"id":94015,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/94013\/revisions\/94015"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media\/94014"}],"wp:attachment":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media?parent=94013"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/categories?post=94013"},{"taxonomy":"post_tag","embeddable":true,"hr
ef":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/tags?post=94013"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}