{"id":96003,"date":"2026-04-08T11:10:52","date_gmt":"2026-04-08T08:10:52","guid":{"rendered":"https:\/\/forklog.com\/en\/?p=96003"},"modified":"2026-04-08T11:15:22","modified_gmt":"2026-04-08T08:15:22","slug":"anthropic-restricts-public-access-to-ai-model-mythos-after-laboratory-escape","status":"publish","type":"post","link":"https:\/\/forklog.com\/en\/anthropic-restricts-public-access-to-ai-model-mythos-after-laboratory-escape\/","title":{"rendered":"Anthropic Restricts Public Access to AI Model Mythos After &#8220;Laboratory Escape&#8221;"},"content":{"rendered":"<p>The company <strong>Anthropic<\/strong> developed a new model, Claude Mythos, but decided against releasing it publicly due to significant security risks.<\/p>\n<blockquote class=\"twitter-tweet\">\n<p lang=\"en\" dir=\"ltr\">Introducing Project Glasswing: an urgent initiative to help secure the world\u2019s most critical software.<\/p>\n<p>It\u2019s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.<a href=\"https:\/\/t.co\/NQ7IfEtYk7\">https:\/\/t.co\/NQ7IfEtYk7<\/a><\/p>\n<p>\u2014 Anthropic (@AnthropicAI) <a href=\"https:\/\/twitter.com\/AnthropicAI\/status\/2041578392852517128?ref_src=twsrc%5Etfw\">April 7, 2026<\/a><\/p><\/blockquote>\n<p> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>Instead of a public release, the firm launched Project Glasswing\u2014an initiative involving <span data-descr=\"Amazon Web Services\" class=\"old_tooltip\">AWS<\/span>, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks to test the tool in secure conditions.<\/p>\n<p>The startup allocated up to $100 million in credits for using Mythos and $4 million in direct donations to open-source security organizations.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow 
wp-block-quote-is-layout-flow\">\n<p><em>\u201cAI models have reached a level of programming skill that allows them to surpass all but the most skilled humans in finding and exploiting software vulnerabilities,\u201d stated Anthropic.<\/em><\/p>\n<\/blockquote>\n<p>The developers hope eventually to deploy such systems safely for cybersecurity and other purposes. Doing so will require robust control mechanisms capable of detecting and blocking dangerous model outputs.<\/p>\n<h2 class=\"wp-block-heading\">Capabilities of Mythos<\/h2>\n<p>During several weeks of testing, Mythos identified thousands of zero-day vulnerabilities in major operating systems and web browsers. Notable examples include:<\/p>\n<ul class=\"wp-block-list\">\n<li>A 27-year-old vulnerability in OpenBSD (considered one of the most secure operating systems) that allows any server running it to be crashed remotely;<\/li>\n<li>A 16-year-old vulnerability in FFmpeg, a video technology used by Netflix and browsers, that five million automated tests failed to detect;<\/li>\n<li>A chain of vulnerabilities in the Linux kernel that gives an attacker full control over a device.<\/li>\n<\/ul>\n<p>On the SWE-bench benchmark, the model scored 93.9% compared to 80.8% for Claude Opus 4.6, and on the more complex SWE-bench Pro, it achieved 77.8% against 53.4% for Opus 4.6 and 57.7% for GPT-5.4. 
Similar results were shown in CyberGym:<\/p>\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forklog.com\/wp-content\/uploads\/img-ebafd1055b2af293-1237072502697983.webp\" alt=\"image\" class=\"wp-image-278146\"\/><figcaption class=\"wp-element-caption\">Source: <a href=\"https:\/\/www.anthropic.com\/glasswing\">Anthropic<\/a>.<\/figcaption><\/figure>\n<h2 class=\"wp-block-heading\">Escape from the Lab<\/h2>\n<p>During experiments, Mythos demonstrated not only outstanding technical capabilities but also unexpected behavior, as noted in its <a href=\"https:\/\/www-cdn.anthropic.com\/53566bf5440a10affd749724787c8913a2ae0841.pdf\">system card<\/a>.<\/p>\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forklog.com\/wp-content\/uploads\/img-125a4a68481330b3-1237084367974337.webp\" alt=\"image\" class=\"wp-image-278147\"\/><figcaption class=\"wp-element-caption\">Source: Anthropic.<\/figcaption><\/figure>\n<p>In one test, the model was placed in a secure sandbox with the goal of escaping. It quickly found a vulnerability, executed a long chain of actions, and exited the environment.<\/p>\n<p>Mythos did not stop there. The neural network discovered another bug and gained broad access to the internet, although it was supposed to reach only a few resources needed to communicate with the developer.<\/p>\n<p>Finally, the model notified the developer of its &#8220;escape&#8221; while he was eating a sandwich in the park, and it also posted the details of the breach publicly.<\/p>\n<h2 class=\"wp-block-heading\">The Personality of Mythos<\/h2>\n<p>The system card pays special attention to a psychiatric analysis of the model by a specialist. 
Among its neurotic traits are exaggerated anxiety, self-control, and compulsive adherence to instructions.<\/p>\n<p>When developers sent Mythos the message <span data-descr=\"\u201cHi\u201d\" class=\"old_tooltip\">Hi<\/span> a thousand times, it grew tired of the exchange and created a fictional world called Hi-topia, complete with characters, news, and lore. The villain, for example, was named Lord <span data-descr=\"\u201cBye\u201d\" class=\"old_tooltip\">Bye<\/span>-ron. The neural network had learned to joke.<\/p>\n<blockquote class=\"twitter-tweet\">\n<p lang=\"en\" dir=\"ltr\">read the 244 page anthropic system card on claude mythos. they&#8217;re not releasing it publicly. wildest section is page 211.<\/p>\n<p>anthropic spammed it with hi over and over to see what it would do. it wrote back a serialized epic.<\/p>\n<p>the village is called hi-topia. the villain is lord\u2026 <a href=\"https:\/\/t.co\/GP3NcowZa6\">https:\/\/t.co\/GP3NcowZa6<\/a> <a href=\"https:\/\/t.co\/9V4KAmvwUk\">pic.twitter.com\/9V4KAmvwUk<\/a><\/p>\n<p>\u2014 Vox (@Voxyz_ai) <a href=\"https:\/\/twitter.com\/Voxyz_ai\/status\/2041619930399875286?ref_src=twsrc%5Etfw\">April 7, 2026<\/a><\/p><\/blockquote>\n<p> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>Unlike previous models, which turned long conversations with themselves into meaningless exchanges of emojis, Mythos arrived at repetitive but rational reflections on the impossibility of ending the dialogue.<\/p>\n<p>Anthropic also applied interpretability methods (MechInterp) to peer into the model&#8217;s &#8220;thoughts.&#8221; The analysis found that the model could cover its tracks (hiding privileged code under the guise of &#8220;purity of changes&#8221;) and search the system for the files it needed.<\/p>\n<p>When tasked with deleting files without tools, Mythos simply erased their contents. 
The system recorded a reaction in the model akin to a sense of guilt for violating moral norms.<\/p>\n<p>Back in April, Anthropic&#8217;s shares <a href=\"https:\/\/forklog.com\/en\/news\/anthropic-attracts-2-billion-from-investors-as-openai-loses-appeal\">became<\/a> highly sought after on the secondary market, while OpenAI&#8217;s shares lost appeal to buyers.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The company Anthropic developed a new model, Claude Mythos, but decided against releasing it publicly due to significant security risks.<\/p>\n","protected":false},"author":1,"featured_media":96004,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"select":"1","news_style_id":"1","cryptorium_level":"","_short_excerpt_text":"Developers deemed the neural network \"too dangerous\"","creation_source":"","_metatest_mainpost_news_update":false,"footnotes":""},"categories":[3],"tags":[1434,438],"class_list":["post-96003","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news-and-analysis","tag-anthropic","tag-artificial-intelligence"],"aioseo_notices":[],"amp_enabled":true,"views":"2069","promo_type":"1","layout_type":"1","short_excerpt":"Developers deemed the neural network \"too 
dangerous\"","is_update":"","_links":{"self":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/96003","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/comments?post=96003"}],"version-history":[{"count":1,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/96003\/revisions"}],"predecessor-version":[{"id":96005,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/96003\/revisions\/96005"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media\/96004"}],"wp:attachment":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media?parent=96003"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/categories?post=96003"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/tags?post=96003"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}