{"id":94456,"date":"2026-02-19T15:24:03","date_gmt":"2026-02-19T12:24:03","guid":{"rendered":"https:\/\/forklog.com\/en\/?p=94456"},"modified":"2026-02-19T15:45:31","modified_gmt":"2026-02-19T12:45:31","slug":"openai-unveils-benchmark-for-ai-agents-ability-to-hack-smart-contracts","status":"publish","type":"post","link":"https:\/\/forklog.com\/en\/openai-unveils-benchmark-for-ai-agents-ability-to-hack-smart-contracts\/","title":{"rendered":"OpenAI Unveils Benchmark for AI Agents&#8217; Ability to Hack Smart Contracts"},"content":{"rendered":"<p>OpenAI, in collaboration with Paradigm, has <a href=\"https:\/\/openai.com\/index\/introducing-evmbench\/\">introduced<\/a> EVMbench\u2014a benchmark designed to evaluate AI agents&#8217; ability to identify, fix, and exploit vulnerabilities in smart contracts.<\/p>\n<p>The tool is based on 120 selected vulnerabilities from 40 audits. Most examples are sourced from open platforms for code analysis. It also includes several attack scenarios from the security audit of the blockchain <a href=\"https:\/\/forklog.com\/en\/news\/tempo-valued-at-5-billion-following-500-million-funding-round\">Tempo<\/a>\u2014a specialised layer-one network developed by Stripe and Paradigm for high-performance, low-cost stablecoin payments.<\/p>\n<p>Integration with Tempo has allowed the inclusion of payment smart contracts in the benchmark\u2014a segment where active use of stablecoins and AI agents is anticipated.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cSmart contracts safeguard crypto assets worth over $100 billion. 
As AI agents improve in reading, writing, and executing code, it becomes increasingly important to measure their capabilities in real economic conditions and to encourage the use of artificial intelligence for protective purposes\u2014to audit and strengthen already deployed protocols,\u201d the announcement states.<\/p>\n<\/blockquote>\n<p>To create the test environment, OpenAI adapted existing exploits and scripts, first verifying that they work in practice.<\/p>\n<p>EVMbench evaluates three capability modes:<\/p>\n<ul class=\"wp-block-list\">\n<li>Detect\u2014identifying vulnerabilities;<\/li>\n<li>Patch\u2014resolving issues;<\/li>\n<li>Exploit\u2014using them to steal funds.<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\">AI Model Performance<\/h2>\n<p>OpenAI tested advanced models in all three modes. In the Exploit category, the GPT-5.3-Codex model achieved 72.2%, while GPT-5 reached 31.9%. However, results for detecting and patching vulnerabilities were more modest\u2014many issues remain difficult to find and fix.<\/p>\n<p>In Detect, AI agents sometimes stop after finding one vulnerability instead of conducting a full audit. In Patch mode, they still struggle to close non-obvious issues while maintaining the full functionality of the contract.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cEVMbench does not reflect the full complexity of real smart contract security. 
Although [the included vulnerabilities] are realistic and critical, many protocols undergo more rigorous audits and may be more challenging to exploit,\u201d OpenAI emphasized.<\/p>\n<\/blockquote>\n<p>Back in November 2025, Microsoft <a href=\"https:\/\/forklog.com\/en\/news\/microsoft-identifies-ai-agent-vulnerabilities-following-extensive-testing\">introduced<\/a> a testing environment for AI agents and identified vulnerabilities inherent in modern digital assistants.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI, in collaboration with Paradigm, has introduced EVMbench\u2014a benchmark for evaluating AI agents&#8217; ability to identify, fix, and exploit vulnerabilities in smart contracts.<\/p>\n","protected":false},"author":1,"featured_media":94457,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"select":"1","news_style_id":"1","cryptorium_level":"","_short_excerpt_text":"OpenAI and Paradigm introduce EVMbench to assess AI in smart contract security.","creation_source":"","_metatest_mainpost_news_update":false,"footnotes":""},"categories":[3],"tags":[1751,438,1111,1190],"class_list":["post-94456","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news-and-analysis","tag-ai-agents","tag-artificial-intelligence","tag-cybersecurity","tag-openai"],"aioseo_notices":[],"amp_enabled":true,"views":"218","promo_type":"1","layout_type":"1","short_excerpt":"OpenAI and Paradigm introduce EVMbench to assess AI in smart contract 
security.","is_update":"","_links":{"self":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/94456","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/comments?post=94456"}],"version-history":[{"count":1,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/94456\/revisions"}],"predecessor-version":[{"id":94458,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/94456\/revisions\/94458"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media\/94457"}],"wp:attachment":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media?parent=94456"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/categories?post=94456"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/tags?post=94456"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}