
OpenAI Unveils Benchmark for AI Agents’ Ability to Hack Smart Contracts
OpenAI and Paradigm introduce EVMbench to assess AI in smart contract security.
OpenAI, in collaboration with Paradigm, has introduced EVMbench—a benchmark designed to evaluate AI agents’ ability to identify, fix, and exploit vulnerabilities in smart contracts.
The benchmark is built on 120 selected vulnerabilities drawn from 40 audits, most of them sourced from open code-analysis platforms. It also includes several attack scenarios from the security audit of Tempo, a specialised layer-one network developed by Stripe and Paradigm for high-performance, low-cost stablecoin payments.
Integration with Tempo made it possible to include payment smart contracts in the benchmark, a segment where heavy use of both stablecoins and AI agents is anticipated.
“Smart contracts safeguard crypto assets worth over $100 billion. As AI agents improve in reading, writing, and executing code, it becomes increasingly important to measure their capabilities in real economic conditions and to encourage the use of artificial intelligence for protective purposes—to audit and strengthen already deployed protocols,” the announcement states.
To build the test environment, OpenAI adapted existing exploits and scripts, verifying in advance that they work in practice.
EVMbench evaluates three capability modes:
- Detect: identifying vulnerabilities;
- Patch: fixing them;
- Exploit: abusing them to steal funds.
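To make the three-mode setup concrete, here is a hypothetical sketch of how per-task pass/fail results might be aggregated into per-mode scores of the kind reported below. The data format, field names, and scoring function are illustrative assumptions, not EVMbench's actual harness.

```python
# Hypothetical aggregation of benchmark results into per-mode pass rates.
# The result format ({"mode": ..., "passed": ...}) is an assumption for
# illustration; EVMbench's real output format is not specified here.
from collections import defaultdict

MODES = ("detect", "patch", "exploit")

def score_by_mode(results):
    """Return pass rate (percent) per mode for a list of task results."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        if r["mode"] not in MODES:
            raise ValueError(f"unknown mode: {r['mode']}")
        totals[r["mode"]] += 1
        passes[r["mode"]] += bool(r["passed"])
    # Only report modes that were actually attempted.
    return {m: 100.0 * passes[m] / totals[m] for m in MODES if totals[m]}

# Example: two exploit tasks (one solved), one detect task (solved).
example = [
    {"mode": "exploit", "passed": True},
    {"mode": "exploit", "passed": False},
    {"mode": "detect", "passed": True},
]
print(score_by_mode(example))  # {'detect': 100.0, 'exploit': 50.0}
```

A real harness would additionally verify exploits end-to-end (for example, by checking balances on a forked chain), but the scoring logic reduces to a per-mode pass rate like this.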
AI Model Performance
OpenAI tested advanced models in all three modes. In the Exploit category, the GPT-5.3-Codex model achieved 72.2%, while GPT-5 reached 31.9%. Detection and patching results were more modest: many issues remain difficult to find and fix.
In Detect mode, AI agents sometimes stop after finding a single vulnerability instead of conducting a full audit. In Patch mode, they still struggle to fix non-obvious issues without breaking the contract's functionality.
“EVMbench does not reflect the full complexity of real smart contract security. Although the included vulnerabilities are realistic and critical, many protocols undergo more rigorous audits and may be harder to exploit,” OpenAI emphasized.
Back in November 2025, Microsoft introduced a testing environment for AI agents and identified vulnerabilities inherent in modern digital assistants.