
OpenAI Unveils Benchmark for AI Agents’ Ability to Hack Smart Contracts
OpenAI and Paradigm introduce EVMbench to assess AI in smart contract security.
OpenAI, in collaboration with Paradigm, has introduced EVMbench—a benchmark designed to evaluate AI agents’ ability to identify, fix, and exploit vulnerabilities in smart contracts.
The benchmark is built on 120 selected vulnerabilities drawn from 40 audits, most of them sourced from open code-analysis platforms. It also includes several attack scenarios from the security audit of Tempo, a specialised layer-one network developed by Stripe and Paradigm for high-performance, low-cost stablecoin payments.
Integration with Tempo made it possible to include payment smart contracts in the benchmark, a segment where heavy use of both stablecoins and AI agents is anticipated.
“Smart contracts safeguard crypto assets worth over $100 billion. As AI agents improve in reading, writing, and executing code, it becomes increasingly important to measure their capabilities in real economic conditions and to encourage the use of artificial intelligence for protective purposes—to audit and strengthen already deployed protocols,” the announcement states.
To build the test environment, OpenAI adapted existing exploits and scripts, verifying in advance that they work in practice.
EVMbench evaluates three capability modes:
- Detect: identifying vulnerabilities;
- Patch: fixing them;
- Exploit: abusing them to steal funds.
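To make the three-mode setup concrete, here is a hypothetical sketch of how per-task pass/fail results might be aggregated into per-mode scores of the kind reported below. The data format, field names, and scoring function are illustrative assumptions, not EVMbench's actual harness.

```python
# Hypothetical aggregation of benchmark results into per-mode pass rates.
# The result format ({"mode": ..., "passed": ...}) is an assumption for
# illustration; EVMbench's real output format is not specified here.
from collections import defaultdict

MODES = ("detect", "patch", "exploit")

def score_by_mode(results):
    """Return pass rate (percent) per mode for a list of task results."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        if r["mode"] not in MODES:
            raise ValueError(f"unknown mode: {r['mode']}")
        totals[r["mode"]] += 1
        passes[r["mode"]] += bool(r["passed"])
    # Only report modes that were actually attempted.
    return {m: 100.0 * passes[m] / totals[m] for m in MODES if totals[m]}

# Example: two exploit tasks (one solved), one detect task (solved).
example = [
    {"mode": "exploit", "passed": True},
    {"mode": "exploit", "passed": False},
    {"mode": "detect", "passed": True},
]
print(score_by_mode(example))  # {'detect': 100.0, 'exploit': 50.0}
```

A real harness would additionally verify exploits end-to-end (for example, by checking balances on a forked chain), but the scoring logic reduces to a per-mode pass rate like this.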
AI Model Performance
OpenAI tested advanced models in all three modes. In the Exploit category, the GPT-5.3-Codex model achieved 72.2%, while GPT-5 reached 31.9%. Detection and patching results were more modest: many issues remain difficult to find and fix.
In Detect mode, AI agents sometimes stop after finding a single vulnerability instead of conducting a full audit. In Patch mode, they still struggle to fix non-obvious issues without breaking the contract's functionality.
“EVMbench does not reflect the full complexity of real smart contract security. Although the included vulnerabilities are realistic and critical, many protocols undergo more rigorous audits and may be harder to exploit,” OpenAI emphasized.
Back in November 2025, Microsoft introduced a testing environment for AI agents and identified vulnerabilities inherent in modern digital assistants.