
Errors Found in OpenAI’s Blockchain Benchmark
OpenZeppelin audited OpenAI's EVMbench, finding errors and data issues.
The cybersecurity firm OpenZeppelin has audited OpenAI’s new AI benchmark, EVMbench. Experts identified methodological errors and data “contamination.”
— OpenZeppelin (@OpenZeppelin) March 2, 2026
The developer of ChatGPT launched EVMbench in mid-February in partnership with the investment fund Paradigm to assess the ability of AI agents to find, fix, and exploit vulnerabilities in smart contracts.
OpenZeppelin specialists welcomed the initiative but chose to evaluate it against the same standards applied to the protocols the firm protects, including Aave, Lido, and Uniswap.
Key Shortcomings
The main issue concerns “contamination” of training data. EVMbench is built from a set of 120 vulnerabilities identified during audits in 2024–2025.
However, the leading models tested have a knowledge cut-off of August 2025, so they could have “recalled” information about these vulnerabilities from their training data. Even with internet access disabled, this undermines the purity of the experiment: it is unclear whether the AI can genuinely identify new threats.
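The contamination concern can be made concrete: a benchmark can filter out any vulnerability disclosed before the tested model’s knowledge cut-off, so the model cannot simply recall it. A minimal sketch of such a filter, with all names and dates purely illustrative (this is not EVMbench’s actual data format):

```python
from datetime import date

# Hypothetical benchmark entries: each vulnerability carries its
# public disclosure date (structure is illustrative only).
cases = [
    {"id": "vuln-001", "disclosed": date(2024, 6, 10)},
    {"id": "vuln-002", "disclosed": date(2025, 11, 3)},
]

# Knowledge cut-off of the model under test (assumed value).
MODEL_CUTOFF = date(2025, 8, 1)

# Keep only vulnerabilities disclosed after the cut-off, so the model
# cannot have seen them in its training data.
clean_cases = [c for c in cases if c["disclosed"] > MODEL_CUTOFF]

print([c["id"] for c in clean_cases])  # → ['vuln-002']
```

In this sketch only `vuln-002` survives the filter; under OpenZeppelin’s critique, all 120 EVMbench cases would fall on the pre-cut-off side of such a check.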
OpenZeppelin also pointed out factual errors in the EVMbench dataset: at least four vulnerabilities classified as “high risk” turned out to be non-functional, yet AI agents still received credit for supposedly detecting them accurately.
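One safeguard against the mislabeled entries OpenZeppelin describes is to award credit only for vulnerabilities whose exploit is independently confirmed to reproduce. A minimal sketch of such gated scoring, with hypothetical names throughout (not OpenAI’s or OpenZeppelin’s actual tooling):

```python
def score_detection(reported: set[str], exploit_works: dict[str, bool]) -> int:
    """Count only reported vulnerabilities whose described attack
    actually reproduces (per a hypothetical validation flag)."""
    return sum(1 for v in reported if exploit_works.get(v, False))

# Ground truth maps vulnerability id -> "attack reproduces in practice".
# Here, overflow-02 stands in for a non-functional "high risk" entry.
truth = {"reentrancy-01": True, "overflow-02": False}

print(score_detection({"reentrancy-01", "overflow-02"}, truth))  # → 1
```

With this gate, an agent flagging the non-functional `overflow-02` would receive no credit for it, which is the behavior OpenZeppelin argues a benchmark should have.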
“These are not subjective disagreements about severity; these are cases where the described attack simply does not work,” the experts emphasized.
Specialists affirmed that artificial intelligence will play a key role in the future of blockchain security, but warned that the rush to deploy it must not come at the expense of data and test quality.
“The question is not whether AI will change the security of smart contracts—it will. The question is whether the benchmarks and data on which we build these tools will meet the same standards as the contracts they are meant to protect,” concluded OpenZeppelin.
Back in November, Microsoft experts introduced a testing environment for AI agents and identified vulnerabilities inherent in modern digital assistants.