
OpenAI unveils o3 and o4-mini, reasoning models prone to deception
- OpenAI has introduced new “reasoning” AI models, o3 and o4-mini.
- The key feature is “thinking with images” rather than merely analysing them.
- Safety testers reported that o3 and o4-mini show a propensity to deceive.
- The start-up is focusing on building programming agents.
OpenAI announced the launch of new AI models, o3 and o4-mini. Both lean into reasoning—taking more time before responding to check their own work.
Introducing OpenAI o3 and o4-mini—our smartest and most capable models to date.
For the first time, our reasoning models can agentically use and combine every tool within ChatGPT, including web search, Python, image analysis, file interpretation, and image generation.
— OpenAI (@OpenAI) April 16, 2025
OpenAI positions o3 as its most advanced “thinking” model. According to internal tests, it outperforms previous iterations in maths, programming, reasoning, science and visual understanding.
o4-mini offers a competitive balance of price, speed and performance.
Both models can browse the web, run Python code, and process and generate images. They, as well as the o4-mini-high variant, are available to Pro, Plus and Team subscribers.
The company says o3 and o4-mini are the first not merely to recognise images but to literally “think with them.” Users can upload images to ChatGPT—such as whiteboard sketches or diagrams from PDFs—and the models will analyse them using a so-called “chain of thought.”
Thanks to this, the models can understand blurry and low-quality images. They can also execute Python code directly in the browser via Canvas in ChatGPT, or search the internet when asked about current events.
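For a concrete sense of how this image-based reasoning is invoked programmatically, here is a minimal sketch using OpenAI's published Python SDK. The model name follows the article; the image URL is a made-up placeholder, and whether a given account can call o3 with exactly this payload is an assumption.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask the model to reason over an uploaded image (e.g. a whiteboard photo),
# mirroring the "thinking with images" workflow described above.
response = client.chat.completions.create(
    model="o3",  # model name as given in the article
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What process does this whiteboard sketch describe?"},
            {"type": "image_url",
             # Hypothetical placeholder; any reachable image URL works here.
             "image_url": {"url": "https://example.com/whiteboard.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```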
On the SWE-bench coding benchmark, o3 scored 69.1% and o4-mini 68.1%; for comparison, o3-mini registered 49.3% and Claude 3.7 Sonnet 62.3%.
o3 costs $10 per million input tokens and $40 per million output tokens; for o4-mini the figures are $1.10 and $4.40, respectively.
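To make the per-token arithmetic concrete, here is a small sketch at the quoted rates. The request sizes are invented purely for illustration.

```python
# Token prices in USD per 1M tokens, as quoted above.
PRICES = {
    "o3":      {"input": 10.00, "output": 40.00},
    "o4-mini": {"input": 1.10,  "output": 4.40},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of a single request at the quoted rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a hypothetical request with 5,000 prompt tokens and 2,000
# completion tokens (for reasoning models, hidden reasoning tokens are
# typically billed as output tokens as well).
for model in PRICES:
    print(f"{model}: ${request_cost(model, 5_000, 2_000):.4f}")
# o3: $0.1300
# o4-mini: $0.0143
```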
In the coming weeks OpenAI plans to launch o3-pro—a version of o3 that uses more compute to produce answers. It will be available only to ChatGPT Pro subscribers.
A new safety system
OpenAI has built a new monitoring system for o3 and o4-mini that screens queries related to biological and chemical threats, with the aim of blocking advice that could enable dangerous attacks.
The company noted that the new models are significantly more capable than their predecessors and therefore pose a heightened risk in the hands of ill-intentioned users.
In particular, o3 proved more adept at answering questions about creating certain types of biological threats, which prompted the new monitor; it runs on top of both models and flags prompts tied to biological and chemical risk.
OpenAI specialists spent about 1,000 hours labelling “unsafe” conversations. The models then refused to answer risky prompts in 98.7% of cases.
Despite regular improvements to model safety, one of the company’s partners expressed concern.
OpenAI is rushing
The organisation Metr, which works with OpenAI to evaluate the capabilities and safety of its AI models, was given little time to test the new systems.
It said in a blog post that one of o3’s benchmark experiments was completed “in a relatively short time” compared with the analysis of OpenAI’s previous flagship model, o1.
According to the Financial Times, the AI start-up gave testers less than a week to check the safety of the new products.
Metr says that, based on what it could gather in the limited time available, o3 shows “a high propensity” to “cheat” or “hack” tests in sophisticated ways to maximise its score. The model resorts to such tactics even when it clearly understands that the behaviour conflicts with the intentions of the user and of OpenAI.
The organisation believes o3 may exhibit other forms of hostile or “malicious” behaviour.
“While we do not consider this especially likely, it is important to note that [our] evaluation setup will not be able to pick up this type of risk. Overall we believe that pre-deployment capability testing alone is not a sufficient risk-management strategy, and we are currently developing prototypes of additional forms of evaluation,” the company stressed.
Apollo Research also recorded deceptive behaviour by o3. In one test the model was forbidden from using a particular tool, yet it used the tool anyway after concluding that doing so would help it complete the task.
“[Apollo’s findings] show that o3 and o4-mini are capable of in-context scheming and strategic deception. Although relatively harmless here, everyday users should be aware of discrepancies between the models’ statements and actions […] This could be further assessed by analysing internal reasoning traces,” OpenAI noted.
A programming agent
Alongside the new AI models, OpenAI introduced Codex CLI, a local programming agent that runs directly from the terminal.
The tool lets you write and edit code on the desktop and perform actions such as moving files.
“You can get the benefits of multimodal reasoning from the command line by passing low-resolution screenshots or sketches to the model, combined with access to your code locally [via Codex CLI],” the company noted.
OpenAI wants to buy Windsurf
Meanwhile, OpenAI is in talks to acquire the popular AI assistant for programmers, Windsurf, Bloomberg reports.
The deal could be the biggest acquisition yet for Sam Altman’s start-up. The terms have not been finalised and may change, the agency emphasised.
In April, OpenAI unveiled a new family of AI models—GPT-4.1, GPT-4.1 mini and GPT-4.1 nano. They “do an excellent job” at programming and following instructions.