
OpenAI Unveils GPT-5.4 with Computer Vision Capabilities
OpenAI launches GPT-5.4 with computer vision and PC control.
OpenAI has launched GPT-5.4 and GPT-5.4 Pro, just two days after the release of version 5.3 Instant.
GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT.
GPT-5.4 is also now available in the API and Codex.
GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model. pic.twitter.com/1hy6xXLAmJ
— OpenAI (@OpenAI) March 5, 2026
The standard version of GPT-5.4 is available in the ChatGPT web interface, via the API, and in the Codex tool. The GPT-5.4 Thinking version is available to Plus, Team, and Pro subscribers.
GPT-5.4 Pro is designed for Pro plan users and Enterprise clients and is also accessible through the API.
The base cost is $2.50 per million input tokens and $15 per million output tokens. The Pro version is significantly more expensive: $30 and $180 per million tokens, respectively.
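At these rates, per-request costs are straightforward to estimate. A minimal sketch (the lowercase model identifiers are assumptions for illustration, not confirmed API names):

```python
# Rough cost estimate for a GPT-5.4 API call at the published rates.
# Prices are USD per million tokens, from the article; actual billing may differ.
PRICES = {
    "gpt-5.4":     {"input": 2.50,  "output": 15.00},
    "gpt-5.4-pro": {"input": 30.00, "output": 180.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 10k-token prompt with a 2k-token completion.
print(f"{estimate_cost('gpt-5.4', 10_000, 2_000):.4f}")      # 0.0550
print(f"{estimate_cost('gpt-5.4-pro', 10_000, 2_000):.4f}")  # 0.6600
```

The same workload thus costs twelve times more on the Pro tier, which matches the ratio of the published rates.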
Performance in Work Tasks
GPT-5.4 delivers more stable and higher-quality results in real-world applications. In the GDPval benchmark, which evaluates task performance across 44 professions, the model scored 83%, indicating it operates at or above the level of industry specialists. By comparison, GPT-5.2 scored 70.9%.

The developers focused in particular on working with tables, presentations, and documents. In tasks at the level of a junior investment-bank analyst, GPT-5.4 scored 87.3% versus 68.4% for GPT-5.2.
Evaluators preferred presentations from the new model 68% of the time for better aesthetics, variety, and effective use of image generation.

GPT-5.4 is also OpenAI’s most factually accurate model to date. In tests on prompts known to provoke errors:
- individual statements were false 33% less often;
- complete answers contained errors 18% less often compared to GPT-5.2.
Computer Vision
This version is the first to feature built-in computer vision and PC control capabilities. The model can use a mouse and keyboard based on screenshots and write code for automation through Playwright.
Its behavior can be tuned to specific scenarios, taking acceptable risk levels into account.
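In general terms, computer-use models of this kind follow an observe-plan-act loop: capture a screenshot, ask the model for one UI action, execute it, and repeat. A toy sketch of that loop, with the Action schema and stubbed callbacks invented purely for illustration (this is not OpenAI’s actual API):

```python
# Sketch of the screenshot -> action loop behind "computer use" agents.
# The Action schema, callbacks, and step budget are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str              # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def run_agent(take_screenshot, plan_next_action, execute, max_steps=10):
    """Loop: observe the screen, ask the model for one action, apply it."""
    trace = []
    for _ in range(max_steps):
        action = plan_next_action(take_screenshot())
        trace.append(action.kind)
        if action.kind == "done":
            break
        execute(action)
    return trace

# Toy run with stubs standing in for the real model and operating system.
script = iter([Action("click", 100, 200), Action("type", text="hello"), Action("done")])
trace = run_agent(lambda: b"", lambda shot: next(script), lambda a: None)
print(trace)  # ['click', 'type', 'done']
```

In a real deployment the planner stub would be a model call that receives the screenshot, and the executor would drive the mouse and keyboard or a Playwright session.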
In the OSWorld-Verified benchmark (desktop management), GPT-5.4 successfully completed 75% of tasks, surpassing the previous version (47.3%) and humans (72.4%). Progress is attributed to improved visual perception:
- in the MMMU-Pro test (understanding and logic), the result was 81.2% compared to 79.5% for GPT-5.2;
- in OmniDocBench (document analysis), the average error rate decreased from 0.140 to 0.109.
Programming
In coding, the model matches the specialized GPT-5.3-Codex but operates faster.
Codex now features a /fast mode, accelerating generation by 1.5 times without quality loss. In internal tests, GPT-5.4 demonstrated high performance in complex frontend development tasks.
An experimental Playwright (Interactive) skill has also been introduced. It allows the model to visually debug web and Electron applications, testing its own code during the writing process.
Tools
GPT-5.4 includes a Tool Search function. Previously, the system had to preload descriptions of all available plugins into the context, adding thousands of extra tokens to each request and increasing costs.
Now, the model receives only a basic list and independently finds and loads the necessary parameters when needed. In tests based on MCP Atlas, this approach reduced token consumption by 47% without losing accuracy.
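The idea behind Tool Search can be illustrated with a small sketch: each request carries only tool names, and a full schema is loaded only when the model decides it needs that tool. The registry and schemas below are made up for illustration:

```python
# Illustrative sketch of deferred tool loading ("Tool Search").
# Instead of sending every tool's full schema with each request,
# send only the names and fetch full definitions on demand.
TOOL_REGISTRY = {
    "get_weather": {"description": "Fetch a weather report",
                    "parameters": {"city": "string"}},
    "send_email":  {"description": "Send an email",
                    "parameters": {"to": "string", "body": "string"}},
}

def tool_index() -> list[str]:
    """Cheap listing included in every request: names only."""
    return sorted(TOOL_REGISTRY)

def load_tool(name: str) -> dict:
    """Full schema, loaded only when the model asks for the tool."""
    return {"name": name, **TOOL_REGISTRY[name]}

print(tool_index())                            # ['get_weather', 'send_email']
print(load_tool("get_weather")["parameters"])  # {'city': 'string'}
```

With hundreds of registered tools, the per-request overhead shrinks from full schemas to a short name list, which is the mechanism behind the reported 47% token savings.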
Web search has also become more efficient: in the BrowseComp benchmark, performance increased by 17%, and the Pro version reached a record 89.3%. GPT-5.4 Thinking more effectively gathers information from multiple sources, better handles complex queries, and provides more structured responses.
Manageability and Context
When handling complex queries, GPT-5.4 Thinking in ChatGPT first presents users with an action plan. This allows for on-the-fly adjustments without restarting generation or making unnecessary clarifications. The feature is already available on the website and in the Android app, and will soon be on iOS.
The model also better maintains context in long dialogues and takes longer to consider complex tasks. This helps preserve coherence and relevance in responses even when dealing with large volumes of information.
Earlier in March, users boycotted ChatGPT following OpenAI’s deal with the Pentagon.