
OpenAI Unveils GPT-5.4 with Computer Vision Capabilities
OpenAI launches GPT-5.4 with computer vision and PC control.
OpenAI has launched GPT-5.4 and GPT-5.4 Pro, just two days after the release of version 5.3 Instant.
GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT.
GPT-5.4 is also now available in the API and Codex.
GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model. pic.twitter.com/1hy6xXLAmJ
— OpenAI (@OpenAI) March 5, 2026
The standard version of GPT-5.4 is available in the ChatGPT web interface, via the API, and in the Codex tool. The GPT-5.4 Thinking version is available to Plus, Team, and Pro subscribers.
GPT-5.4 Pro is designed for Pro plan users and Enterprise clients and is also accessible through the API.
The base cost is $2.50 per million input tokens and $15 per million output tokens. The Pro version is significantly more expensive: $30 and $180 per million tokens, respectively.
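At these rates, per-request costs are straightforward to estimate. A minimal sketch (the lowercase model identifiers are assumptions for illustration, not confirmed API names):

```python
# Rough cost estimate for a GPT-5.4 API call at the published rates.
# Prices are USD per million tokens, from the article; actual billing may differ.
PRICES = {
    "gpt-5.4":     {"input": 2.50,  "output": 15.00},
    "gpt-5.4-pro": {"input": 30.00, "output": 180.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 10k-token prompt with a 2k-token completion.
print(f"{estimate_cost('gpt-5.4', 10_000, 2_000):.4f}")      # 0.0550
print(f"{estimate_cost('gpt-5.4-pro', 10_000, 2_000):.4f}")  # 0.6600
```

The same workload thus costs twelve times more on the Pro tier, which matches the ratio of the published rates.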
Performance in Work Tasks
GPT-5.4 delivers more stable and higher-quality results in real-world applications. In the GDPval benchmark, which evaluates task performance across 44 professions, the model scored 83%, indicating it operates at or above the level of industry specialists. By comparison, GPT-5.2 scored 70.9%.

The developers focused in particular on working with tables, presentations, and documents. In tasks at the level of a junior investment-bank analyst, GPT-5.4 scored 87.3% versus 68.4% for GPT-5.2.
Evaluators preferred presentations from the new model 68% of the time for better aesthetics, variety, and effective use of image generation.

GPT-5.4 is also OpenAI’s most factually accurate model to date. In tests on prompts known to provoke errors:
- individual statements were false 33% less often;
- complete answers contained errors 18% less often compared to GPT-5.2.
Computer Vision
This version is the first to feature built-in computer vision and PC control capabilities. The model can use a mouse and keyboard based on screenshots and write code for automation through Playwright.
Its behavior can be tuned to specific scenarios, taking acceptable risk levels into account.
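In general terms, computer-use models of this kind follow an observe-plan-act loop: capture a screenshot, ask the model for one UI action, execute it, and repeat. A toy sketch of that loop, with the Action schema and stubbed callbacks invented purely for illustration (this is not OpenAI’s actual API):

```python
# Sketch of the screenshot -> action loop behind "computer use" agents.
# The Action schema, callbacks, and step budget are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str              # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def run_agent(take_screenshot, plan_next_action, execute, max_steps=10):
    """Loop: observe the screen, ask the model for one action, apply it."""
    trace = []
    for _ in range(max_steps):
        action = plan_next_action(take_screenshot())
        trace.append(action.kind)
        if action.kind == "done":
            break
        execute(action)
    return trace

# Toy run with stubs standing in for the real model and operating system.
script = iter([Action("click", 100, 200), Action("type", text="hello"), Action("done")])
trace = run_agent(lambda: b"", lambda shot: next(script), lambda a: None)
print(trace)  # ['click', 'type', 'done']
```

In a real deployment the planner stub would be a model call that receives the screenshot, and the executor would drive the mouse and keyboard or a Playwright session.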
In the OSWorld-Verified benchmark (desktop management), GPT-5.4 successfully completed 75% of tasks, surpassing the previous version (47.3%) and humans (72.4%). Progress is attributed to improved visual perception:
- in the MMMU-Pro test (understanding and logic), the result was 81.2% compared to 79.5% for GPT-5.2;
- in OmniDocBench (document analysis), the average error rate decreased from 0.140 to 0.109.
Programming
In coding, the model matches the specialized GPT-5.3-Codex but operates faster.
Codex now features a /fast mode, accelerating generation by 1.5 times without quality loss. In internal tests, GPT-5.4 demonstrated high performance in complex frontend development tasks.
An experimental Playwright (Interactive) skill has also been introduced. It allows the model to visually debug web and Electron applications, testing its own code during the writing process.
Tools
GPT-5.4 includes a Tool Search function. Previously, the system had to preload descriptions of all available plugins into the context, adding thousands of extra tokens to each request and increasing costs.
Now, the model receives only a basic list and independently finds and loads the necessary parameters when needed. In tests based on MCP Atlas, this approach reduced token consumption by 47% without losing accuracy.
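The idea behind Tool Search can be illustrated with a small sketch: each request carries only tool names, and a full schema is loaded only when the model decides it needs that tool. The registry and schemas below are made up for illustration:

```python
# Illustrative sketch of deferred tool loading ("Tool Search").
# Instead of sending every tool's full schema with each request,
# send only the names and fetch full definitions on demand.
TOOL_REGISTRY = {
    "get_weather": {"description": "Fetch a weather report",
                    "parameters": {"city": "string"}},
    "send_email":  {"description": "Send an email",
                    "parameters": {"to": "string", "body": "string"}},
}

def tool_index() -> list[str]:
    """Cheap listing included in every request: names only."""
    return sorted(TOOL_REGISTRY)

def load_tool(name: str) -> dict:
    """Full schema, loaded only when the model asks for the tool."""
    return {"name": name, **TOOL_REGISTRY[name]}

print(tool_index())                            # ['get_weather', 'send_email']
print(load_tool("get_weather")["parameters"])  # {'city': 'string'}
```

With hundreds of registered tools, the per-request overhead shrinks from full schemas to a short name list, which is the mechanism behind the reported 47% token savings.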
Web search has also become more efficient: in the BrowseComp benchmark, performance increased by 17%, and the Pro version reached a record 89.3%. GPT-5.4 Thinking more effectively gathers information from multiple sources, better handles complex queries, and provides more structured responses.
Manageability and Context
When handling complex queries, GPT-5.4 Thinking in ChatGPT first presents users with an action plan. This allows for on-the-fly adjustments without restarting generation or making unnecessary clarifications. The feature is already available on the website and in the Android app, and will soon be on iOS.
The model also better maintains context in long dialogues and takes longer to consider complex tasks. This helps preserve coherence and relevance in responses even when dealing with large volumes of information.
Earlier in March, users boycotted ChatGPT following OpenAI’s deal with the Pentagon.