OpenAI Unveils Flagship AI Model GPT-5

ForkLog

7 months ago

OpenAI has launched a new flagship AI model, set to underpin the next generation of ChatGPT.

GPT-5 is here.

Rolling out to everyone starting today. https://t.co/rOcZ8J2btI pic.twitter.com/dk6zLTe04s

— OpenAI (@OpenAI) August 7, 2025

GPT-5 is OpenAI’s first “unified” model, combining step-by-step reasoning with the quick-response mode of earlier GPT models. A dedicated router decides which approach to use for each task: giving a quick answer or spending more time thinking to improve the quality of the result.
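The routing idea can be illustrated with a toy sketch. OpenAI has not published how its router works, so everything here is an assumption made for illustration: the `route` function, the `Route` type, the cue list, and the word-count threshold are all hypothetical and stand in for whatever signals the real system uses.

```python
from dataclasses import dataclass

# Hypothetical cue phrases; the real router's signals are not public.
REASONING_CUES = ("prove", "step by step", "debug", "think hard")


@dataclass
class Route:
    mode: str    # "fast" or "thinking"
    reason: str  # why this toy router chose the mode


def route(prompt: str, max_fast_words: int = 40) -> Route:
    """Toy router: choose a quick answer or a slower reasoning pass."""
    text = prompt.lower()
    # Plain substring match: crude, but enough to show the idea.
    for cue in REASONING_CUES:
        if cue in text:
            return Route("thinking", f"cue: {cue!r}")
    # Long prompts are assumed to need more deliberation.
    if len(text.split()) > max_fast_words:
        return Route("thinking", "long prompt")
    return Route("fast", "short prompt, no reasoning cues")


if __name__ == "__main__":
    print(route("What is the capital of France?"))
    print(route("Debug this function and explain the fix."))
```

A production router would presumably rely on learned signals rather than keyword rules; the sketch only shows the shape of the decision, one request in and one of two modes out.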

While GPT-4 allowed the chatbot to answer a wide range of questions, GPT-5 can now perform tasks on behalf of the user, such as creating software applications, navigating calendars, or generating research reports.

The startup’s CEO, Sam Altman, described GPT-5 as “the best model in the world” and a “significant step” towards creating artificial general intelligence that can surpass humans in the most economically valuable work.

GPT-5 is available to users without a paid subscription, albeit with certain limits. These limits are increased for Plus and Pro subscribers.

Through the API, three models are available: GPT-5, GPT-5 mini, and GPT-5 nano.

Prices for GPT-5 input and output tokens. Source: X.

Benchmarks

OpenAI positions GPT-5 as the most advanced model in several areas. On some metrics it surpasses models from Anthropic, Google DeepMind, and xAI, though it lags behind competitors on others.

Among the new model’s strengths is programming. In the SWE-bench Verified test, it scored 74.9% on the first attempt, outperforming Claude Opus 4.1 (74.5%) and Gemini 2.5 Pro (59.6%).

In one example, GPT-5 created interactive material to explain complex concepts like the Bernoulli effect, generating hundreds of lines of code in a few minutes.

demo time:

GPT-5 can make something interactive to explain complex concepts like the bernoulli effect to you, churning out hundreds of lines of code in a couple of minutes. pic.twitter.com/cIU7O608TT

— Sam Altman (@sama) August 7, 2025

In another instance, the model created a web application for learning French.

In the Humanity’s Last Exam test, which evaluates AI’s performance in mathematics, humanities, and natural sciences, GPT-5 with extended reasoning (GPT-5 Pro) scored 42%. Grok 4 Heavy scored higher at 44.4%.

Elon Musk took the opportunity to troll OpenAI.

Bottom line though:

Grok 4 Heavy was smarter 2 weeks ago than GPT5 is now and G4H is already a lot better.

Let that sink in. https://t.co/BrggsEwnuz

— Elon Musk (@elonmusk) August 7, 2025

“Grok 4 Heavy was smarter two weeks ago than GPT5 is now, and G4H is already a lot better,” wrote the billionaire.

In the GPQA Diamond test, consisting of doctoral-level scientific questions, GPT-5 Pro scored 89.4% on the first attempt, surpassing Claude Opus 4.1 (80.9%) and Grok 4 Heavy (88.9%).

OpenAI claims that GPT-5 performs better on health-related questions. In HealthBench Hard Hallucinations, which measures the model’s accuracy on healthcare topics, GPT-5 hallucinates in 1.6% of cases. That is well below the rates for the earlier models GPT-4o and o3, which hallucinate in 12.9% and 15.8% of cases, respectively.

The company asserts that GPT-5 surpasses other tools in more subjective areas like creative design and writing.

Overall, the new model hallucinates far less often, in 4.8% of responses. That is well below o3 and GPT-4o, which fabricate false information in 22% and 20.6% of responses, respectively.
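To put those figures in perspective, the short calculation below converts them into relative reductions. It uses only the three rates quoted above; the `relative_reduction` helper and the `RATES` table are names introduced here for illustration.

```python
# Hallucination rates (% of responses) as quoted in the article.
RATES = {"GPT-5": 4.8, "o3": 22.0, "GPT-4o": 20.6}


def relative_reduction(new_rate: float, old_rate: float) -> float:
    """Percentage by which new_rate is lower than old_rate."""
    return (old_rate - new_rate) / old_rate * 100


for model in ("o3", "GPT-4o"):
    drop = relative_reduction(RATES["GPT-5"], RATES[model])
    print(f"GPT-5 vs {model}: {drop:.1f}% fewer hallucinated responses")
```

On these numbers, GPT-5 hallucinates roughly 78% less often than o3 and roughly 77% less often than GPT-4o.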

On Tau-bench, which measures an AI’s ability to perform simulated online tasks, GPT-5 showed mixed results. In the section that requires navigating an airline website, the model scored 63.5%, just below o3’s 64.8%. In the section on navigating retail pages, it scored 81.1%, below Claude Opus 4.1’s 82.4%.

OpenAI noted that the new model is also safer: it gives false answers less often and is better at identifying malicious users.

Updates

Alongside the release of GPT-5, ChatGPT introduced a customization feature, allowing users to adjust the chatbot’s communication style. Users can choose from suggested personality types: cynic, robot, listener, and nerd. These settings automatically influence the phrasing of responses, eliminating the need to manually set the desired tone each time.

Other updates include:

an improved voice mode that sounds more natural and intelligent;
the ability to customize chat colors;
integration with third-party services like Gmail and Google Calendar for more refined responses.

Earlier in August, OpenAI released open-weight reasoning models. They perform well on several benchmarks and are available for download on Hugging Face.