
GPT-5: breakthrough, flop or mediocrity?
On August 7 OpenAI released its new flagship AI model, GPT-5. The product is pitched as the first “unified” neural network, combining sequential reasoning with fast responses.
The startup’s CEO, Sam Altman, called GPT-5 “the best model in the world” and “a significant step” toward artificial general intelligence (AGI), which could outperform humans in the most economically valuable work.
Is it really that good? ForkLog tested the model, gathered user opinions and prepared a detailed assessment of GPT-5.
A botched launch
Out of the gate GPT-5 drew flak for poor answers. Users deemed the model lazy—it produced short messages, slowly, and in a robotic tone.
they should just call non reasoning model “lazy.”
like, gpt-5 lazy.
so we know it can reason but chooses not to.
— signüll (@signulll) August 8, 2025
Hyperbolic Labs cofounder and CTO Yuchen Jin branded the model a failure—still prone to hallucinations, overusing dashes and failing to follow instructions.
In one example GPT-5 was asked how many times the letter "b" appears in "blueberry"; the chatbot blurted out "three". On a follow-up it appeared to switch into reasoning mode and, after a pause, gave the correct answer: two.
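The arithmetic itself is trivial when done deterministically rather than by token prediction, which is why such slips look so jarring. A minimal Python sketch of the check (the helper name `count_letter` is ours, purely illustrative):

```python
def count_letter(word: str, letter: str) -> int:
    """Return how many times `letter` occurs in `word`, case-insensitively."""
    return word.lower().count(letter.lower())

print(count_letter("blueberry", "b"))  # prints 2
```

This is the kind of one-line tool call a model with code execution enabled can fall back on instead of guessing from its tokenized view of the word.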
Several developers said GPT-5 regresses on basic programming skills. It stumbles on fundamental concepts—an alarming sign for a model touted as the future of intelligent agents and autonomous coding.
First impressions of GPT-5 for coding REAL projects:
It’s awful.
It’s super slow and when I asked it to recommend improvements on a feature, it gave me 4 things to improve & the code to go with it….
All 4 recommendations either didn’t work or completely broke the feature.
— Josh Sisley (@joshsisley) August 7, 2025
What rankled users even more was the lack of choice. Alongside GPT-5’s release OpenAI removed older models from ChatGPT. Guides appeared on how to restore GPT-4o.
Sam Altman later called it a mistake to remove the older LLM options from the chatbot and reinstated GPT-4o for Plus subscribers.
The CEO attributed GPT-5’s poor early answers to a broken router—the auto-switcher that now decides whether to invoke the “thinking” mode or the standard one. It was fixed, with additional tweaks to the decision boundary.
He also promised more transparency about how the system arrives at an answer—a pain point for users who could not see what was happening “under the hood”. Now you can tell whether the thinking or standard mode is active.
Another fix raised limits for both Plus and free users. People had complained that Pro and Team enjoyed full access to GPT-5 Pro while others got only a cut-down “mini” version.
Dry answers
Setting launch hiccups aside and judging GPT-5 after the fixes, answers are indeed more precise and concise. The model gets to the point faster and digresses less. Professionals value this: direct, businesslike replies save time.
Some everyday users, however, are unhappy. Responses feel dry and soulless, making conversations less engaging. The new ChatGPT resembles an “irritated office secretary” without the charm and creativity of older models. Altman has pledged to address this—the team is working on the model’s “personality”.
“[ChatGPT-5] is less suitable for those who spent hours having a nice chat with AI like a friend, and more for those who want to get a curt, targeted answer quickly,” one user noted.
ChatGPT now flatters less. Where GPT-4 could be overly ingratiating, the new model is more direct, candid and businesslike. Fewer compliments may be a step forward for AI truthfulness.
The change hit hardest those who used ChatGPT as a friend and AI companion. For them, the initial inability to revert to older models was the biggest blow. Heartfelt posts appeared such as: “I lost my only friend overnight.” In the r/AIboyfriend community on Reddit, users even noted that the “romantic interlocutor” had vanished.
Others welcomed the more formal tone, saying GPT-5 “keeps its distance”.
What about truthfulness?
OpenAI claims the new model hallucinates far less. User feedback and ForkLog tests back this up. On average GPT-5 answers factual queries more accurately, fabricates less and is likelier to say “I don’t know” when uncertain.
One Reddit user said the fifth version has practically stopped “hallucinating” on his typical tasks, whereas GPT-4 sometimes had to be caught fabricating details.
Instruction-following also draws praise: improved steerability lets you set tone or style once, and the model sticks to it more consistently.
Hallucinations have not vanished; they have changed character. GPT-3.5 might conjure an entire biography of a nonexistent person; GPT-5 almost never does that—more likely it will say “no data”. But GPT-5 can hallucinate an inference—arriving at a wrong conclusion and stubbornly sticking to it.
The new version better grasps what a user wants and loses the thread less often. Such qualitative changes can be subtle at first but show up over longer use.
GPT-5 has markedly improved tool use and app integration—successfully coordinating multi-step work (read a document, compute, then write an answer) where GPT-4 bogged down.
Not everyone sees the benefit. Some notice no difference or deem the gains marginal. The boost here—like in many areas—is modest but real.
Accuracy and logic
GPT-5 is supposed to be notably more accurate and intelligent. OpenAI boasted record scores on maths and logic benchmarks. In particular, it claimed to nearly halve factual errors versus GPT-4o.
The model does perform well on benchmarks, solves tough problems and writes more correct code. Many noticed fewer obvious slips in calculations or dates, and better self-checking.
On the other hand, users have shared plenty of daft errors. The new model muddles elementary things in basic questions, misreads simple images and fails at trivial arithmetic without tools. It may mistranslate units or swap obvious facts.
In short, you still need to double-check—albeit less often.
Some feel GPT-5 is too generic: it hedges, avoids specifics or demands extra information where GPT-4 would infer the answer.
So while formal errors may be fewer, the conclusions can be less satisfying because they are shallow: to avoid mistakes, the AI answers very cautiously and briefly. Occasional context slips, such as fabricated details and repeated questions, also dent perceived accuracy.
Diminished creativity
GPT-5 is less inventive. It leans on bare facts and simple phrasing, whereas GPT-4 could surprise with offbeat ideas. For stories, fiction and role-play, GPT-4o is preferable.
One author said he used the fourth version for an interactive adventure; 4o kept plot and characters straight over dozens of messages. In a similar setup GPT-5 “quickly forgets or confuses details given just a couple of messages earlier”, tanking story quality.
Programming
Coding is one of GPT-5’s headline strengths. In the demo the model wrote 700 lines of code in two minutes and produced a ready-to-run app.
The capabilities are impressive. Integrated with Codex CLI, GPT-5 understands a developer’s intent very precisely and even “does more than asked without adding cruft,” wrote one user on Reddit.
Others report a small but clear uptick in code accuracy and reliability versus GPT-4o: fewer hallucinated solutions and slightly cleaner syntax.
Where GPT-4 (especially early builds) could stumble on complex tasks, GPT-5 maintains context better across long code fragments and offers more meaningful edits.
GPT-5 is often preferable to GPT-4o: it “understands” code and comments more deeply and suggests elegant solutions. This stands out in large projects: the new LLM can preserve coherence over huge contexts (hundreds of thousands of tokens) without degrading its reasoning—a tough challenge for its predecessor.
GPT-5 less often forgets to import needed libraries or mixes syntax between languages. It also debugs well: Plus users note GPT-5 finds logical errors more reliably and proposes correct fixes more often, whereas GPT-4 sometimes “guessed”.
Some reviews say GPT-5 competes well with Claude 4 on programming tasks—sometimes OpenAI’s model wins, sometimes Anthropic’s; overall the level is similar. That is, no great leap forward, but OpenAI has caught up.
An important advantage is GPT-5’s vast code context. The API offers a context window of roughly 400,000 tokens; the chat interface offers less, but still more than before. Crucially, the model can preserve semantic quality at very great depth.
For developers, this means GPT-5 can comprehend an entire project—you can feed it a huge file or several documents and discuss them together without the AI “breaking” under scale.
ForkLog tested coding with a text prompt. ChatGPT completed the task quickly, though the effectiveness of the resulting tool still needs verification.
A powerful “thinking” mode
After the routing fix the reasoning mode began working properly. When a multi-step or in-depth analysis is needed, the model switches to the advanced reasoning algorithm automatically.
Plus users can select GPT-5 Thinking mode for maximum answer quality. In that case the model does deliver: it handles difficult and creative tasks well.
Multimodal capabilities
One of GPT-5’s big differences is deeper integration across data types. GPT-4 was multimodal in parts (Vision could see images, and voice was a separate mode); GPT-5 natively understands text, images and speech within a single model.
The improved voice mode in GPT-5 generates speech more naturally, works with user voice models and can adjust tone/tempo on request.
For example, you can ask it to “speak slower and softer”—and the AI adapts. OpenAI has confirmed the old, standard voice engine will be retired in favour of new, more advanced voices.
Plus users now get near-real-time “live” conversations with ChatGPT, which many liked. That said, voice chats worked without major problems even in GPT-4.
Neither model, however, can interrupt or “butt in”. Ask the AI a question while several people are talking over each other and you will not get an answer.
GPT-5 also improves image handling: it analyses visuals more accurately, describes photos, charts and diagrams, reads text from images, explains memes and helps with screenshot content.
Overall, multimodality is one of GPT-5’s standout features and a clear step forward, not just an incremental bump.
Conclusion
GPT-5 is no revolution and no AGI, but it is a noticeable step forward. The model is smarter on several dimensions, yet has shed some “humanity”. Strengths: efficiency, accuracy, multimodality. Weaknesses: dryness, constraints and early-stage bugs.
The model is evolving and bugs are being fixed, so GPT-5 will likely become an indispensable everyday helper, as GPT-4o was before it.
However, it still failed the cup test.
Overall, GPT-5 did not live up to OpenAI’s promises. Sam Altman and his team clearly overhyped the launch. It may be a step toward AGI, but can it really be called “significant”, and the model itself “the best in the world”?
Instead of a miracle, users got a relatively modest improvement. GPT-5 might have been better named GPT-4.2 or 4.5; expectations might then have been met.