Today, GPT-4.5 was released as a “research preview,” available to OpenAI Pro subscribers ($200/month) as well as developers with API keys. OpenAI has also published a system card for GPT-4.5, but there is no full release yet.
While OpenAI officially calls GPT-4.5 “the most knowledgeable model to date,” it cautions that it is not a frontier model and may not perform as well as o1 or o3-mini.
GPT-4.5 is a “huge and expensive model,” and GPUs are in short supply.
01
Newest, biggest, but not cutting-edge
OpenAI said that GPT-4.5 has been enhanced in terms of writing ability, world knowledge, and personalized optimization. In addition, the user’s experience of interacting with GPT-4.5 will be more natural, and the model will be better at recognizing patterns and making associations, making it more comfortable with tasks such as writing, programming, and solving real-world problems.
“GPT-4.5 is not a frontier model, but it is OpenAI’s largest LLM to date, with more than 10x the computational efficiency of GPT-4,” OpenAI wrote in a leaked pre-release document. “It does not introduce net-new frontier capabilities compared to previous reasoning releases, and its performance is below that of o1, o3-mini, and deep research on most preparedness evaluations.” OpenAI later removed these statements from the updated official documentation.
OpenAI revealed that GPT-4.5 employs new supervision techniques combined with traditional methods such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), similar to how GPT-4o was trained. Although GPT-4.5 still has limitations, OpenAI says that compared to GPT-4o its hallucination rate is significantly reduced, and even slightly lower than that of the o1 model.
As it stands, most of GPT-4.5’s core specifications are the same as GPT-4o’s:
A 128,000-token context window
Support for the same input types (text and images)
The same training-data cutoff of October 2023
Currently, API calls to this model are very expensive: $75 per million input tokens and $150 per million output tokens. By comparison, o1 costs $15/$60, while GPT-4o is just $2.50/$10.
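To make the price gap concrete, here is a minimal sketch that computes the cost of a single request at the per-million-token rates quoted above. The short model keys and the sample request size (2,000 input / 500 output tokens) are illustrative assumptions, not official identifiers:

```python
# Per-million-token prices quoted above: (input $, output $).
# Model keys are shorthand for this sketch, not official API names.
PRICES = {
    "gpt-4.5": (75.00, 150.00),
    "o1":      (15.00, 60.00),
    "gpt-4o":  (2.50, 10.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Return the dollar cost of one request at the quoted rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# A hypothetical request: 2,000 input tokens, 500 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 500):.4f}")
```

At these rates, the same modest request costs roughly 22x as much on GPT-4.5 as on GPT-4o.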
GPT-4.5 will be available to Pro users first, with a rollout to Plus and Team users planned for next week, followed gradually by Enterprise and Education users. The model is also now available on Microsoft’s Azure AI Foundry platform, alongside new models from Stability AI, Cohere, and others.
02
What are the improvements in GPT-4.5?
According to OpenAI’s blog, GPT-4.5 combines a deeper understanding of the world with enhanced collaboration capabilities, allowing it to integrate ideas more naturally and better adapt to human collaboration needs in warmer, more intuitive conversations. It is also more nuanced in understanding human intent, deciphering subtle cues and implicit expectations, and shows higher “emotional intelligence (EQ).” It likewise excels in aesthetic intuition and creativity, especially in writing and design.
OpenAI showed a chart of win rates between GPT-4.5 and GPT-4o, with GPT-4.5’s win rate ranging from 56.8% to 63.2% across query categories:
Everyday queries: 57.0%
Professional queries: 63.2%
Creative intelligence: 56.8%
In addition, GPT-4.5 has a hallucination rate of 37.1% on the SimpleQA task, a significant improvement over GPT-4o (61.8%) and o3-mini (80.3%), and even slightly better than o1 (44%). In programming benchmarks, it performs on par with o3-mini.
In Aider’s polyglot coding benchmark, according to a report by Aider creator Paul Gauthier, GPT-4.5 scored 45%: lower than DeepSeek V3 (48%), Sonnet 3.7 (60% without thinking mode, 65% with it), and o3-mini (60.4%), but well ahead of GPT-4o (23.1%).
Interestingly, OpenAI itself doesn’t seem to have much confidence in the prospects of this model:
GPT-4.5 is a very large and computationally intensive model, so it is more expensive than GPT-4o and is not a replacement for GPT-4o. With this in mind, we are evaluating whether to make it available in the API in the long term to balance current capability support with future model building.
Some users tested its drawing ability by asking it to “generate an SVG of a pelican riding a bicycle,” with the following results:
In addition, API access is quite slow: generating the full SVG response took 112 seconds, and the animated recording shows the tokens slowly trickling back.
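For reference, a request like the pelican test above can be sketched as a Chat Completions payload. This sketch only builds the request body rather than sending it, so it needs no API key; the model identifier `"gpt-4.5-preview"`, the exact prompt wording, and the streaming setting are assumptions for illustration:

```python
import json

def build_request(prompt, model="gpt-4.5-preview", stream=True):
    """Build a Chat Completions request body. With stream=True the
    tokens come back incrementally, which is how the slow 112-second
    generation described above was observed."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

payload = build_request("Generate an SVG of a pelican riding a bicycle")
print(json.dumps(payload, indent=2))
```

In practice this dictionary would be POSTed to the chat completions endpoint with an API key; streaming at least lets the caller watch tokens arrive instead of waiting nearly two minutes for the full response.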
OpenAI’s research scientist Rapha Gontijo Lopes calls it “the (probably) largest model in the world” — and apparently, the problem with large models is that they are much slower than small ones!
We’ve (probably) trained the world’s largest model! We think it has its own special “vibe” and can’t wait for everyone to experience it.
03
Andrej Karpathy’s Opinion: No significant improvement
Andrej Karpathy also posted some observations about GPT-4.5. He noted that he had been looking forward to GPT-4.5’s launch ever since GPT-4 was released, as it serves as a qualitative measure of how much improvement can be squeezed out of scaling up pre-training compute (i.e., training a larger model).
In OpenAI’s versioning scheme, every +0.5 increase represents roughly a 10-fold increase in pre-training compute. Looking back on past releases:
GPT-1 could barely generate coherent text;
GPT-2 still felt chaotic, like a toy;
GPT-2.5 was skipped, jumping straight to GPT-3, which was where things got more interesting;
GPT-3.5 crossed a critical threshold, enough to ship as a product and spark OpenAI’s “ChatGPT moment”;
GPT-4 also felt better, but the improvement was subtle.
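Karpathy’s rule of thumb above can be expressed as a quick back-of-the-envelope calculation; normalizing GPT-1 to one unit of compute is an arbitrary choice for illustration:

```python
# Sketch of the rule of thumb: each +0.5 in the GPT version number
# corresponds to roughly 10x more pre-training compute.
# Treating GPT-1 as the 1x baseline is an assumption for illustration.
def relative_compute(version, base_version=1.0):
    """Approximate compute multiplier relative to the base version."""
    steps = (version - base_version) / 0.5  # number of +0.5 jumps
    return 10 ** steps

for v in (1.0, 2.0, 3.0, 3.5, 4.0, 4.5):
    print(f"GPT-{v}: ~{relative_compute(v):,.0f}x GPT-1 compute")
```

By this crude estimate, GPT-4.5 sits at ten times the pre-training compute of GPT-4, which is exactly why its modest gains were so closely scrutinized.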
GPT-4’s improvement was like a “rising tide,” lifting everything by roughly 20%. GPT-4.5, however, is not a significant step up, even though it took roughly 10 times the training compute of GPT-4. As Andrej Karpathy points out:
[…] Testing GPT-4.5 now, I feel much the same as I did two years ago: it’s a real improvement, and it’s great, but it’s hard to point to exactly where it got better. It is important to note that GPT-4.5 was trained only with pre-training, supervised fine-tuning, and RLHF (reinforcement learning from human feedback), without reasoning training. As a result, it does not improve much in areas that rely on reasoning skills, such as math and code. One can speculate that OpenAI will next run reinforcement learning on top of GPT-4.5 to improve its reasoning ability, further expanding its performance in math, programming, and other fields.
This is in line with some observers’ opinions. Eli Lifland believes that if his initial assessment of GPT-4.5 holds, he needs to stretch his expected timeline for AI progress. Compared to 4o, GPT-4.5 is not significantly improved, especially at programming, where it is not even as good as Sonnet; yet it costs 15 times more than 4o and 10 to 25 times more than Sonnet 3.7, which baffles him.
Gary Marcus is more direct, arguing that GPT-4.5 has basically no substantial breakthroughs, and that GPT-5 is still just a fantasy.
Scaling up data and compute is not a law of physics, and his past predictions to that effect have largely come true. By contrast, none of the GPT-5 hype of recent years has actually materialized. One could blame the users, but the truth is that the results simply did not live up to expectations.
Authors: Yan Shan (燕珊), Tina. Source: “GPT-4.5 Released! OpenAI’s Largest, Most Expensive, and Possibly Slowest Model Ever, and the Whole Internet Is Complaining.” The copyright belongs to the authors. For commercial reprints, please contact the authors for authorization; for non-commercial reprints, please indicate the source.