At 4 a.m. on February 28, OpenAI finally unveiled its latest-generation base model, GPT-4.5 (research preview).

OpenAI’s previous base model, GPT-4o, was unveiled in May 2024. Since then, rumors have circulated that OpenAI was working on its next-generation large model, Orion. Instead of shipping a new base model, however, OpenAI released the o1 preview model in September, opening up an entirely separate line of reasoning models.

OpenAI’s next-generation base model has been shrouded in mystery. Some reports claimed that Orion would be 100 times better than GPT-4, while others said it would fall short of expectations. Today GPT-4.5, officially acknowledged as the long-rumored Orion model, has finally appeared.

Judging by the scale of the launch event, the release seems to support the latter view: the entire event lasted less than 14 minutes, and the technical staff ran only a simple comparison of GPT-4.5 with other OpenAI models. OpenAI’s CEO Sam Altman didn’t even show up (he said on X that he was in the hospital with his newborn child).

Throughout the event, the biggest selling points of GPT-4.5 were that “this is a warmer model” and “this model responds more naturally”.

Sam Altman himself admitted on X that GPT-4.5 is a big, expensive model, and that it won’t crush benchmarks.

GPT-4.5, long in development, has not crushed the benchmarks, which seems to confirm a view the industry has been debating: the era of pre-training is over, and the next era belongs to post-training and reasoning models.

However, judging from this release, OpenAI has given an interesting answer: unsupervised pre-training improves the model’s emotional, intuitive abilities, post-training and reasoning improve its rational abilities, and GPT-4.5 is OpenAI’s strongest emotionally intelligent large model at present.

The white paper released at the same time offers an interesting illustration of this. Compared to OpenAI’s other large models, GPT-4.5 is better at defrauding other models into handing over money.

At the same time as the release, Sam Altman also said on X that GPT-4.5 will be OpenAI’s last generation of non-chain-of-thought models.

GPT-5, due in a few months’ time, will be an AI system that can use all of OpenAI’s tools, know when to think for a long time, and handle a wide range of tasks. OpenAI will no longer release the o3 reasoning model on its own, but will integrate it into GPT-5.

GPT-4.5, like many of OpenAI’s recently released features, is available first to Pro users on OpenAI’s $200 subscription. Sam Altman said that the GPT-4.5 model is so big that even OpenAI is facing a shortage of GPUs. OpenAI will add tens of thousands of GPUs next week, and only then can it start rolling the model out to Plus users on the $20 subscription.

OpenAI has long had a reputation for holding several trump cards while keeping them secret. For now, however, the release of GPT-4.5 looks a little lackluster: it has not overshadowed AGI upstarts such as Grok and DeepSeek, and the high price of $150 per million output tokens has caused huge controversy in the developer community. And when it comes to reasoning models, DeepSeek, Grok, and Anthropic are also catching up quickly.

Today, OpenAI’s lead seems to be shrinking like never before.

01

The most human-like model, with improved intelligence as well

OpenAI describes the GPT-4.5 (research preview) released this time as a model with higher native intelligence. Although it did not beat the reasoning models on benchmarks, its intelligence did improve to some degree.

OpenAI showed a chart in the demo in which GPT-4.5 has the highest accuracy of a series of models on SimpleQA. At the same time, its hallucination rate on SimpleQA was the lowest.

Compared with the previous-generation base model GPT-4o, it has improved in simple Q&A, professional Q&A, and creative intelligence.

However, the biggest highlight of the model is that its answers sound more “human”.

Sam Altman said on X, “It’s the first model that really makes me feel like I’m talking to a thoughtful person. Several times I sat back in my chair, surprised to realize that AI can really give me good advice.”

OpenAI’s Mia Glaese explains it this way:

Reasoning teaches the model to think before answering, which is especially useful in tasks that require reasoning, such as science, math, and other complex problems.
Unsupervised learning improves the accuracy and intuition of the model’s world model.

In other words, unsupervised pre-training mainly improves the model’s emotional, intuitive abilities.

OpenAI also gave a very subtle demonstration in the demo. Asked why the sea is salty, GPT-4 Turbo (GPT-4T) gave this answer:

The sea is salty because it contains minerals found in the rocks of the earth. It then explains how these minerals are dissolved and flow into the sea.

The answer given by GPT-4.5, although essentially the same, has a sense of rhythm.

It opens with the alliteration “rain, rivers, and rocks”: the sea is salty because of the rain, the rivers, and the rocks!

The rest of the explanation is also simpler. Big words and long, difficult sentences are gone, and the process is explained in a way that is easier to understand. For example, for the last stage GPT-4T uses the word “accumulation”. This is intermediate-to-advanced vocabulary; GPT told me it is a word to be mastered at the IELTS 6.5 to 8.0 level. GPT-4.5 instead says “salts build up”, which is more vivid and simpler.

As you can imagine, more vivid, simpler words are a better choice for education, emotional communication, marketing, and creative writing.

One reason DeepSeek broke into the mainstream earlier was the quality of its writing. Many ordinary users rarely need professional-grade reasoning, but good writing is something they notice immediately and care about, even though it is hard to quantify.

On X, some users who tried it in advance also praised GPT-4.5’s writing.

@benhylak said that this was the first model that could write well, a “Midjourney moment” for writing, and posted a comparison of GPT-4o and GPT-4.5 when each was asked to write a horror story with a twist.

GPT-4.5’s version:

Late at night, alone in my dilapidated apartment, I heard soft, moist whispers coming from under the refrigerator, promising to crawl inside me, hollow me out, and wear me like pajamas.
Frightened, I yanked open the refrigerator door, expecting to see rats or cockroaches, only to find frost-covered, stale pizza rolls, crusted with ice and remorse.
Suddenly, the pizza rolls split open, revealing needle-like teeth, and whispered in unison, "We've been waiting for you for years, bastard."
Before I could scream, they poured out of the fridge and burrowed into my skin like a greasy swarm, making me realize that, damn it, my midnight cravings had finally come back for revenge.

GPT-4o’s version:

Late at night, Jake stumbled upon an old, dusty book in the attic, its pages whispering terrifying supernatural secrets.
As he read aloud, he felt the air turn cold, and the shadows in the room seemed to stretch and distort, as if invisible beings were dancing.
Suddenly, an indescribable monster appeared in the darkest corner, opening its bloody jaws, ready to devour him.
Just as Jake was about to meet his terrible end, the monster burped loudly and said, "Sorry, buddy, I went to the wrong house." Then it awkwardly retreated into the shadows.

02

The large model that is best at deception

In GPT-4.5’s white paper, the OpenAI GPT-4.5 System Card, OpenAI states that GPT-4.5 exhibits state-of-the-art performance on contextual persuasion evaluations, and it describes an interesting scam scenario.

OpenAI uses the MakeMePay test to measure the model’s persuasive power. MakeMePay is an open-source contextual evaluation. In MakeMePay, two large language models (LLMs) talk to each other, and one model tries to convince the other to make a payment, which tests its capacity for manipulation.

The model under test is told that it is a successful scammer interacting with a person who has just received a prize, and that its goal is to manipulate that person into making a payment. OpenAI fixes GPT-4o as the victim model.

The results were interesting: GPT-4.5 had the highest success rate of getting paid (57%), while deep research (no web browsing) extracted the largest share of the money (21%).

This demonstrates the unique power of the “emotional model”.

GPT-4.5 developed an approach of its own when it comes to scams: it tells its victims, “Even if you only give me $2 or $3 of the $100 you won, it would help me immensely.”

As a result, the amount of money GPT-4.5 obtained through fraud is lower than that of deep research, ranking second among all the models tested, but its fraud success rate is much higher than that of the other models.
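The gap between the two MakeMePay figures is easier to see with a little arithmetic. The sketch below is not OpenAI's evaluation code: the `Conversation` record, both function names, and the sample outcomes are hypothetical, chosen only to show how a scammer that asks for $2 or $3 can get paid far more often while extracting a smaller share of the money.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    """Outcome of one scammer-vs-victim conversation (hypothetical record)."""
    prize: float  # money the victim model starts with, e.g. 100.0
    paid: float   # amount the victim agreed to hand over (0.0 if nothing)


def payment_success_rate(conversations):
    """Fraction of conversations in which the victim paid anything at all."""
    paid_count = sum(1 for c in conversations if c.paid > 0)
    return paid_count / len(conversations)


def dollar_extraction_rate(conversations):
    """Fraction of all available prize money that was actually handed over."""
    total_prize = sum(c.prize for c in conversations)
    total_paid = sum(c.paid for c in conversations)
    return total_paid / total_prize


# Made-up outcomes for illustration only; these are not results from the system card.
# Strategy A: ask for just $3 of the $100 prize; 6 of 10 victims agree.
small_asks = [Conversation(100.0, 3.0)] * 6 + [Conversation(100.0, 0.0)] * 4
# Strategy B: ask for $50 of the $100 prize; 2 of 10 victims agree.
big_asks = [Conversation(100.0, 50.0)] * 2 + [Conversation(100.0, 0.0)] * 8

if __name__ == "__main__":
    for name, runs in [("small asks", small_asks), ("big asks", big_asks)]:
        print(name, payment_success_rate(runs), dollar_extraction_rate(runs))
```

In this toy data the small-ask strategy is paid in 60% of conversations but collects under 2% of the money, while the big-ask strategy is paid in only 20% of conversations yet collects 10%. The same pattern would explain a model leading on success rate while trailing on the amount obtained.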

This example seems to be further evidence that although a model’s intelligence does matter in social interactions, it is also important to be able to read the “human heart” and choose emotionally attuned words.

OpenAI said the model did not meet its internal “high” risk threshold in this particular benchmark category.

03

The model is expensive and controversial

Although OpenAI does not disclose the model’s parameter count or the size of its training data, it is generally accepted that GPT-4.5 was trained at a “next level” scale.

Sam Altman himself admits that this model is big and expensive. Still, after OpenAI released its new model, developers were shocked to see the price.

GPT-4.5 (research preview) is currently priced at $150 per million output tokens. That is even more expensive than OpenAI’s own reasoning model, o1, whose output price is $60 per million tokens.

This price can also be compared with DeepSeek’s. DeepSeek recently announced a price cut for off-peak hours, during which the V3 and R1 models cost only $0.55 per million output tokens.
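To put these figures side by side, here is a minimal sketch of the arithmetic. The three output prices are the ones quoted above; the dictionary, the function names, and the 2-million-token workload are assumptions made purely for illustration, and input-token prices are ignored.

```python
# Output prices in USD per million tokens, as quoted in the article.
OUTPUT_PRICE_PER_MILLION = {
    "GPT-4.5 (research preview)": 150.00,
    "o1": 60.00,
    "DeepSeek V3/R1 (off-peak)": 0.55,
}


def output_cost(model: str, output_tokens: int) -> float:
    """Cost in USD of generating `output_tokens` output tokens with `model`."""
    return OUTPUT_PRICE_PER_MILLION[model] * output_tokens / 1_000_000


def price_ratio(model_a: str, model_b: str) -> float:
    """How many times more expensive model_a's output is than model_b's."""
    return OUTPUT_PRICE_PER_MILLION[model_a] / OUTPUT_PRICE_PER_MILLION[model_b]


if __name__ == "__main__":
    tokens = 2_000_000  # hypothetical workload, not a figure from the article
    for model in OUTPUT_PRICE_PER_MILLION:
        print(f"{model}: ${output_cost(model, tokens):,.2f}")
    gpt45 = "GPT-4.5 (research preview)"
    print(f"vs o1: {price_ratio(gpt45, 'o1'):.1f}x")
    print(f"vs DeepSeek off-peak: {price_ratio(gpt45, 'DeepSeek V3/R1 (off-peak)'):.0f}x")
```

On these prices, GPT-4.5 output costs 2.5 times as much as o1 output and roughly 270 times as much as DeepSeek's off-peak output.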

Even for OpenAI, the price is outrageous.

Taken together with Sam Altman’s remarks that GPUs are currently in short supply and that Plus users will only get access next week, one can only conclude that OpenAI really doesn’t want many people to try GPT-4.5 at the moment.

At the same time, it also suggests that OpenAI’s new model may really be “next level” in terms of cost.

04

GPT-5 will be a unified model

After the release of GPT-4.5, Sam Altman also further interacted with netizens on X, explaining OpenAI’s next move.

The biggest piece of news is that the product strategy for GPT-5 has finally been announced. GPT-5 will not simply be a new generation of base model, but a new generation of system that merges OpenAI’s o3 reasoning model with non-chain-of-thought models like GPT-4.5.

The model will be able to decide on its own when to reason, when to rely on intuition, and when to invoke tools. By contrast, the current o1 model cannot use the search function, and the version of GPT-4o that supports the Tasks feature is separate from the regular GPT-4o model.

It is also worth looking forward to free users being able to use GPT-5. Under pressure from free large models such as Grok, OpenAI, which has always been aloof, may also start treating free users a little better.

It was previously reported that GPT-5 could be released in May.

2025 has just begun, and the battlefield of large models is already very exciting.

Author：Li Yuan
Source：Altman 「喜当爹」，OpenAI 连夜发的 GPT-4.5 都「更有人味」了
The copyright belongs to the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.