
The release of OpenAI’s GPT-4.5 has been somewhat disappointing, with many pointing to its extreme price point (about 10 to 20 times more expensive than Claude 3.7 Sonnet and 15 to 30 times more expensive than GPT-4o).
However, given that this is OpenAI’s largest and most powerful non-reasoning model, it is worth considering its strengths and the areas in which it shines.
Better knowledge and alignment
There is little detail about the model’s architecture or training corpus, but a rough estimate is that it was trained with about 10 times more compute. The model was so large that OpenAI needed to spread training across multiple data centers to finish in a reasonable time.
Larger models have a greater capacity to learn world knowledge and the nuances of human language (provided that they have access to high-quality training data). This is visible in some of the metrics presented by the OpenAI team. For example, GPT-4.5 has a record-high score on PersonQA, a benchmark that evaluates hallucinations in AI models.
Practical experiments also show that GPT-4.5 is better than other general-purpose models at staying true to facts and following user instructions.
Users have pointed out that GPT-4.5’s responses feel more natural and context-aware than those of previous models. Its ability to follow tone and style guidelines has also improved.
After the release of GPT-4.5, AI scientist and OpenAI co-founder Andrej Karpathy, who had early access to the model, said he “expect[ed] to see an improvement in tasks that are not reasoning heavy, and I would say those are tasks that are more EQ (as opposed to IQ) related and bottlenecked by e.g. world knowledge, creativity, analogy making, general understanding, humor, etc.”
However, evaluating writing quality is also very subjective. In a survey that Karpathy ran on different prompts, most people preferred the responses of GPT-4o over those of GPT-4.5. He wrote on X: “Either the high-taste testers are noticing the new and unique structure but the low-taste ones are overwhelming the poll. Or we’re just hallucinating things. Or these examples are just not that great. Or it’s actually pretty close and this is way too small sample size. Or all of the above.”
Better document processing
In its experiments, Box, which has integrated GPT-4.5 into its Box AI Studio product, wrote that GPT-4.5 is “particularly potent for enterprise use cases, where accuracy and integrity are mission critical … our testing shows that GPT-4.5 is one of the best models available both in terms of our eval scores and also its ability to handle many of the hardest AI questions.”
In its internal evaluations, Box found GPT-4.5 to be more accurate on enterprise document question-answering tasks, outperforming the original GPT-4 by about 4 percentage points on its test set.

Box’s tests also indicated that GPT-4.5 excelled at math questions embedded in business documents, which older GPT models often struggled with. For example, it was better at answering questions about financial documents that required reasoning over data and performing calculations.
GPT-4.5 also showed improved performance in extracting information from unstructured data. In a test that involved extracting fields from a large number of legal documents, GPT-4.5 was 19% more accurate than GPT-4o.
Planning, coding, evaluating results
Given its improved world knowledge, GPT-4.5 may also be a suitable model for creating high-level plans for complex tasks. The broken-down steps can then be handed over to smaller but more efficient models to elaborate and execute.
According to Constellation Research, “In initial testing, GPT-4.5 seems to show strong capabilities in agentic planning and execution, including multi-step workflows and complex task automation.”
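The plan-then-delegate pattern described above can be sketched in a few lines. This is a minimal illustration, not a recommended implementation: `ModelFn`, `plan_and_execute`, `fake_planner` and `fake_executor` are hypothetical names, and the two stub functions stand in for real calls to a large planning model and a smaller execution model.

```python
import re
from typing import Callable, List

# Hypothetical type: any function that takes a prompt and returns model text.
ModelFn = Callable[[str], str]


def plan_and_execute(task: str, planner: ModelFn, executor: ModelFn) -> List[str]:
    """Ask the large model for a plan once, then run each step on the small model."""
    plan_text = planner(f"Break this task into short numbered steps:\n{task}")

    # Keep non-empty lines and drop list prefixes such as "1.", "2)" or "-".
    steps = [
        re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        for line in plan_text.splitlines()
        if line.strip()
    ]

    # Each step is an independent, cheaper call to the smaller model.
    return [executor(f"Task: {task}\nCarry out this step: {step}") for step in steps]


# Stubs standing in for real model calls (assumptions for illustration only).
def fake_planner(prompt: str) -> str:
    return "1. Collect the contracts\n2. Extract key fields\n3. Summarize risks"


def fake_executor(prompt: str) -> str:
    step = prompt.rsplit("Carry out this step: ", 1)[1]
    return f"done: {step}"


if __name__ == "__main__":
    for line in plan_and_execute("Review vendor contracts", fake_planner, fake_executor):
        print(line)
```

The expensive model is called once per task, while the number of cheap calls grows with the number of steps, which is what makes this split attractive when the planner costs far more per token.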
GPT-4.5 may also be useful in coding tasks that require internal and contextual knowledge. GitHub now provides limited access to the model in its Copilot coding assistant and notes that GPT-4.5 “performs effectively with creative prompts and provides reliable responses to obscure knowledge queries.”
Given its deeper world knowledge, GPT-4.5 is also suitable for “LLM-as-a-judge” tasks, in which a strong model evaluates the output of smaller models. For example, a model such as GPT-4o or o3 can generate one or several answers, reason over the solution and pass the final answer to GPT-4.5 for revision and refinement.
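A rough sketch of the evaluation half of that LLM-as-a-judge setup follows. It assumes the judge is asked for a 1 to 10 score per candidate, which is only one of several possible judging schemes; `judge_best` and `fake_judge` are hypothetical names, and the stub replaces a real call to the stronger model.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical type: any function that takes a prompt and returns model text.
ModelFn = Callable[[str], str]


def judge_best(question: str, candidates: List[str], judge: ModelFn) -> Tuple[int, str]:
    """Have the judge model score each candidate and return (index, answer) of the best."""
    if not candidates:
        raise ValueError("at least one candidate answer is required")

    scores = []
    for answer in candidates:
        reply = judge(
            "Rate the answer from 1 to 10. Reply with the number only.\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        # Take the first integer in the reply; treat unparseable replies as 0.
        match = re.search(r"\d+", reply)
        scores.append(int(match.group()) if match else 0)

    # On ties, max() keeps the earliest candidate.
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, candidates[best]


# Stub standing in for the strong judge model (assumption for illustration only).
def fake_judge(prompt: str) -> str:
    return "9" if "Answer: Paris" in prompt else "3"


if __name__ == "__main__":
    drafts = ["Lyon", "Paris", "Marseille"]  # e.g. produced by a smaller model
    print(judge_best("What is the capital of France?", drafts, fake_judge))
```

In a real pipeline the winning answer could then be sent back to the judge with a request to revise it, matching the refinement step mentioned above.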
Is it worth the price?
Given the huge costs of GPT-4.5, however, it is very hard to justify many use cases. But that doesn’t mean it will remain that way. One of the constant trends we have seen in recent years is the plummeting cost of inference, and if this trend applies to GPT-4.5, it is worth experimenting with it and finding ways to put its power to use in enterprise applications.
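To make the cost question concrete, the per-request arithmetic can be sketched as follows. The prices in `PRICES` are assumptions based on the list prices published around the GPT-4.5 launch (USD per million tokens) and should be checked against current pricing before use; `Price` and `request_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# ASSUMED launch-time list prices; verify against the provider's pricing page.
PRICES = {
    "gpt-4.5": Price(input_per_m=75.0, output_per_m=150.0),
    "gpt-4o": Price(input_per_m=2.5, output_per_m=10.0),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request for the given token counts."""
    p = PRICES[model]
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


if __name__ == "__main__":
    # Example: a document question with 10,000 input tokens and 1,000 output tokens.
    big = request_cost("gpt-4.5", 10_000, 1_000)
    small = request_cost("gpt-4o", 10_000, 1_000)
    print(f"gpt-4.5: ${big:.3f}  gpt-4o: ${small:.3f}  ratio: {big / small:.1f}x")
```

Under these assumed prices the ratio is 30x on input tokens and 15x on output tokens, so the blended multiplier for a given workload depends on how input-heavy it is.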
It is also worth noting that this new model can become the basis for future reasoning models. Per Karpathy: “Keep in mind that GPT4.5 was only trained with pretraining, supervised finetuning and RLHF [reinforcement learning from human feedback], so this is not yet a reasoning model. Therefore, this model release does not push forward model capability in cases where reasoning is critical (math, code, etc.) … Presumably, OpenAI will now be looking to further train with reinforcement learning on top of the GPT-4.5 model to allow it to think and push model capability in these domains.”