A comprehensive new study has revealed that open-source artificial intelligence models consume far more computing resources than their closed-source competitors when performing identical tasks, potentially undermining their cost advantages and reshaping how enterprises evaluate AI deployment strategies.
Research conducted by Nous Research found that open-weight models use between 1.5 and 4 times more tokens than closed models from OpenAI and Anthropic to complete the same tasks. For simple knowledge questions, the gap widened dramatically, with some open models using as many as 10 times more tokens.
Measuring token efficiency in reasoning models: the missing benchmark https://t.co/b1e1rjx6vz
We measured token usage across reasoning models: open weight models output 1.5-4x more tokens than closed models on identical tasks, but with huge variance depending on task type (… pic.twitter.com/ly1083won8
— Nous Research (@NousResearch) August 14, 2025
“Open weight models use 1.5-4x more tokens than closed ones (up to 10x for simple knowledge questions), making them sometimes more expensive per query despite lower per-token costs,” the researchers wrote in their report, published on Wednesday.
The findings challenge a dominant assumption in the AI industry: that open-source models offer clear economic advantages over proprietary alternatives. While open-source models often cost less per token, the study suggests this advantage can be “easily offset if they require more tokens to reason about a given problem.”
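The arithmetic behind that claim is easy to sketch. A minimal illustration in Python, using hypothetical prices and token counts (invented for this example, not figures from the study):

```python
# Sketch: per-token price vs. total tokens per query.
# All prices and token counts below are hypothetical examples,
# not measurements from the Nous Research study.

def query_cost(completion_tokens: int, price_per_million: float) -> float:
    """Cost of one query in dollars, given output tokens and $/1M tokens."""
    return completion_tokens * price_per_million / 1_000_000

# A closed model: pricier per token, but concise.
closed = query_cost(completion_tokens=300, price_per_million=8.00)

# An open model: cheaper per token, but 4x as verbose on the same task.
open_weight = query_cost(completion_tokens=1200, price_per_million=2.50)

print(f"closed: ${closed:.6f}, open: ${open_weight:.6f}")
# With these example numbers the open model is already more expensive
# per query, even though its per-token price is less than a third as high.
```

The point of the sketch is that per-token price alone says nothing about per-query cost until it is multiplied by how many tokens the model actually emits.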
The real cost of AI: why “cheaper” models can break your budget
The study examined 19 different AI models across three categories of tasks: basic knowledge questions, mathematical problems, and logic puzzles. The team measured “token efficiency”, meaning how many computing units models use relative to the complexity of their solutions, a metric that has received little systematic study despite its significant cost implications.
“Token efficiency is a critical metric for several practical reasons,” the researchers noted. “While hosting open weight models may be cheaper, this cost advantage could be easily offset if they require more tokens to reason about a given problem.”
The inefficiency is particularly pronounced in large reasoning models (LRMs), which use extended “chains of thought” to solve complex problems. These models, designed to think through problems step by step, can consume hundreds of tokens pondering simple questions that should require minimal computation.
For basic knowledge questions such as “What is the capital of Australia?”, the study found that reasoning models spend “hundreds of tokens pondering simple knowledge questions” that could be answered in a single word.
Which AI models actually deliver bang for your buck
The study revealed stark differences between model providers. OpenAI’s models, particularly its o4-mini and newly released open-source gpt-oss variants, showed exceptional token efficiency, especially for mathematical problems. The study found that OpenAI models “stand out for extreme token efficiency in math problems,” using up to three times fewer tokens than other commercial models.
Among open-source options, Nvidia’s llama-3.3-nemotron-super-49b-v1 emerged as “the most token efficient open weight model across all domains,” while newer models such as Magistral showed “exceptionally high token usage” as outliers.
The efficiency gap varied significantly by task type. While open models used roughly twice as many tokens for mathematical and logic problems, the difference ballooned for simple knowledge questions, where efficient reasoning should make extended deliberation unnecessary.

What enterprise leaders need to know about AI compute costs
The findings have immediate implications for enterprise AI adoption, where computing costs can scale rapidly with usage. Companies evaluating AI models often focus on benchmark accuracy and per-token pricing, but may overlook the total computing requirements of real-world tasks.
“The better token efficiency of closed weight models often compensates for the higher API pricing of those models,” the researchers found when analyzing total inference costs.
The study also revealed that closed-model providers appear to be actively optimizing for efficiency. “Closed weight models have been iteratively optimized to use fewer tokens to reduce inference cost,” while open-source models have “increased their token usage for newer versions, possibly reflecting a priority toward better reasoning performance.”

How researchers cracked the code on measuring AI efficiency
The research team faced unique challenges in measuring efficiency across different model architectures. Many closed models do not reveal their raw reasoning processes, instead providing compressed summaries of their internal computations to prevent competitors from copying their techniques.
To address this, the researchers used completion tokens, the total computing units billed for each query, as a proxy for reasoning effort. They found that “most recent closed source models will not share their raw reasoning traces” and instead “use smaller language models to transcribe the chain of thought into summaries or compressed representations.”
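In practice, that proxy is simply the completion-token count that every LLM API bill already reports. A minimal sketch of the idea, using invented usage records shaped like typical API metadata rather than any real provider SDK:

```python
# Sketch: completion tokens as a proxy for reasoning effort.
# The response dicts below are invented stand-ins for the usage metadata
# an LLM API typically returns; no real provider SDK is used.

def token_efficiency(responses: list[dict]) -> float:
    """Mean completion tokens per query: lower means more token-efficient."""
    totals = [r["usage"]["completion_tokens"] for r in responses]
    return sum(totals) / len(totals)

# Hypothetical usage records for the same three questions.
closed_model = [{"usage": {"completion_tokens": t}} for t in (120, 95, 140)]
open_model = [{"usage": {"completion_tokens": t}} for t in (480, 510, 450)]

ratio = token_efficiency(open_model) / token_efficiency(closed_model)
print(f"open model emits {ratio:.1f}x the tokens of the closed model")
```

Because billing is based on these same token counts, the proxy doubles as a direct cost measure even when a model hides its raw reasoning trace.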
The study’s methodology included testing with modified versions of well-known problems to minimize the influence of memorized solutions, such as altering the variables in math competition problems from the American Invitational Mathematics Examination (AIME).
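Variable substitution of this kind is straightforward to illustrate. A toy sketch of the general technique (the problem template and numbers are invented for illustration, not items from the study’s benchmark):

```python
import random

# Sketch: perturbing a known competition-style problem so that a memorized
# answer no longer works. The template below is an invented toy problem,
# not one of the AIME items actually used in the study.

TEMPLATE = "Find the sum of all positive integers n <= {limit} divisible by {d}."

def variant(seed: int) -> tuple[str, int]:
    """Return a perturbed problem statement and its recomputed answer."""
    rng = random.Random(seed)          # seeded for reproducible variants
    limit = rng.randint(50, 200)       # swap in fresh numeric parameters
    d = rng.choice([3, 4, 6, 7])
    answer = sum(n for n in range(1, limit + 1) if n % d == 0)
    return TEMPLATE.format(limit=limit, d=d), answer

problem, answer = variant(seed=1)
print(problem, "->", answer)
```

A model that has merely memorized the published answer to the original problem will fail on the perturbed variant, while one that actually reasons through the arithmetic will not.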

The future of AI efficiency: what happens next
The researchers suggest that token efficiency should become a primary optimization target alongside accuracy in future model development. “A more concentrated CoT will also allow for more efficient context use and can counteract context degradation during difficult reasoning tasks,” they wrote.
OpenAI’s release of its open-source gpt-oss models, which demonstrate state-of-the-art efficiency with a “freely available CoT,” could serve as a reference point for optimizing other open-source models.
The full research dataset and evaluation code are available on GitHub, allowing other researchers to verify and extend the findings. As the AI industry races toward ever more powerful reasoning, this study suggests that the real competition may not be about who can build the smartest AI, but who can build the most efficient one.
After all, in a world where every token counts, the most wasteful models may find themselves priced out of the market, no matter how well they can think.
