Small models are having a moment. On the heels of a new AI vision model from MIT spinoff Liquid AI that is small enough to fit on a smartwatch, and a Google model small enough to run on a smartphone, Nvidia joins the party today with a new small language model (SLM) of its own, Nemotron-Nano-9B-V2, which reached the highest performance in its class on selected benchmarks and ships with the ability to toggle AI "reasoning" (self-checking before producing an answer) on and off.
While 9 billion parameters is larger than some of the multimillion-parameter small models VentureBeat has covered recently, Nvidia notes that it is a meaningful reduction from the model's original 12 billion parameters, and that it was designed to fit on a single Nvidia A10 GPU.
As Oleksii Kuchaiev, Nvidia's Director of AI Model Post-Training, said on X in response to a question: "The 12B was pruned to 9B to fit specifically on the A10, which is a popular GPU choice for deployment. It is also a hybrid model, which allows it to process larger batch sizes and be up to 6x faster than transformer models of similar size."
For context, many leading LLMs are in the 70+ billion parameter range (parameters refer to the internal settings governing the model's behavior; more parameters generally mean a larger and more capable, but also more computationally intensive, model).
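Why the pruning from 12B to 9B matters for the A10 can be sketched with back-of-the-envelope arithmetic (an illustration, not an official Nvidia figure): weights stored in bfloat16 take roughly 2 bytes per parameter, and the A10 ships with 24 GB of memory.

```python
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each weight in 2 bytes
A10_MEMORY_GB = 24        # the Nvidia A10 has 24 GB of GDDR6

def weight_memory_gb(num_params_billions: float) -> float:
    """Approximate GPU memory needed just for the weights, in GB."""
    return num_params_billions * 1e9 * BYTES_PER_PARAM_BF16 / 1e9

print(weight_memory_gb(12))  # ~24 GB: a 12B model leaves no headroom on an A10
print(weight_memory_gb(9))   # ~18 GB: a 9B model leaves room for KV cache, activations
```

This ignores the KV cache and activations, which is exactly why a model whose weights alone fill the card is impractical to serve on it.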
The model supports multiple languages, including English, German, Spanish, French, Italian, Japanese, and, in longer-form descriptions, Korean, Portuguese, Russian, and Chinese. It is suitable for both instruction following and code generation.
Nemotron-Nano-9B-V2 and its pre-training datasets are available now on Hugging Face and through the company's model catalog.
A fusion of Transformer and Mamba architectures
It is based on Nemotron-H, a set of hybrid Mamba-Transformer models that form the foundation of the company's latest offerings.
While most popular LLMs are pure "Transformer" models relying entirely on attention layers, those layers can become costly in memory and compute as sequence lengths grow.
Instead, Nemotron-H and other models using the Mamba architecture, developed by researchers at Carnegie Mellon University and Princeton, weave in selective state space models (SSMs), which can handle very long sequences of information by maintaining state.
These layers scale linearly with sequence length and can process contexts much longer than standard self-attention without the same memory and compute overhead.
A hybrid Mamba-Transformer reduces those costs by substituting linear-time state space layers for most of the attention, achieving up to 2-3x higher throughput on long contexts with comparable accuracy.
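The scaling difference can be sketched with a toy cost model (illustrative only; real kernels have constant factors and memory effects this ignores): self-attention compares every token with every other token, so its cost grows with the square of the sequence length, while a state space layer performs one constant-cost state update per token.

```python
def attention_ops(seq_len: int) -> int:
    """Toy cost of self-attention: pairwise token interactions."""
    return seq_len ** 2

def ssm_ops(seq_len: int) -> int:
    """Toy cost of a state space layer: one recurrent update per token."""
    return seq_len

for length in (1_000, 8_000, 128_000):
    ratio = attention_ops(length) / ssm_ops(length)
    print(f"context {length:>7}: attention costs {ratio:,.0f}x the SSM layer")
```

The gap widens with context length, which is why the hybrid keeps only a few attention layers and leans on state space layers for the rest.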
Other AI labs beyond Nvidia, such as AI2, have also released models based on the Mamba architecture.
Toggling reasoning on and off using language
Nemotron-Nano-9B-V2 is positioned as a unified, text-only chat and reasoning model trained from scratch. By default, the system generates a reasoning trace before providing a final answer, though users can toggle this behavior through simple control tokens such as /think or /no_think.
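A minimal sketch of how such control tokens might be supplied, assuming (as is common for chat models) that the toggle goes in the system message. The token names /think and /no_think come from the coverage above, but the exact prompt format is defined by the model's chat template, not by this snippet.

```python
def build_messages(user_prompt: str, reasoning: bool = True) -> list[dict]:
    """Assemble a chat request with the reasoning toggle in the system turn."""
    control = "/think" if reasoning else "/no_think"
    return [
        {"role": "system", "content": control},
        {"role": "user", "content": user_prompt},
    ]

# Reasoning off for a latency-sensitive request:
msgs = build_messages("Summarize our Q3 support tickets.", reasoning=False)
print(msgs[0]["content"])  # -> /no_think
```

In practice these messages would be passed through the model's chat template (e.g. via a serving framework) before generation.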
It also introduces runtime "thinking budget" management, which lets developers cap the number of tokens devoted to internal reasoning before the model completes its answer.
This mechanism is aimed at balancing accuracy with latency, particularly in applications such as customer support or autonomous agents.
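A hypothetical sketch of what budget enforcement could look like inside a serving loop; `generate_with_budget` and the `(phase, token)` stream are illustrative names invented for this example, not part of Nvidia's actual API. Real enforcement happens inside the inference stack.

```python
def generate_with_budget(stream, budget: int):
    """Pass answer tokens through; drop reasoning tokens once the budget is spent.

    `stream` yields (phase, token) pairs, where phase is "reasoning" or "answer".
    """
    used = 0
    for phase, token in stream:
        if phase == "reasoning":
            if used >= budget:
                continue  # budget exhausted: skip further internal reasoning
            used += 1
        yield token

# Simulated generation: three reasoning tokens, then the final answer.
trace = [("reasoning", "First,"), ("reasoning", "check"), ("reasoning", "units."),
         ("answer", "42")]
print(list(generate_with_budget(iter(trace), budget=2)))
```

A lower budget trades reasoning depth for faster time-to-answer, which is the trade-off the article describes for customer support and agent workloads.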
The benchmarks tell a promising story
Evaluation results highlight competitive accuracy against other open small-scale models. Tested in "reasoning on" mode using the NeMo-Skills suite, Nemotron-Nano-9B-V2 reaches 72.1 percent on AIME25, 97.8 percent on MATH500, 64.0 percent on GPQA, and 71.1 percent on LiveCodeBench.
Scores are also reported on instruction following and long-context benchmarks: 90.3 percent on IFEval, 78.9 percent on the RULER 128K test, and smaller but measurable gains on BFCL v3 and the HLE benchmark.
Across the board, Nano-9B-V2 shows higher accuracy than Qwen3-8B, a common comparison point.

Nvidia illustrates these results with budget-versus-accuracy curves showing how accuracy improves as the token allowance for reasoning increases. The company suggests that careful budget control can help developers optimize for both quality and latency in production use cases.
Trained on synthetic data sets
Both the Nano model and the Nemotron-H family rely on a mixture of curated, web-sourced, and synthetic training data.
The corpora include general text, code, mathematics, science, and legal and financial documents, as well as alignment-style question-answering datasets.
Nvidia confirms the use of synthetic reasoning traces generated by other large models to strengthen performance on complex benchmarks.
Licensing and commercial use
The Nano-9B-V2 model is released under the Nvidia Open Model License Agreement, last updated in June 2025.
The license is designed to be permissive and enterprise-friendly. Nvidia states plainly that the models are commercially usable out of the box, and that developers are free to create and distribute derivative models.
Importantly, Nvidia does not claim ownership of any outputs generated by the model, leaving responsibility and rights with the developer or organization.
For an enterprise developer, this means the model can be put into production immediately without negotiating a separate commercial license or paying fees tied to usage thresholds, revenue levels, or user counts. There are no clauses requiring a paid license once a company reaches a certain scale, unlike some tiered open licenses used by other providers.
That said, the agreement includes several conditions enterprises must observe:
- Guardrails: Users may not bypass or disable built-in safety mechanisms (known as "guardrails") without implementing comparable replacements suited to their deployment.
- Redistribution: Any redistribution of the model or derivatives must include the Nvidia Open Model License text and attribution ("Licensed by NVIDIA Corporation under the NVIDIA Open Model License Agreement").
- Compliance: Users must comply with trade regulations and restrictions (e.g., U.S. export laws).
- Trustworthy AI terms: Usage must align with Nvidia's Trustworthy AI guidelines, which cover responsible deployment and ethical considerations.
- Litigation clause: If a user initiates copyright or patent litigation against another entity alleging infringement by the model, the license terminates automatically.
These conditions focus on legal and responsible use rather than commercial scale. Enterprises do not need to seek additional permission or pay Nvidia royalties simply for building products, monetizing them, or scaling their user base. Instead, they must ensure that their deployment practices respect safety, attribution, and compliance obligations.
Market positioning
With Nemotron-Nano-9B-V2, Nvidia is targeting developers who need a balance of reasoning capability and deployment efficiency at smaller scales.
The budget control and reasoning-toggle features are intended to give system builders more flexibility in managing accuracy versus response speed.
Their release on Hugging Face and in the Nvidia model catalog indicates they are meant to be widely available for experimentation and integration.
The release of Nvidia's Nemotron-Nano-9B-V2 reflects a continued focus on efficiency and controllable reasoning in language models.
By combining a hybrid architecture with new compression and training techniques, the company is offering developers tools that aim to maintain accuracy while reducing costs and latency.
