
The Qwen team, a division of Chinese e-commerce giant Alibaba that develops its growing family of Qwen large language models (LLMs), has introduced QwQ-32B, a new 32-billion-parameter reasoning model designed to improve performance on complex problem-solving tasks through reinforcement learning (RL).
The model is available as open weight on Hugging Face and ModelScope under an Apache 2.0 license. This means it can be used for commercial and research purposes, so enterprises can employ it immediately to power their products and applications (even ones they charge customers to use).
Individual users can access it through Qwen Chat.
QwQ was Alibaba's answer to OpenAI's original o1 reasoning model
QwQ, short for Qwen-with-Questions, was first introduced by Alibaba in November 2024 as an open-source reasoning model aimed at competing with OpenAI's o1-preview.
At launch, the model was designed to enhance logical reasoning and planning by reviewing and refining its own answers during inference, a technique that made it especially effective in math and coding tasks.
That initial version of QwQ featured 32 billion parameters and a 32,000-token context length, with Alibaba highlighting its ability to outperform o1-preview on math benchmarks like AIME and MATH, as well as scientific reasoning tasks like GPQA.
Despite its strengths, QwQ's early iterations struggled with programming benchmarks like LiveCodeBench, where OpenAI's models maintained an edge. In addition, like many emerging reasoning models, QwQ faced challenges such as language mixing and occasional circular reasoning loops.
However, Alibaba's decision to release the model under an Apache 2.0 license ensured that developers and enterprises could freely adapt and commercialize it, distinguishing it from proprietary alternatives like OpenAI's o1.
Since QwQ's initial release, the AI landscape has evolved rapidly. The limitations of traditional LLMs have become more apparent, with scaling laws yielding diminishing returns in performance improvements.
This shift has fueled interest in large reasoning models (LRMs), a new category of AI systems that use inference-time reasoning and self-reflection to improve accuracy. These include OpenAI's o3 series and the massively successful DeepSeek-R1 from rival Chinese lab DeepSeek, an offshoot of Hong Kong quantitative analysis firm High-Flyer Capital Management.
A new report from web traffic analytics and research firm SimilarWeb found that since R1's launch in January 2025, DeepSeek has climbed the charts to become the most-visited AI model-providing website behind OpenAI.
QwQ-32B, Alibaba's latest iteration, builds on these advances by integrating RL and structured self-questioning, positioning it as a serious competitor in the growing field of reasoning-focused AI.
Boosting performance with multi-stage reinforcement learning
Traditional instruction-tuned models often struggle with difficult reasoning tasks, but the Qwen team's research suggests that RL can significantly improve a model's ability to solve complex problems.
QwQ-32B builds on this idea by implementing a multi-stage RL training approach to enhance mathematical reasoning, coding proficiency, and general problem solving.
The model has been benchmarked against leading alternatives such as DeepSeek-R1, o1-mini, and DeepSeek-R1-Distilled-Qwen-32B, demonstrating competitive results despite having fewer parameters than some of these models.

For example, while DeepSeek-R1 operates with 671 billion parameters (37 billion of them activated), QwQ-32B achieves comparable performance with a much smaller footprint, typically requiring 24 GB of VRAM on a GPU (Nvidia's H100 has 80 GB) compared with more than 1,500 GB of VRAM to run the full DeepSeek-R1 (16 Nvidia A100 GPUs), highlighting the efficiency of Qwen's RL approach.
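These memory figures can be sanity-checked with back-of-envelope arithmetic. This is a rough sketch, not an official accounting: one plausible reading is that the 24 GB figure refers to a quantized variant of the weights, and real deployments add overhead for the KV cache and activations.

```python
# Rough VRAM estimate: parameter count times bytes per parameter.
# Illustrative approximations only; serving overhead is not included.

def model_vram_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """Approximate VRAM needed just to hold the weights, in GB."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

# QwQ-32B: 32 billion parameters
print(model_vram_gb(32, 2))    # fp16/bf16 weights: ~64 GB
print(model_vram_gb(32, 0.5))  # 4-bit quantized: ~16 GB, fitting a 24 GB budget with overhead

# DeepSeek-R1: 671 billion total parameters
print(model_vram_gb(671, 2))   # fp16 weights: ~1342 GB, in line with the >1,500 GB figure
```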
QwQ-32B follows a causal language model architecture and includes several optimizations:
- 64 transformer layers with RoPE, SwiGLU, RMSNorm, and attention QKV bias;
- Grouped-query attention (GQA) with 40 attention heads for queries and 8 for key-value pairs;
- An extended context length of 131,072 tokens, enabling better handling of long-sequence inputs;
- Multi-stage training including pretraining, supervised fine-tuning, and RL.
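Grouped-query attention matters for memory because the KV cache scales with the number of key-value heads, not query heads. A minimal sketch of that arithmetic, using the layer and head counts listed above (the head dimension and fp16 element size here are illustrative assumptions, not published specs):

```python
# KV-cache size: standard multi-head attention vs. grouped-query attention.
# HEAD_DIM and the 2-byte (fp16) element size are assumptions for illustration.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys + values across all layers for one sequence."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

LAYERS, Q_HEADS, KV_HEADS, HEAD_DIM = 64, 40, 8, 128
SEQ = 131_072  # the model's full context length

mha = kv_cache_bytes(LAYERS, Q_HEADS, HEAD_DIM, SEQ)   # if every query head kept its own KV
gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, SEQ)  # 8 KV heads shared across 40 query heads
print(f"MHA: {mha / 2**30:.1f} GiB, GQA: {gqa / 2**30:.1f} GiB, ratio: {mha // gqa}x")
# → MHA: 160.0 GiB, GQA: 32.0 GiB, ratio: 5x
```

Sharing each key-value head across five query heads cuts the cache five-fold at full context, which is part of how a 32B model stays deployable on modest hardware.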
The RL process for QwQ-32B was executed in two phases:
- Math and coding focus: The model was trained using an accuracy verifier for mathematical reasoning and a code-execution server for coding tasks. This approach ensured that generated answers were validated for correctness before being reinforced.
- General capability enhancement: In a second phase, the model received reward-based training using general reward models and rule-based verifiers. This stage improved instruction following, human alignment, and agent reasoning without compromising its math and coding capabilities.
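The first phase's outcome-based rewards can be illustrated with a toy sketch. These hypothetical `verify_math` and `verify_code` functions are stand-ins for the accuracy verifier and code-execution server described above, not the Qwen team's actual training code: an answer earns reward only if it is verifiably correct.

```python
# Toy outcome-based rewards in the spirit of QwQ-32B's first RL phase.
# Both functions are hypothetical illustrations; "solution" is an assumed
# entry-point name for the generated code.

def verify_math(model_answer: str, reference: str) -> float:
    """Reward 1.0 only if the final answer matches the reference answer."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0

def verify_code(source: str, test_cases: list) -> float:
    """Reward 1.0 only if the generated function passes every test case."""
    namespace: dict = {}
    try:
        exec(source, namespace)     # execute the model-generated code
        fn = namespace["solution"]  # assumed entry point
        return 1.0 if all(fn(*args) == want for args, want in test_cases) else 0.0
    except Exception:
        return 0.0                  # crashes and syntax errors earn no reward

print(verify_math("42", " 42 "))                           # → 1.0
print(verify_code("def solution(x): return x * 2",
                  [((2,), 4), ((5,), 10)]))                # → 1.0
```

Binary, verifiable rewards like these sidestep reward-model noise for math and code, which is why that phase could "ensure correctness before reinforcing" an answer.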
What it means for enterprise decision-makers
For enterprise leaders, including CEOs, CTOs, IT leaders, team managers, and AI application developers, QwQ-32B represents a potential shift in how AI can support business decision-making and technical innovation.
With its RL-driven reasoning capabilities, the model can deliver more accurate, structured, and context-aware insights, making it valuable for use cases like automated data analysis, strategic planning, software development, and intelligent automation.
Companies looking to deploy AI solutions for complex problem solving, coding assistance, financial modeling, or customer-service automation may find QwQ-32B's efficiency an attractive option. Moreover, its open-weight availability allows organizations to fine-tune and customize the model for domain-specific applications without proprietary restrictions, making it a flexible choice for enterprise AI strategies.
The fact that it comes from a Chinese e-commerce giant may raise security and bias concerns for some non-Chinese users, especially when using the Qwen Chat interface. But as with DeepSeek-R1, the fact that the model is available on Hugging Face for download and offline use, fine-tuning, or retraining suggests these concerns can be fairly easily overcome. And it is a viable alternative to DeepSeek-R1.
Early reactions from AI power users and influencers
The release of QwQ-32B has already gained attention from the AI research and development community, with several developers and industry professionals sharing their initial impressions on X (formerly Twitter):
- Hugging Face's Vaibhav Srivastav (@reach_vb) highlighted QwQ-32B's inference speed thanks to provider Hyperbolic Labs, calling it "blazingly fast" and comparable to top-tier models. He also noted that the model "beats DeepSeek-R1 and OpenAI o1-mini with an Apache 2.0 license."
- AI news and rumor publisher Chubby (@kimmonismus) was impressed by the model's performance, stressing that QwQ-32B sometimes outperforms DeepSeek-R1 despite being 20 times smaller. "Holy moly! Qwen cooked!" they wrote.
- Yuchen Jin (@Yuchenj_UW), co-founder and CTO of Hyperbolic Labs, celebrated the release, noting its efficiency gains. "Small models are so powerful! Alibaba Qwen released QwQ-32B, a reasoning model that beats DeepSeek-R1 (671B) and OpenAI o1-mini!"
- Another Hugging Face team member, Erik Kaunismäki (@ErikKaum), emphasized ease of deployment, sharing that the model is available for one-click deployment on Hugging Face endpoints, making it accessible to developers without extensive setup.
Agentic capabilities
QwQ-32B includes agentic capabilities, allowing it to dynamically adjust its reasoning processes based on environmental feedback.
For optimal performance, the Qwen team recommends the following inference settings:
- Temperature: 0.6
- TopP: 0.95
- TopK: between 20 and 40
- YaRN scaling: recommended for handling sequences longer than 32,768 tokens
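What the temperature, TopP, and TopK settings actually do at decode time can be sketched with a toy sampler over a logits vector. This is a simplified illustration of the standard technique, not the model's actual sampling kernel:

```python
import math
import random

def sample(logits, temperature=0.6, top_p=0.95, top_k=30):
    """Draw a token index using temperature-scaled top-k / top-p (nucleus) sampling."""
    scaled = [l / temperature for l in logits]          # temperature sharpens/flattens
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]            # numerically stable softmax
    total = sum(exps)
    probs = sorted(((i, e / total) for i, e in enumerate(exps)),
                   key=lambda p: p[1], reverse=True)
    probs = probs[:top_k]                               # keep only the top_k candidates
    kept, mass = [], 0.0
    for i, p in probs:                                  # smallest prefix reaching top_p mass
        kept.append((i, p))
        mass += p
        if mass >= top_p:
            break
    z = sum(p for _, p in kept)                         # renormalize and draw
    r = random.random() * z
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]

print(sample([5.0, 1.0, 0.5, 0.1]))  # → 0 (at temperature 0.6 the top token dominates the nucleus)
```

A lower temperature with a tight nucleus keeps long reasoning chains coherent, while the 20-40 TopK range still leaves room for exploration on flatter distributions.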
The model supports deployment using vLLM, a high-throughput inference framework. However, current vLLM implementations only support static YaRN scaling, which maintains a fixed scaling factor regardless of input length.
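For supported frameworks, Qwen model cards typically enable YaRN through a `rope_scaling` entry in the model's `config.json`. A sketch of that shape is below; the values follow the commonly documented recommendation (a 4.0 factor over the 32,768-token native window), but verify them against the model card for the version you deploy:

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

Because vLLM applies this scaling statically, enabling it can slightly affect quality on short inputs, so it is generally worth switching on only when prompts actually exceed 32,768 tokens.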
Future developments
The Qwen team sees QwQ-32B as just the first step in scaling RL to enhance reasoning capabilities. Looking ahead, the team plans to:
- Further explore scaling RL to improve model intelligence;
- Integrate agents with RL for long-horizon reasoning;
- Continue developing foundation models optimized for RL;
- Move toward artificial general intelligence (AGI) through more advanced training techniques.
With QwQ-32B, the Qwen team positions RL as a key driver of the next generation of AI models, demonstrating that scaling can produce highly performant and effective reasoning systems.