Another day at the end of 2025, and another impressive open-source AI result from a Chinese company.
The artificial intelligence division of Weibo, the Chinese social networking company, recently released VibeThinker-1.5B: an open-source, 1.5-billion-parameter LLM fine-tuned from Qwen2.5-Math-1.5B, a model built by rival Chinese tech giant Alibaba.
It is now freely available to researchers and enterprise developers, even for commercial purposes, under the permissive MIT license on Hugging Face, GitHub, and ModelScope, with a technical report on the open-access science publishing site arxiv.org.
And yet, despite its small size, VibeThinker-1.5B achieves top-tier reasoning performance on math and coding tasks, rivaling or outperforming models hundreds of times its size, and even beating Chinese rival DeepSeek's famous R1, the 671-billion-parameter model that went viral earlier this year, on formal benchmarks.
It further outshines Mistral AI's Magistral Medium and holds its own against Anthropic's Claude Opus 4 and OpenAI's gpt-oss-20B Medium, all while requiring a fraction of the infrastructure and investment.
It does so after post-training on a budget of just $7,800 in compute (3,900 GPU-hours on Nvidia H800s), far below the tens or even hundreds of thousands of dollars typically required to fine-tune models of similar or larger scale.
Recall, however, that this is not the total cost of developing the model: LLMs are trained in stages. First comes pre-training, in which the model learns the basic structure of language and general knowledge by predicting the next word across huge volumes of text from the internet, books, and articles. This gives it fluency but little sense of how to follow instructions or hold a conversation.
Post-training follows, using much smaller, higher-quality datasets, typically collections of sample questions, prompts, and expert-written answers, to teach the model to respond helpfully, reason through problems, and align with human expectations. Even so, Weibo's post-training cost-efficiency on VibeThinker-1.5B is noteworthy and deserves praise.
The open-source release upends assumptions about the parameter scale, compute intensity, and minimum viable size required for high-performance LLMs.
A different approach to training: turning spectrum into signal
VibeThinker-1.5B owes its performance not to scale but to the training framework behind it: the Spectrum-to-Signal Principle (SSP).
Instead of optimizing the model solely for the correctness of a single response (Pass@1), the SSP framework separates supervised fine-tuning (SFT) and reinforcement learning (RL) into two distinct phases with different goals:
- SFT ("Spectrum Phase"): The model is trained to maximize the diversity of potential correct answers, improving its Pass@K score. This creates a wide spectrum of plausible solution paths.
- RL ("Signal Phase"): A second-stage reinforcement learning system (called MaxEnt-Guided Policy Optimization, or MGPO for short) is used to identify and amplify the most correct paths from this diverse pool of solutions. MGPO prioritizes problems where the model is most uncertain, using entropy-based weighting to focus learning.
The authors argue that this separation allows small models to explore the reasoning space more effectively, achieving signal amplification without relying on massive parameter counts.
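The entropy-based weighting idea behind MGPO can be illustrated with a small sketch: estimate each problem's empirical pass rate from sampled rollouts, then weight it by the binary entropy of that rate, so problems the model solves about half the time (where uncertainty is highest) dominate the learning signal. This is a minimal illustration of the principle under our own naming, not the paper's actual implementation.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy of a Bernoulli(p) outcome, in nats; zero when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def mgpo_weights(success_rates):
    """Weight each problem by the entropy of its empirical pass rate,
    normalized to sum to 1. Problems the model always solves or always
    fails get zero weight; rates near 0.5 get the most."""
    raw = [binary_entropy(p) for p in success_rates]
    total = sum(raw) or 1.0
    return [w / total for w in raw]

# Sampled pass rates for four problems: always solved, hopeless,
# maximally uncertain, and mostly solved.
rates = [1.0, 0.0, 0.5, 0.9]
weights = mgpo_weights(rates)
```

Under this weighting, the third problem (pass rate 0.5) receives the largest share of the training signal, which matches the stated intent of focusing learning where the model is most uncertain.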
VibeThinker-1.5B presents compelling evidence that the industry's reliance on parameter scaling as the only path to better reasoning performance may be obsolete.
By adopting a diversity-first training process, WeiboAI has shown that smaller, more accessible models can match and even outperform billion-dollar systems on logic-intensive tasks.
Low resource consumption is one of VibeThinker-1.5B's most important traits. At under $8,000, its post-training cost is 30-60 times lower than that of models like DeepSeek R1 and MiniMax-M1, which cost $294,000 to $535,000 to train.
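The arithmetic behind the headline figure is easy to sanity-check: $7,800 over 3,900 H800 GPU-hours implies roughly $2 per GPU-hour (a derived rate, not one stated in the report), and dividing a rival's reported cost by the total gives the multiple.

```python
post_training_cost = 7_800   # USD, as reported for VibeThinker-1.5B
gpu_hours = 3_900            # Nvidia H800 GPU-hours, as reported

# Implied compute price: total cost divided by GPU-hours consumed.
implied_rate = post_training_cost / gpu_hours  # USD per GPU-hour

# Cost multiple vs. DeepSeek R1's reported $294,000 training bill.
deepseek_r1_multiple = 294_000 / post_training_cost
```

This lands at about $2 per GPU-hour and a roughly 38x gap versus DeepSeek R1, consistent with the 30-60x range cited above.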
Performance across domains
Despite its small size, VibeThinker-1.5B provides cross-domain reasoning that outperforms many larger open source and industrial models:
| Model | AIME25 | LiveCodeBench v6 | GPQA-Diamond |
|---|---|---|---|
| VibeThinker-1.5B | 74.4 | 51.1 | 46.7 |
| GPT-OSS-20B-Medium | 72.1 | 54.9 | 66.0 |
| Claude Opus 4 | 69.2 | 56.6 | 79.6 |
| MiniMax-M1 (456B) | 74.6 | 62.3 | 69.2 |
| DeepSeek R1 (671B) | 70.0 | 65.9 | 71.5 |
| Kimi K2 (1.09T) | 49.5 | 53.7 | 75.1 |
VibeThinker was compared with both reasoning-centric models (Magistral, Claude, OpenAI o3-mini) and non-reasoning LLMs (GPT-4.1, Kimi K2, DeepSeek V3). On structured reasoning benchmarks, it consistently outperformed the non-reasoning models, regardless of size:
- On AIME24 (mathematics), it beat Kimi K2 (1.09T) by over 10 points (80.3 vs. 69.6).
- On LiveCodeBench v6, it outperformed Claude Opus 4 (51.1 vs. 47.4).
- On GPQA, it scored below GPT-4.1 and Claude, but still more than doubled its base model's score (from 16.4 to 46.7).
This supports the authors’ contention that size is not the only path to reasoning ability – with proper training design, smaller models can achieve and even exceed the performance of much larger systems on targeted tasks.
Notably, it achieves parity with models hundreds of times its size in math and code, though it lags in general-knowledge reasoning (GPQA), where larger models retain the advantage.
This suggests a potential trade-off in specialization: while VibeThinker excels at structured logic tasks, it is less capable of broad encyclopedic recall, a known limitation of smaller architectures.
Enterprise implementation guidelines
The release includes recommended inference settings (temperature = 0.6, top_p = 0.95, max tokens = 40,960).
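In practice, those settings map directly onto the sampling parameters of any OpenAI-compatible serving stack. The sketch below builds such a request payload; the model identifier and the helper function are our own assumptions for illustration, while the sampling values are the release's recommended ones.

```python
# The release's recommended sampling settings.
RECOMMENDED_SETTINGS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": 40960,
}

def build_request(prompt: str, model: str = "WeiboAI/VibeThinker-1.5B") -> dict:
    """Assemble a chat-completion payload (hypothetical model ID)
    using the recommended decoding settings."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **RECOMMENDED_SETTINGS,
    }

req = build_request("Prove that the sum of two even integers is even.")
```

The large max-token budget reflects that reasoning models emit long chains of thought before the final answer, so truncating output too early can cut off the solution itself.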
The model is small enough to deploy on edge devices, including mobile phones and vehicle-embedded systems, and its inference costs are estimated to be 20-70 times cheaper than those of large models.
This positions VibeThinker-1.5B not only as a research achievement, but also as a potential basis for cost-effective, locally deployable reasoning systems.
Weibo’s strategy and market position
Weibo, launched by Sina Corporation in 2009, remains a cornerstone of China's social media ecosystem. Often described as China's version of X (formerly Twitter), the platform combines microblogging, multimedia content, and trending-topic features with a regulatory environment shaped by close government oversight.
Despite counting 600 million monthly active users (more than twice as many as X), investors are not optimistic about near-term advertising revenue growth, and Weibo faces growing competition from video platforms such as Douyin, which attract younger users and pull time spent elsewhere.
In response, Weibo has focused on creator economy monetization, live streaming and vertical video, adding tools for influencer engagement, e-commerce integration and richer analytics for brands.
The platform's role as a digital public square also makes it subject to regulatory scrutiny. Chinese authorities continue to exert pressure on issues ranging from content moderation to data security. In September 2025, Weibo was among the platforms cited in official warnings, underscoring its continued exposure to political risk.
Weibo's commitment to AI R&D, exemplified by the release of VibeThinker-1.5B, signals a shift in ambition. Beyond being a media platform, Weibo is positioning itself as a player in the next phase of China's AI development, leveraging its capital reserves, user-behavior data, and internal research capabilities to push into adjacent technical fields.
What this implies for technical decision makers in the enterprise
For engineering leaders and enterprise AI teams, the VibeThinker release has practical implications for everything from orchestration pipelines to cost modeling.
A 1.5B-parameter model that outperforms models 100 times its size on math and programming tasks doesn't just save compute: it changes the architectural calculus. It enables LLM inference on constrained infrastructure, reduces latency at the edge, and lowers the barrier to entry for applications that would otherwise require API access to closed, frontier-scale models.
This matters for enterprise ML leaders looking to deploy reasoning-capable agents within existing systems, and for platform owners tasked with integrating LLMs into automated workflows.
It is also useful for teams running reinforcement learning from human feedback (RLHF) pipelines or managing inference optimization in hybrid cloud environments.
The model's post-training methodology, specifically its entropy-guided reinforcement learning approach, offers a roadmap for teams looking to refine smaller checkpoints rather than relying on large-scale pre-training.
VibeThinker's benchmark transparency and data-cleansing steps also address another emerging enterprise AI priority: auditability. Although its performance on general-knowledge tests still lags behind large frontier models, its task-specific reliability makes it an attractive candidate for controlled environments where correctness matters more than coverage.
In short, VibeThinker-1.5B is not just a research milestone: it is a strong candidate for practical enterprise use, deployment, and study. It suggests that a new class of compact, reasoning-optimized models is viable for enterprise applications that were previously the domain of much larger systems. For organizations trying to balance cost, latency, interpretability, and control, it is a welcome new option on the long and growing list of Chinese open-source offerings.
