China's Ant Group, an Alibaba affiliate, has released detailed technical information about its latest model, Ring-1T, which the company claims is "the first open-source reasoning model with one trillion total parameters."
Ring-1T aims to compete with other reasoning models such as OpenAI's GPT-5 and o-series, as well as Google's Gemini 2.5. With the release of its latest model, Ant is extending the geopolitical debate over who will dominate the AI race: China or the U.S.
Ant Group says Ring-1T is optimized for math and logic problems, code generation and scientific problem solving.
"With approximately 50 billion activated parameters per token, Ring-1T achieves state-of-the-art performance on many demanding benchmarks – despite relying solely on natural language reasoning capabilities," Ant said in a paper.
Ring-1T, which was first released in preview in September, adopts the same architecture as Ling 2.0 and is trained on the Ling-1T base model that the company released earlier this month. Ant said this allows the model to handle context windows of up to 128,000 tokens.
To train a model as large as Ring-1T, researchers had to develop new methods for scaling reinforcement learning (RL).
New training methods
Ant Group developed three "interconnected innovations" to support Ring-1T's RL training, a challenge given the model's size and the heavy computational requirements it entails. The three are IcePop, C3PO++ and ASystem.
IcePop removes noisy gradient updates to stabilize training without slowing down inference, mitigating the catastrophic misalignment between training and inference in RL. The researchers noted that when training models, especially those using a mixture-of-experts (MoE) architecture such as Ring-1T's, there can often be discrepancies between the probabilities computed by the training and inference engines.
"This problem is particularly pronounced when training MoE models using RL due to the inherent use of a dynamic routing mechanism. Additionally, in the case of long CoT settings, these discrepancies may gradually accumulate in subsequent iterations and become further amplified," the researchers wrote.
IcePop “bypasses unstable training updates through bilateral masking calibration.”
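The paper does not publish IcePop's exact formulation, but the idea of bilateral masking can be sketched roughly as follows: tokens whose training-to-inference probability ratio drifts outside a two-sided band are excluded from the gradient update. The thresholds and function names here are illustrative assumptions, not Ant's published values.

```python
import numpy as np

def icepop_mask(p_train, p_infer, low=0.5, high=2.0):
    """Illustrative sketch of bilateral masking calibration.

    p_train: per-token probabilities from the training engine.
    p_infer: per-token probabilities from the inference engine.
    Tokens whose ratio p_train/p_infer falls outside [low, high]
    are masked out, so their noisy gradients do not destabilize
    the update. `low` and `high` are assumed example thresholds.
    """
    ratio = p_train / p_infer
    return (ratio >= low) & (ratio <= high)

# A token with a 40x probability mismatch gets masked out:
mask = icepop_mask(np.array([0.5, 0.1, 0.4]),
                   np.array([0.5, 0.4, 0.01]))
```

In a real trainer, this mask would be multiplied into the per-token policy-gradient loss so that only well-calibrated tokens contribute to the update.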
Another new method the researchers had to develop is C3PO++, an improved version of the C3PO system Ant built previously. It manages how Ring-1T and other high-performance models generate and process training examples, known as rollouts, so that GPUs do not sit idle.
It works by dividing rollouts into parts that are processed in parallel. One group is the inference pool, which generates new data; the other is the training pool, which collects results to update the model. C3PO++ enforces a token budget to control how much data is processed per iteration, ensuring efficient use of GPUs.
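As a rough sketch of the token-budget idea (the scheduling policy and field names below are assumptions for illustration, not Ant's implementation): finished rollouts are admitted to the training pool until the budget is exhausted, and everything else remains in the inference pool for the next iteration.

```python
def schedule_rollouts(rollouts, token_budget):
    """Hypothetical token-budget scheduler in the spirit of C3PO++.

    Each rollout is a dict with a token count and a completion flag.
    Completed rollouts fill the training pool up to `token_budget`;
    the rest stay in the inference pool so GPUs keep generating
    instead of idling while training catches up.
    """
    train_pool, infer_pool = [], []
    used = 0
    for r in rollouts:
        if r["done"] and used + r["tokens"] <= token_budget:
            train_pool.append(r)
            used += r["tokens"]
        else:
            infer_pool.append(r)
    return train_pool, infer_pool

rollouts = [
    {"id": 0, "tokens": 100, "done": True},
    {"id": 1, "tokens": 200, "done": False},  # still generating
    {"id": 2, "tokens": 50, "done": True},    # exceeds remaining budget
]
train_pool, infer_pool = schedule_rollouts(rollouts, token_budget=120)
```

The key design point is that the two pools run concurrently: generation never blocks on the optimizer step, and the budget caps how much the training pool consumes each round.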
The third new method, ASystem, uses a SingleController+SPMD (single program, multiple data) architecture to enable asynchronous operations.
Benchmark results
Ant evaluated Ring-1T on benchmarks measuring performance in math, coding, logical reasoning and general tasks, testing it against models such as DeepSeek-V3.1-Terminus-Thinking, Qwen3-235B-A22B-Thinking-2507, Gemini 2.5 Pro and GPT-5 Thinking.
In the benchmarks, Ring-1T performed well, ranking second to OpenAI's GPT-5 in most tests. Ant said Ring-1T showed the best performance of all the open-weight models tested.
The model scored 93.4% on the AIME 25 benchmark, second only to GPT-5. In coding, Ring-1T outperformed both DeepSeek and Qwen.
“This indicates that our carefully synthesized dataset shapes the solid performance of Ring-1T in software applications, which provides a solid foundation for future agent application efforts,” the company said.
Ring-1T shows how much Chinese firms invest in models
Ring-1T is the latest model from China expected to challenge GPT-5 and Gemini.
Since the surprise launch of DeepSeek in January, Chinese firms have been releasing impressive models at a rapid pace. Ant's parent company, Alibaba, recently released Qwen3-Omni, a multimodal model that natively unifies text, image, audio and video. DeepSeek also continues to refine its models and earlier this month launched DeepSeek-OCR, a model that reimagines the way models process information.
As Ant develops new methods for training and scaling very large models like Ring-1T, the battle for AI supremacy between the U.S. and China continues to intensify.
