MiniMax is probably best known in the US today as the Chinese company behind Hailuo, a realistic, high-resolution generative AI video model that competes with Runway, OpenAI's Sora, and Luma AI's Dream Machine.
But the company has much more up its sleeve: today, for example, it announced the release and open sourcing of the MiniMax-01 series, a new family of models designed to handle very long contexts and enhance the development of AI agents.
The series includes MiniMax-Text-01, a foundational large language model (LLM), and MiniMax-VL-01, a visual multimodal model.
Huge context window
Of particular note is MiniMax-Text-01, which can fit up to 4 million tokens in its context window – the equivalent of a small library's worth of books. The context window is how much information the LLM can handle in a single input/output exchange, with words and concepts represented as numerical “tokens”, the LLM's internal mathematical abstraction of the data it was trained on.
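To get a feel for what a 4-million-token budget means in practice, here is a minimal sketch that counts tokens using OpenAI's open-source tiktoken tokenizer. MiniMax uses its own tokenizer, so exact counts will differ; the encoding name below is purely an illustrative choice.

```python
# A rough illustration of tokenization, not MiniMax's own tokenizer.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # illustrative encoding choice

text = "MiniMax-Text-01 supports context windows of up to 4 million tokens."
tokens = enc.encode(text)

print(f"{len(tokens)} tokens for {len(text)} characters")
# English prose averages roughly 3-4 characters per token, so 4 million tokens
# corresponds to several million words, on the order of a small library of books.
```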
And while Google previously led the pack with its Gemini 1.5 Pro model and its 2-million-token context window, MiniMax has now doubled that.
As MiniMax posted on its official X account today: “MiniMax-01 efficiently processes up to 4 million tokens – 20 to 32 times the capacity of other leading models. We believe MiniMax-01 is poised to support the anticipated growth in agent applications in the coming year, as agents increasingly require extended context handling and persistent memory capabilities.”
The models are now available for download on Hugging Face and GitHub under a custom MiniMax license, and users can try them out directly on Hailuo AI Chat (a ChatGPT/Gemini/Claude competitor) and through MiniMax's application programming interface (API), where third-party developers can connect their own applications to them.
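For developers wiring the model into their own applications, the call pattern would look roughly like the sketch below. The endpoint URL, model identifier, and payload fields here are placeholders based on a generic chat-completion-style API, not MiniMax's documented contract; check the official API documentation for the real details.

```python
# Hypothetical sketch of calling a hosted MiniMax-Text-01 endpoint.
# The URL, model identifier, and payload shape are assumptions, not the
# documented MiniMax API; consult the official docs before use.
import os
import requests

API_KEY = os.environ["MINIMAX_API_KEY"]               # assumed env variable
URL = "https://api.example.com/v1/chat/completions"   # placeholder endpoint

payload = {
    "model": "MiniMax-Text-01",
    "messages": [
        {"role": "user", "content": "Summarize this 500-page report..."},
    ],
}

resp = requests.post(
    URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```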
MiniMax offers text processing and multimodal APIs at competitive prices:
- $0.20 per 1 million input tokens
- $1.10 per 1 million output tokens
For comparison, OpenAI's GPT-4o costs $2.50 per 1 million input tokens through its API, a staggering 12.5 times more expensive.
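To make the price gap concrete, here is a back-of-the-envelope comparison at the published per-million-token rates; the workload numbers (a 1-million-token prompt and a 10,000-token answer) are made up for illustration.

```python
# Rough cost comparison at the published per-million-token rates.
MINIMAX_INPUT, MINIMAX_OUTPUT = 0.20, 1.10   # USD per 1M tokens
GPT4O_INPUT = 2.50                           # USD per 1M input tokens

# Hypothetical workload: a 1M-token prompt producing a 10K-token answer.
input_tokens, output_tokens = 1_000_000, 10_000

minimax_cost = (input_tokens / 1e6) * MINIMAX_INPUT + (output_tokens / 1e6) * MINIMAX_OUTPUT
print(f"MiniMax-Text-01 input+output: ${minimax_cost:.3f}")
print(f"GPT-4o input alone:           ${(input_tokens / 1e6) * GPT4O_INPUT:.2f}")
print(f"Input-price ratio:            {GPT4O_INPUT / MINIMAX_INPUT:.1f}x")
```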
MiniMax has also integrated a mixture-of-experts (MoE) architecture with 32 experts to optimize scalability. This design balances compute and memory efficiency while maintaining competitive performance on key benchmarks.
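The core idea of a mixture-of-experts layer is that a small router selects a handful of expert networks per token, so only a fraction of the total parameters do work on any given input. The sketch below shows generic top-k routing in PyTorch; the expert count matches the 32 experts mentioned above, but the hidden sizes and top-k value are illustrative choices, not MiniMax-01's actual configuration.

```python
# Generic top-k mixture-of-experts routing sketch (illustrative sizes,
# not MiniMax-01's real configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=32, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):              # send each token to its k-th chosen expert
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out

x = torch.randn(4, 512)
print(MoELayer()(x).shape)  # torch.Size([4, 512])
```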
Breaking new ground with the Lightning Attention architecture
At the heart of MiniMax-01 is its Lightning Attention mechanism, an innovative alternative to traditional transformer attention.
This design significantly reduces computational complexity. The models consist of 456 billion parameters, 45.9 billion of which are activated per inference.
Unlike earlier architectures, Lightning Attention blends linear and traditional SoftMax attention layers, achieving near-linear complexity for long inputs. SoftMax, for those new to the concept, is an operation that transforms input numbers into probabilities summing to 1, so that the LLM can estimate which interpretation of the input is most likely.
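The NumPy sketch below illustrates both halves of that sentence: the softmax function itself, and why a kernelized linear attention layer scales better with sequence length than standard softmax attention. These are textbook formulations, not MiniMax's actual Lightning Attention kernels.

```python
# Minimal NumPy illustration of softmax, and of why linear attention scales
# better than softmax attention on long sequences. Textbook versions only,
# not MiniMax's Lightning Attention implementation.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)        # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)     # probabilities that sum to 1

n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))

# Standard softmax attention: the n x n score matrix grows quadratically with length.
attn_out = softmax(Q @ K.T / np.sqrt(d)) @ V                 # O(n^2 * d)

# Kernelized linear attention: associativity lets us compute K^T V first,
# so cost grows linearly with sequence length n.
phi = lambda x: np.maximum(x, 0) + 1e-6                      # simple positive feature map
linear_out = phi(Q) @ (phi(K).T @ V)                         # O(n * d^2)
linear_out /= phi(Q) @ phi(K).sum(axis=0, keepdims=True).T   # normalization term

print(attn_out.shape, linear_out.shape)  # (1024, 64) (1024, 64)
```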
MiniMax has rebuilt its training and inference platforms to support the Lightning Attention architecture. Key improvements include:
- MoE all-to-all communication optimization: Reduces communication overhead between GPUs.
- Varlen ring attention: Minimizes wasted computation when processing long, variable-length sequences (a generic packing sketch appears below).
- Efficient kernel implementations: Customized CUDA kernels improve Lightning Attention performance.
These advances make the MiniMax-01 models practical for real-world applications while keeping them affordable.
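One way to read the "varlen" optimization mentioned above is sequence packing: instead of padding every document to the longest length in a batch, variable-length sequences are concatenated and tracked with cumulative offsets, so no compute is wasted on padding and document boundaries remain known. The snippet below is a generic sketch of that idea, not MiniMax's implementation.

```python
# Generic variable-length sequence packing sketch (not MiniMax's code).
import torch

def pack_sequences(seqs):
    """Concatenate variable-length token sequences and record their offsets."""
    packed = torch.cat(seqs)                                  # one flat tensor, no padding
    lengths = torch.tensor([len(s) for s in seqs])
    cu_seqlens = torch.cat([torch.zeros(1, dtype=torch.long), lengths.cumsum(0)])
    return packed, cu_seqlens                                 # offsets mark document boundaries

docs = [torch.arange(5), torch.arange(3), torch.arange(8)]
packed, cu_seqlens = pack_sequences(docs)
print(packed.shape, cu_seqlens.tolist())   # torch.Size([16]) [0, 5, 8, 16]
```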
Performance and benchmarks
On mainstream text and multimodal benchmarks, MiniMax-01 rivals leading models such as GPT-4 and Claude 3.5, performing especially well on long-context evaluations. Notably, MiniMax-Text-01 achieved 100% accuracy on the “Needle in a Haystack” task with a 4-million-token context.
The models also show minimal performance degradation as input length increases.
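The "Needle in a Haystack" test hides a single fact deep inside an otherwise irrelevant long context and asks the model to retrieve it. A minimal version of constructing such a probe looks like the sketch below; the filler text, needle, and scoring note are stand-ins, not the actual benchmark harness.

```python
# Minimal needle-in-a-haystack style probe construction (illustrative only).
FILLER = "The sky was clear and the market was quiet that day. "
NEEDLE = "The secret passphrase is 'blue-falcon-42'. "
QUESTION = "What is the secret passphrase?"

def build_haystack(n_sentences=50_000, needle_depth=0.5):
    """Bury the needle at a chosen relative depth inside filler text."""
    sentences = [FILLER] * n_sentences
    sentences.insert(int(n_sentences * needle_depth), NEEDLE)
    return "".join(sentences)

prompt = build_haystack() + "\n\n" + QUESTION
print(f"Prompt length: {len(prompt):,} characters")
# Scoring is simply whether the model's answer contains 'blue-falcon-42';
# repeating this at many depths and context lengths yields the accuracy grid.
```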
MiniMax plans regular updates to expand model capabilities, including code and multimodal improvements.
The company sees open-sourcing as a step toward building foundational AI capabilities for the evolving AI agent landscape.
With 2025 expected to be a transformative year for AI agents, the need for persistent memory and efficient inter-agent communication is growing. MiniMax's innovations are designed to meet these challenges.
Open to collaboration
MiniMax invites developers and researchers to explore the capabilities of MiniMax-01. Beyond the open-source release, its team welcomes technical suggestions and collaboration inquiries at model@minimaxi.com.
With its commitment to cost-effective and scalable AI, MiniMax positions itself as a key player in shaping the era of AI agents. The MiniMax-01 series offers developers an exciting opportunity to push the boundaries of what long-context AI can achieve.