Sakana AI’s TreeQuest: Multi-model AI teams that outperform individual LLMs by 30%



Japanese AI lab Sakana AI has introduced a new technique that allows multiple large language models (LLMs) to cooperate on a single task, effectively creating an AI “dream team.” The method, called Multi-LLM AB-MCTS, enables models to perform trial and error and combine their unique strengths to solve problems that are too complex for any individual model.

For enterprises, this approach offers a way to develop more robust and capable AI systems. Instead of being locked into a single provider or model, companies could dynamically leverage the best aspects of different frontier models, assigning the right AI to the right part of a task to achieve superior results.


The power of collective intelligence

Frontier AI models are evolving rapidly. However, each model has distinct strengths and weaknesses derived from its unique training data and architecture. One may excel at coding, while another excels at creative writing. Sakana AI’s researchers argue that these differences are not a bug, but a feature.

“We see these biases and varied aptitudes not as limitations, but as precious resources for creating collective intelligence,” the researchers state in their blog post. They believe that just as humanity’s greatest achievements come from diverse teams, AI systems can also achieve more by working together. “By pooling their intelligence, AI systems can solve problems that are insurmountable for any single model.”

Thinking longer at inference time

Sakana AI’s new algorithm is an “inference-time scaling” technique (also referred to as “test-time scaling”), an area of research that has become very popular in the past year. While most of the focus in AI has been on “training-time scaling” (making models bigger and training them on larger datasets), inference-time scaling improves performance by allocating more computational resources after a model is trained.

One common approach is to use reinforcement learning to prompt models to generate longer, more detailed chain-of-thought (CoT) sequences, as seen in popular models such as OpenAI o3 and DeepSeek-R1. Another, simpler method is repeated sampling, where the model is given the same prompt multiple times to generate various potential solutions, similar to a brainstorming session. Sakana AI’s work combines and advances these ideas.
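The repeated-sampling baseline is easy to sketch. The snippet below is a minimal illustration, not Sakana AI's code: `mock_llm` and `score` are stand-ins for a real LLM call and a real verifier or reward model.

```python
import random

def mock_llm(prompt: str) -> str:
    """Stand-in for an LLM call: returns one of several candidate answers."""
    return random.choice(["answer A", "answer B", "answer C"])

def score(answer: str) -> float:
    """Stand-in verifier/reward model: rates a candidate solution."""
    return {"answer A": 0.2, "answer B": 0.9, "answer C": 0.5}[answer]

def best_of_n(prompt: str, n: int = 8) -> str:
    """Repeated sampling (Best-of-N): draw n candidates from the same
    prompt, then keep the highest-scoring one."""
    candidates = [mock_llm(prompt) for _ in range(n)]
    return max(candidates, key=score)

random.seed(0)
print(best_of_n("Solve the puzzle"))
```

The limitation AB-MCTS addresses is visible here: every sample starts from scratch, and nothing learned from one candidate informs the next.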

“Our framework offers a smarter, more strategic version of Best-of-N (a.k.a. repeated sampling),” Takuya Akiba, research scientist at Sakana AI and co-author of the paper, told VentureBeat. “It complements reasoning techniques like long CoT through RL. By dynamically selecting the search strategy and the appropriate LLM, this approach maximizes performance within a limited number of LLM calls, delivering better results on complex tasks.”

How adaptive branching search works

The core of the new method is an algorithm called Adaptive Branching Monte Carlo Tree Search (AB-MCTS). It enables an LLM to effectively perform trial and error by intelligently balancing two different search strategies: “searching deeper” and “searching wider.” Searching deeper involves taking a promising answer and repeatedly refining it, while searching wider means generating completely new solutions from scratch. AB-MCTS combines these approaches, allowing the system to improve a good idea but also to pivot and try something new if it hits a dead end or discovers another promising direction.

To accomplish this, the system uses Monte Carlo Tree Search (MCTS), the decision-making algorithm made famous by DeepMind’s AlphaGo. At each step, AB-MCTS uses probability models to decide whether it is more strategic to refine an existing solution or generate a new one.
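One simple way to make such a probabilistic refine-vs-generate decision (an illustrative sketch, not Sakana AI's exact formulation) is Thompson sampling: keep a Beta posterior over how often each action improves the best solution, and at each step sample from both posteriors and take the action with the higher draw.

```python
import random

# Beta-posterior parameters [successes + 1, failures + 1] for each action.
posteriors = {"refine": [1, 1], "generate_new": [1, 1]}

def choose_action() -> str:
    """Thompson sampling: draw a plausible success rate for each action
    from its Beta posterior, then pick the action with the higher draw."""
    draws = {a: random.betavariate(p[0], p[1]) for a, p in posteriors.items()}
    return max(draws, key=draws.get)

def update(action: str, success: bool) -> None:
    """Record whether the chosen action improved the best solution so far."""
    posteriors[action][0 if success else 1] += 1

random.seed(42)
# Simulate a search where refining the current best happens to pay off
# 70% of the time and starting fresh pays off 30% of the time.
for _ in range(200):
    a = choose_action()
    payoff = 0.7 if a == "refine" else 0.3
    update(a, random.random() < payoff)

print(posteriors)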

The researchers took this a step further with Multi-LLM AB-MCTS, which not only decides “what” to do (refine vs. generate) but also “which” LLM should do it. At the start of a task, the system does not know which model is best suited to the problem. It begins with a balanced mix of the available LLMs and, as it progresses, learns which models are more effective, allocating more of the workload to them over time.
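The “learn which model works, then use it more” behavior can be illustrated with the same bandit idea applied to model selection. This is a hypothetical sketch, not the paper's algorithm: the per-model success rates are invented purely for the simulation.

```python
import random

models = ["o4-mini", "gemini-2.5-pro", "deepseek-r1"]
# One Beta posterior per model: [successes + 1, failures + 1].
stats = {m: [1, 1] for m in models}

def pick_model() -> str:
    """Sample each model's estimated success rate; route the call to the max."""
    draws = {m: random.betavariate(s[0], s[1]) for m, s in stats.items()}
    return max(draws, key=draws.get)

def record(model: str, solved: bool) -> None:
    stats[model][0 if solved else 1] += 1

# Hypothetical per-problem success rates, just for this simulation.
true_rate = {"o4-mini": 0.6, "gemini-2.5-pro": 0.4, "deepseek-r1": 0.2}

random.seed(7)
calls = {m: 0 for m in models}
for _ in range(300):
    m = pick_model()
    calls[m] += 1
    record(m, random.random() < true_rate[m])

print(calls)
```

Early on the three models are tried roughly evenly; as evidence accumulates, the allocation shifts toward whichever model keeps succeeding, which mirrors the adaptive allocation described above.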

Testing the AI “dream team”

The researchers tested their Multi-LLM AB-MCTS system on the ARC-AGI-2 benchmark. ARC (Abstraction and Reasoning Corpus) is designed to test a human-like ability to solve novel visual reasoning problems, making it notoriously difficult for AI.

The team used a combination of frontier models, including o4-mini, Gemini 2.5 Pro, and DeepSeek-R1.

The collective of models was able to find correct solutions for over 30% of the 120 test problems, a score that significantly outperformed any of the models working alone. The system demonstrated the ability to dynamically assign the best model for a given problem. On tasks where a clear path to a solution existed, the algorithm quickly identified the most effective LLM and used it more frequently.

AB-MCTS vs individual models (source: Sakana AI)

More impressively, the team observed instances where the models solved problems that were previously impossible for any single one of them. In one case, a solution generated by the o4-mini model was incorrect. However, the system passed this flawed attempt to DeepSeek-R1 and Gemini 2.5 Pro, which were able to analyze the error, correct it, and ultimately produce the right answer.

“This demonstrates that Multi-LLM AB-MCTS can flexibly combine frontier models to solve previously unsolvable problems, pushing the limits of what is achievable by using LLMs as a collective intelligence,” the researchers write.

AB-MCTS can select different models at different stages of problem solving (source: Sakana AI)

“In addition to each model’s individual pros and cons, the tendency to hallucinate can differ significantly among them,” Akiba said. “By creating an ensemble with a model that is less likely to hallucinate, it could be possible to achieve the best of both worlds: powerful logical capabilities and strong groundedness. Since hallucination is a major issue in a business context, this approach could be valuable for its mitigation.”

From research to real-world applications

To help developers and businesses apply this technique, Sakana AI has released the underlying algorithm as an open-source framework called TreeQuest, available under the Apache 2.0 license (usable for commercial purposes). TreeQuest provides a flexible API, allowing users to implement Multi-LLM AB-MCTS for their own tasks with custom scoring and logic.
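The plug-in pattern such a framework implies (user code supplies a generate-and-score callback, the library supplies the search) can be sketched as follows. This is an illustrative toy, not TreeQuest's documented API; consult the TreeQuest repository for the real interface. `generate_fn` and the best-first loop here are stand-ins.

```python
import random

def generate_fn(parent_state):
    """User-supplied callback: given a parent solution (or None for a
    fresh start), produce a new candidate and a score in [0, 1]. Here
    the 'LLM' is mocked with random perturbations of the parent score."""
    base = 0.0 if parent_state is None else parent_state["score"]
    score = min(1.0, max(0.0, base + random.uniform(-0.1, 0.3)))
    return {"text": f"candidate@{score:.2f}", "score": score}, score

def toy_search(budget: int = 30):
    """A toy best-first loop standing in for the real AB-MCTS search:
    each step either refines the current best node or starts from scratch."""
    best_state, best_score = None, -1.0
    for _ in range(budget):
        parent = best_state if random.random() < 0.5 else None
        state, new_score = generate_fn(parent)
        if new_score > best_score:
            best_state, best_score = state, new_score
    return best_state, best_score

random.seed(1)
state, score = toy_search()
print(score)
```

The key point is the separation of concerns: the caller only defines how to generate and score candidates for their own task, while the search logic decides where in the tree to spend the limited call budget.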

“While we are still in the early stages of applying AB-MCTS to specific business-oriented problems, our research reveals significant potential in several areas,” Akiba said.

Beyond the ARC-AGI-2 benchmark, the team was also able to successfully apply AB-MCTS to tasks such as complex algorithmic coding and improving the accuracy of machine learning models.

“AB-MCTS could also be highly effective for problems that require iterative trial and error, such as optimizing performance metrics of existing software,” Akiba said. “For example, it could be used to automatically find ways to improve the response latency of a web service.”

The release of a practical, open-source tool could pave the way for a new class of more powerful and reliable enterprise AI applications.
