Researchers at Katanemo Labs have introduced Arch-Router, a new routing model and framework designed to intelligently map user queries to the most suitable large language model (LLM).
For enterprises building products that rely on multiple LLMs, Arch-Router aims to solve a key challenge: how to direct queries to the best model for the job without relying on rigid logic or costly retraining every time something changes.
The challenges of LLM routing
As the number of LLMs grows, developers are moving from single-model setups to multi-model systems that use the unique strengths of each model for specific tasks (e.g., code generation, text summarization, or image editing).
LLM routing has emerged as a key technique for building and deploying these systems, acting as a traffic controller that directs each user query to the most appropriate model.
Existing routing methods generally fall into two categories: "task-based routing," where queries are routed based on predefined tasks, and "performance-based routing," which seeks an optimal balance between cost and performance.
However, task-based routing struggles with unclear or shifting user intentions, especially in multi-turn conversations. Performance-based routing, for its part, prioritizes benchmark scores, often neglects real-world user preferences, and adapts poorly to new models unless it undergoes costly fine-tuning.
More fundamentally, as the Katanemo Labs researchers note in their paper, "Existing routing approaches have limitations in real-world use. They typically optimize for benchmark performance while neglecting human preferences driven by subjective evaluation criteria."
The researchers highlight the need for routing systems that "align with subjective human preferences, offer more transparency, and remain easily adaptable as models and use cases evolve."
A new framework for preference-aligned routing
To address these limitations, the researchers propose a preference-aligned routing framework that matches queries to routing policies based on user-defined preferences.
In this framework, users define their routing policies in natural language using a "Domain-Action Taxonomy." This is a two-level hierarchy that reflects how people naturally describe tasks, starting with a general topic (the domain, such as "legal" or "finance") and narrowing to a specific task (the action, such as "summarization" or "code generation").
Each of these policies is then linked to a preferred model, allowing developers to make routing decisions based on real-world needs rather than just benchmark scores. As the paper states: "This taxonomy serves as a mental model to help users define clear and structured routing policies."
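A policy set in this style might look like the following sketch. The policy names, descriptions, and model identifiers are illustrative assumptions, not Katanemo's actual schema:

```python
# Hypothetical routing policies following a domain-action taxonomy:
# each entry pairs a natural-language description with a preferred model.
# All names and model IDs below are illustrative only.
routing_policies = [
    {"name": "legal_summarization",
     "description": "Summarize legal documents such as contracts or filings.",
     "model": "claude-3-7-sonnet"},
    {"name": "finance_analysis",
     "description": "Analyze financial statements and market data.",
     "model": "gpt-4o"},
    {"name": "code_generation",
     "description": "Write or complete source code from a specification.",
     "model": "gemini-2.5-pro"},
]
```

Because the policies are plain descriptions rather than hard-coded logic, updating a preference is just an edit to this list.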
The routing process happens in two stages. First, a preference-aligned router model takes the user query and the full set of policies and selects the most appropriate policy. Second, a mapping function connects the selected policy to its designated LLM.
Because the model-selection logic is decoupled from the policies, models can be added, removed, or swapped simply by editing the routing policies, without retraining or modifying the router itself. This decoupling provides the flexibility required for practical deployments, where models and use cases are constantly evolving.
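The two-stage flow above can be sketched as follows. Note that `select_policy` here is a trivial keyword matcher purely for illustration; in the real system, stage one is performed by a fine-tuned language model, and all policy and model names are assumptions:

```python
# Stage 2 lives in a plain mapping: policy name -> model identifier.
# Swapping a model means editing this dict; the router is untouched.
POLICY_TO_MODEL = {
    "summarization": "claude-3-7-sonnet",   # illustrative model IDs
    "code_generation": "gemini-2.5-pro",
}

def select_policy(query: str) -> str:
    """Stage 1 stand-in: pick the best-matching policy for a query.
    A trivial keyword heuristic, standing in for the router model."""
    if "summarize" in query.lower():
        return "summarization"
    return "code_generation"

def route(query: str) -> str:
    policy = select_policy(query)   # stage 1: query -> policy
    return POLICY_TO_MODEL[policy]  # stage 2: policy -> model
```

The key property is that the selection step never mentions a model name, so the policy-to-model mapping can change independently.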
Policy selection is powered by Arch-Router, a compact 1.5B-parameter language model fine-tuned for preference-aligned routing. Arch-Router receives the user query along with the full set of policy descriptions in its prompt, then generates the identifier of the best-matching policy.
Because the policies are part of the input, the system can adapt to new or modified routes at inference time through in-context learning, without retraining. This generative approach lets Arch-Router draw on its pre-trained knowledge to understand the semantics of both queries and policies, and to process the entire conversation history at once.
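One way to picture this is a single prompt that packs the policy descriptions together with the conversation, so that adding a route is just adding a line of text. The template below is an assumption for illustration, not Arch-Router's actual prompt format:

```python
# Hedged sketch: build one prompt containing the policy descriptions
# and the conversation history, for a generative router to answer with
# a policy name. The template is an assumption, not the real format.
def build_router_prompt(policies: dict[str, str], conversation: list[str]) -> str:
    policy_block = "\n".join(f"- {name}: {desc}" for name, desc in policies.items())
    history = "\n".join(conversation)
    return (
        "Select the single best routing policy for the conversation below.\n"
        f"Policies:\n{policy_block}\n"
        f"Conversation:\n{history}\n"
        "Answer with the policy name only:"
    )

prompt = build_router_prompt(
    {"image_editing": "Edit or retouch images.",
     "document_creation": "Draft reports, letters, or other documents."},
    ["User: Can you brighten this photo and remove the background?"],
)
# The router model would then generate a short identifier such as
# "image_editing"; a new policy is added by extending the dict above.
```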
A common concern with including detailed policies in the prompt is increased latency. However, the researchers designed Arch-Router to be highly efficient. "While the length of routing policies can get long, we can easily increase the context window of Arch-Router with minimal impact on latency," explains Salman Paracha, co-author of the paper and founder/CEO of Katanemo Labs. He notes that latency is driven primarily by the length of the output, and Arch-Router's output is simply the short name of a routing policy, such as "image_editing" or "document_creation."
Arch-Router in action
To build Arch-Router, the researchers fine-tuned a 1.5B-parameter version of the Qwen 2.5 model on a curated dataset of 43,000 examples. They then tested its performance against state-of-the-art proprietary models from OpenAI, Anthropic, and Google on four public datasets designed to evaluate conversational AI systems.
The results show that Arch-Router achieves the highest overall routing score of 93.17%, surpassing all other models, including the top proprietary ones, by an average of 7.71%. The model's advantage grew with longer conversations, demonstrating its strong ability to track context across multiple turns.

According to Paracha, this approach is already being applied in several scenarios. For example, in open-source coding tools, developers use Arch-Router to direct different stages of their workflow, such as "code design," "code understanding," and "code generation," to the LLMs best suited for each task. Similarly, enterprises can route document-creation requests to a model like Claude 3.7 Sonnet while sending image-editing tasks to Gemini 2.5 Pro.
The system is also ideal "for personal assistants in various domains, where users have a diverse set of tasks, from text summarization to factoid queries," Paracha said, adding that "in those cases, Arch-Router can help developers unify and improve the overall user experience."
The framework is integrated with Arch, Katanemo Labs' proxy server for agents, which allows developers to implement sophisticated traffic-shaping rules. For example, when integrating a new LLM, a team can send a small portion of the traffic for a specific routing policy to the new model, verify its performance with internal metrics, and then fully transition with confidence. The company is also working on integrations with other platforms to further streamline this process for enterprise developers.
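The canary pattern described above can be sketched with a deterministic traffic split. This is a minimal illustration of the idea, not the proxy server's actual mechanism; the model names and the 5% ratio are assumptions:

```python
# Hypothetical canary split for one routing policy: every 20th request
# (5% of traffic) for "document_creation" goes to a new model under test,
# while the rest stays on the incumbent. Names and ratio are illustrative.
def pick_model(request_id: int) -> str:
    if request_id % 20 == 0:
        return "new-model-under-test"   # canary: 5% of requests
    return "claude-3-7-sonnet"          # incumbent model for this policy

# Over 1000 requests, exactly 50 reach the canary model. Once internal
# metrics confirm its quality, removing the condition completes the
# transition without touching the routing policies themselves.
canary_hits = sum(pick_model(i) == "new-model-under-test" for i in range(1000))
```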
Ultimately, the goal is to move beyond siloed AI implementations. "Arch-Router, and Arch more broadly, help developers and enterprises move from fragmented LLM implementations to a unified, policy-driven system," says Paracha. "In scenarios where user tasks are diverse, our framework helps turn that task and LLM fragmentation into a unified experience, making the final product feel seamless to the end user."
