Researchers at Katanemo Labs have introduced Arch-Router, a new routing model and framework designed to intelligently map user queries to the most suitable large language model (LLM).
For enterprises building products that rely on multiple LLMs, Arch-Router aims to solve a key challenge: how to direct queries to the best model for the job without relying on rigid logic or costly retraining every time something changes.
The challenges of LLM routing
As the number of LLMs grows, developers are moving from single-model setups to multi-model systems that use the unique strengths of each model for specific tasks (e.g., code generation, text summarization, or image editing).
LLM routing has emerged as a key technique for building and deploying these systems, acting as a traffic controller that directs each user query to the most appropriate model.
Existing routing methods generally fall into two categories: "task-based routing," where queries are routed based on predefined tasks, and "performance-based routing," which seeks an optimal balance between cost and performance.
However, task-based routing struggles with unclear or shifting user intentions, especially in multi-turn conversations. Performance-based routing, for its part, prioritizes benchmark scores, often neglects real-world user preferences, and adapts poorly to new models unless it undergoes costly fine-tuning.
More fundamentally, as the Katanemo Labs researchers note in their paper, "Existing routing approaches have limitations in real-world use. They typically optimize for benchmark performance while neglecting human preferences driven by subjective evaluation criteria."
The researchers highlight the need for routing systems that "align with subjective human preferences, offer more transparency, and remain easily adaptable as models and use cases evolve."
A new framework for preference-aligned routing
To address these limitations, the researchers propose a preference-aligned routing framework that matches queries to routing policies based on user-defined preferences.
In this framework, users define their routing policies in natural language using a "Domain-Action Taxonomy." This is a two-level hierarchy that reflects how people naturally describe tasks, starting with a general topic (the domain, such as "legal" or "finance") and narrowing to a specific task (the action, such as "summarization" or "code generation").
Each of these policies is then linked to a preferred model, allowing developers to make routing decisions based on real-world needs rather than just benchmark scores. As the paper states: "This taxonomy serves as a mental model to help users define clear and structured routing policies."
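A policy set in this style might look like the following sketch. The policy names, descriptions, and model identifiers are illustrative assumptions, not Katanemo's actual schema:

```python
# Hypothetical routing policies following a domain-action taxonomy:
# each entry pairs a natural-language description with a preferred model.
# All names and model IDs below are illustrative only.
routing_policies = [
    {"name": "legal_summarization",
     "description": "Summarize legal documents such as contracts or filings.",
     "model": "claude-3-7-sonnet"},
    {"name": "finance_analysis",
     "description": "Analyze financial statements and market data.",
     "model": "gpt-4o"},
    {"name": "code_generation",
     "description": "Write or complete source code from a specification.",
     "model": "gemini-2.5-pro"},
]
```

Because the policies are plain descriptions rather than hard-coded logic, updating a preference is just an edit to this list.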
The routing process happens in two stages. First, a preference-aligned router model takes the user query and the full set of policies and selects the most appropriate policy. Second, a mapping function connects the selected policy to its designated LLM.
Because the model-selection logic is decoupled from the policies, models can be added, removed, or swapped simply by editing the routing policies, without retraining or modifying the router itself. This decoupling provides the flexibility required for practical deployments, where models and use cases are constantly evolving.
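The two-stage flow above can be sketched as follows. Note that `select_policy` here is a trivial keyword matcher purely for illustration; in the real system, stage one is performed by a fine-tuned language model, and all policy and model names are assumptions:

```python
# Stage 2 lives in a plain mapping: policy name -> model identifier.
# Swapping a model means editing this dict; the router is untouched.
POLICY_TO_MODEL = {
    "summarization": "claude-3-7-sonnet",   # illustrative model IDs
    "code_generation": "gemini-2.5-pro",
}

def select_policy(query: str) -> str:
    """Stage 1 stand-in: pick the best-matching policy for a query.
    A trivial keyword heuristic, standing in for the router model."""
    if "summarize" in query.lower():
        return "summarization"
    return "code_generation"

def route(query: str) -> str:
    policy = select_policy(query)   # stage 1: query -> policy
    return POLICY_TO_MODEL[policy]  # stage 2: policy -> model
```

The key property is that the selection step never mentions a model name, so the policy-to-model mapping can change independently.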
Policy selection is powered by Arch-Router, a compact 1.5B-parameter language model fine-tuned for preference-aligned routing. Arch-Router receives the user query along with the full set of policy descriptions in its prompt, then generates the identifier of the best-matching policy.
Because the policies are part of the input, the system can adapt to new or modified routes at inference time through in-context learning, without retraining. This generative approach lets Arch-Router draw on its pre-trained knowledge to understand the semantics of both queries and policies, and to process the entire conversation history at once.
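One way to picture this is a single prompt that packs the policy descriptions together with the conversation, so that adding a route is just adding a line of text. The template below is an assumption for illustration, not Arch-Router's actual prompt format:

```python
# Hedged sketch: build one prompt containing the policy descriptions
# and the conversation history, for a generative router to answer with
# a policy name. The template is an assumption, not the real format.
def build_router_prompt(policies: dict[str, str], conversation: list[str]) -> str:
    policy_block = "\n".join(f"- {name}: {desc}" for name, desc in policies.items())
    history = "\n".join(conversation)
    return (
        "Select the single best routing policy for the conversation below.\n"
        f"Policies:\n{policy_block}\n"
        f"Conversation:\n{history}\n"
        "Answer with the policy name only:"
    )

prompt = build_router_prompt(
    {"image_editing": "Edit or retouch images.",
     "document_creation": "Draft reports, letters, or other documents."},
    ["User: Can you brighten this photo and remove the background?"],
)
# The router model would then generate a short identifier such as
# "image_editing"; a new policy is added by extending the dict above.
```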
A common concern with including detailed policies in the prompt is increased latency. However, the researchers designed Arch-Router to be highly efficient. "While the length of routing policies can get long, we can easily increase the context window of Arch-Router with minimal impact on latency," explains Salman Paracha, co-author of the paper and founder/CEO of Katanemo Labs. He notes that latency is driven primarily by the length of the output, and Arch-Router's output is simply the short name of a routing policy, such as "image_editing" or "document_creation."
Arch-Router in action
To build Arch-Router, the researchers fine-tuned a 1.5B-parameter version of the Qwen 2.5 model on a curated dataset of 43,000 examples. They then tested its performance against state-of-the-art proprietary models from OpenAI, Anthropic, and Google on four public datasets designed to evaluate conversational AI systems.
The results show that Arch-Router achieves the highest overall routing score of 93.17%, surpassing all other models, including the top proprietary ones, by an average of 7.71%. The model's advantage grew with longer conversations, demonstrating its strong ability to track context across multiple turns.

According to Paracha, this approach is already being applied in several scenarios. For example, in open-source coding tools, developers use Arch-Router to direct different stages of their workflow, such as "code design," "code understanding," and "code generation," to the LLMs best suited for each task. Similarly, enterprises can route document-creation requests to a model like Claude 3.7 Sonnet while sending image-editing tasks to Gemini 2.5 Pro.
The system is also ideal "for personal assistants in various domains, where users have a diverse set of tasks, from text summarization to factoid queries," Paracha said, adding that "in those cases, Arch-Router can help developers unify and improve the overall user experience."
The framework is integrated with Arch, Katanemo Labs' proxy server for agents, which allows developers to implement sophisticated traffic-shaping rules. For example, when integrating a new LLM, a team can send a small portion of the traffic for a specific routing policy to the new model, verify its performance with internal metrics, and then fully transition with confidence. The company is also working on integrations with other platforms to further streamline this process for enterprise developers.
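The canary pattern described above can be sketched with a deterministic traffic split. This is a minimal illustration of the idea, not the proxy server's actual mechanism; the model names and the 5% ratio are assumptions:

```python
# Hypothetical canary split for one routing policy: every 20th request
# (5% of traffic) for "document_creation" goes to a new model under test,
# while the rest stays on the incumbent. Names and ratio are illustrative.
def pick_model(request_id: int) -> str:
    if request_id % 20 == 0:
        return "new-model-under-test"   # canary: 5% of requests
    return "claude-3-7-sonnet"          # incumbent model for this policy

# Over 1000 requests, exactly 50 reach the canary model. Once internal
# metrics confirm its quality, removing the condition completes the
# transition without touching the routing policies themselves.
canary_hits = sum(pick_model(i) == "new-model-under-test" for i in range(1000))
```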
Ultimately, the goal is to move beyond siloed AI implementations. "Arch-Router, and Arch more broadly, help developers and enterprises move from fragmented LLM implementations to a unified, policy-driven system," says Paracha. "In scenarios where user tasks are diverse, our framework helps turn that task and LLM fragmentation into a unified experience, making the final product feel seamless to the end user."
