Despite the intense AI arms race, a multi-model future awaits us

Every week – sometimes every day – a new, cutting-edge artificial intelligence model is born into the world. As we enter 2025, the pace of new model releases is dizzying, if not exhausting. The curve keeps climbing exponentially, and exhaustion and wonder have become constant companions. Each release highlights why this particular model is better than all the others, with an endless supply of benchmarks and bar charts filling our feeds as we try to keep up.

Eighteen months ago, the overwhelming majority of developers and companies were using a single AI model. Today, the opposite is true: it is rare to find a large-scale business that confines itself to the capabilities of one model. Companies are wary of vendor lock-in, especially for a technology that has quickly become central to both long-term corporate strategy and short-term revenue. Betting everything on a single large language model (LLM) is becoming more and more risky for teams.


Yet despite this fragmentation, many model providers still espouse the view that AI will be a winner-takes-all market. They argue that the expertise and computation required to train best-in-class models are scarce, defensible, and self-reinforcing. From their perspective, the bubble of companies building AI models will eventually collapse, leaving one giant artificial general intelligence (AGI) model that can be used for everything. Exclusively owning such a model would mean being the most powerful company in the world. The size of this prize has set off an arms race for ever more GPUs, with a new zero added to the number of training parameters every few months.

We believe this view is mistaken. No single model will rule the universe, neither next year nor in the next decade. Instead, the future of artificial intelligence will be multi-model.

Language models are fuzzy commodities

A commodity is "a standardized good that is bought and sold at scale and whose units are interchangeable." Language models are commodities in two important senses:

  1. The models themselves are becoming increasingly interchangeable across a broad set of tasks;
  2. The scientific knowledge required to create these models is becoming increasingly distributed and accessible, with frontier labs barely staying ahead of one another and independent researchers in the open-source community hot on their heels.

While language models are being commoditized, the process is uneven. There is a large core of capabilities for which every model, from GPT-4 all the way down to Mistral Small, is perfectly well suited. At the same time, as we move toward marginal and edge cases, we see increasing differentiation, with some model providers clearly specializing in code generation, reasoning, retrieval-augmented generation (RAG), or mathematics. This leads to endless hand-wringing, Reddit-searching, evaluating, and fine-tuning to find the right model for each task.

Thus, although language models are commodities, they are more accurately described as fuzzy commodities. In many cases, AI models will be nearly interchangeable, and the choice of model will come down to metrics such as price and latency. But at the frontier of capability, the opposite will happen: models will continue to specialize, becoming more and more differentiated. For example, Deepseek-V2.5 is stronger than GPT-4o at C# coding, even though it is a fraction of the size and 50 times cheaper.

Both of these dynamics – commoditization and specialization – undermine the idea that a single model will be best suited to every possible use case. Rather, they point to an increasingly fragmented AI landscape.

Multi-model orchestration and routing

There is an apt analogy for the market dynamics of language models: the human brain. The structure of our brains has remained largely unchanged for 100,000 years, and our brains are far more alike than different. For the overwhelming majority of our time on Earth, most people learned the same things and had similar opportunities.

But then something changed. We developed the ability to communicate in language – first in speech, then in writing. Communication protocols make networks possible, and as people began to connect into networks, we also began to specialize more and more. We were freed from the burden of having to be generalists in every domain, of being self-sufficient islands. Paradoxically, the collective wealth of specialization also means that the average person today is a far stronger generalist than any of our ancestors.

Given a sufficiently broad input space, the universe always tends toward specialization. This is true everywhere, from molecular chemistry to biology to human society. Given enough diversity, distributed systems will always be more computationally efficient than monoliths. We believe the same will be true of artificial intelligence. The more we can leverage the strengths of multiple models rather than relying on just one, the more those models can specialize, expanding the frontier of what is possible.

An increasingly important pattern for leveraging the strengths of different models is routing – dynamically sending each query to the best-fitting model, while falling back to cheaper, faster models when doing so causes no loss of quality. Routing lets us capture all the advantages of specialization – higher accuracy at lower cost and latency – without sacrificing the robustness of generalization.
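To make the idea concrete, here is a minimal sketch of prompt routing. The model names, per-token prices, strength sets, and the keyword heuristic are all illustrative assumptions, not a real provider catalog; production routers typically replace the keyword classifier with a learned one.

```python
# Hypothetical model catalog: names, prices, and strengths are invented
# for illustration only.
MODELS = {
    "small": {"cost_per_1k_tokens": 0.0002, "strengths": {"chat", "summarization"}},
    "code":  {"cost_per_1k_tokens": 0.0010, "strengths": {"code"}},
    "large": {"cost_per_1k_tokens": 0.0100, "strengths": {"reasoning", "math"}},
}

def classify_task(prompt: str) -> str:
    """Crude keyword-based task classifier (a stand-in for a learned one)."""
    lowered = prompt.lower()
    if any(kw in lowered for kw in ("def ", "function", "bug", "compile")):
        return "code"
    if any(kw in lowered for kw in ("prove", "integral", "solve")):
        return "reasoning"
    return "chat"

def route(prompt: str) -> str:
    """Pick the cheapest model whose strengths cover the detected task."""
    task = classify_task(prompt)
    candidates = [
        (spec["cost_per_1k_tokens"], name)
        for name, spec in MODELS.items()
        if task in spec["strengths"]
    ]
    # Fall back to the generalist "large" model if nothing specializes.
    return min(candidates)[1] if candidates else "large"

print(route("Summarize this meeting transcript"))  # -> small
print(route("Fix the bug in this function"))       # -> code
```

The key design choice is that cost only breaks ties among models that are already good enough for the task – quality gates first, price second.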

A simple demonstration of the power of routing is the fact that many of the world's best models are themselves routers: they are built with mixture-of-experts architectures that route each successive token generation to dozens of expert sub-models. If it is true that LLMs are proliferating as fuzzy commodities, then routing is bound to become an essential part of any AI stack.
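A toy version of that gating mechanism can be sketched in a few lines of NumPy. The dimensions, expert count, and the use of plain linear maps as "experts" are simplifying assumptions; real mixture-of-experts layers use learned gates over full feed-forward experts, plus load-balancing tricks this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

# Router weights and a bank of (toy) linear experts.
W_gate = rng.normal(size=(d_model, n_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ W_gate                         # one score per expert
    top = np.argsort(logits)[-top_k:]           # indices of the best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                    # softmax over the top-k only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
out = moe_layer(token)
print(out.shape)  # (8,)
```

Only `top_k` of the `n_experts` matrices are ever multiplied per token, which is exactly why MoE models can grow total parameter count without a proportional growth in compute per token.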

There is a view that LLMs will plateau as they reach human-level intelligence – that as capabilities become fully saturated, we will coalesce around one general model, just as we coalesced around AWS or the iPhone. Neither of those platforms (nor their competitors) has increased its capabilities tenfold in the past few years, so we may as well settle comfortably into their ecosystems. We believe, however, that artificial intelligence will not stop at human-level intelligence; it will continue far beyond any limits we can even imagine. As it does, it will become increasingly fragmented and specialized, just as any other natural system would.

We cannot overstate what a good thing this fragmentation of AI models is. Fragmented markets are efficient markets: they empower buyers, maximize innovation, and minimize costs. To the extent that we can rely on networks of smaller, more specialized models rather than sending everything through the innards of one giant model, we are moving toward a much safer, more interpretable, and more steerable future for artificial intelligence.

The greatest inventions have no owners. Ben Franklin's heirs do not own electricity. Turing's estate does not own all computers. Artificial intelligence is undoubtedly one of humanity's greatest inventions; we believe its future will be – and should be – multi-model.

DataDecisionMakers

Welcome to the VentureBeat community!

DataDecisionMakers is where experts, including data scientists, can share data-related insights and innovation.

If you want to read about cutting-edge ideas, up-to-date information, best practices, and the future of data and data technology, join us at DataDecisionMakers.

You might even consider contributing an article of your own!
