Flow claims it can boost any processor’s power 100x with its companion chip and some elbow grease

A Finnish startup called Flow Computing is making one of the boldest claims heard in silicon engineering: by adding its companion chip, any processor can instantly double its performance, with gains of up to 100x possible after software tweaks.

If it works, it could help the industry keep up with the insatiable demand for computing power from AI developers.


Flow is a spinout of VTT, a Finnish state-backed research organization that functions a bit like a national laboratory. The chip technology it is commercializing, which it calls the Parallel Processing Unit (PPU), is the result of research performed at that lab (though VTT is an investor, the intellectual property belongs to Flow).

The claim, as Flow is first to admit, sounds ridiculous on its face. You can’t just magically squeeze extra performance out of processors across different architectures and code bases. If you could, Intel, AMD or anyone else would have done it years ago.

But Flow has been working on something that is theoretically possible; it’s just that no one had managed to pull it off.

Central processing units have come a long way since the early days of vacuum tubes and punched cards, but in some fundamental ways they are still the same. Their primary limitation is that, as serial rather than parallel processors, they can only do one thing at a time. Of course, they switch that thing billions of times a second across multiple cores and pipelines, but these are all ways of accommodating the single-lane nature of the CPU. (A GPU, by contrast, performs many related computations at once but is specialized for certain operations.)

“The processor is the weakest link in computers,” said Flow co-founder and CEO Timo Valtonen. “It’s not doing its job and it needs to change.”

Processors have become very fast, but even with nanosecond-scale response times, there is an enormous amount of waste in how instructions are carried out, simply because of the basic limitation that one task must finish before the next one can begin. (I’m simplifying here, not being a chip engineer myself.)
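As a toy illustration of that waste (this is not Flow’s actual mechanism, and the cycle counts are invented for the example), consider a core that must fetch each task’s data before computing on it. Run strictly serially, fetch and compute times simply add up; if the next task’s fetch is overlapped with the current compute, much of the fetch latency disappears:

```python
# Toy model only: invented cycle counts, not a real CPU and not Flow's PPU.
FETCH = 3    # cycles to bring a task's data to the core
COMPUTE = 2  # cycles to execute the task
TASKS = 4

# Strictly serial: each task must fully fetch, then fully compute,
# before the next one can begin.
serial_cycles = TASKS * (FETCH + COMPUTE)

# Overlapped: while task n computes, task n+1's fetch is already underway
# (a simple two-stage pipeline), so only the slower stage gates throughput.
overlapped_cycles = FETCH + (TASKS - 1) * max(FETCH, COMPUTE) + COMPUTE

print(serial_cycles)      # 20
print(overlapped_cycles)  # 14
```

Nothing about the individual fetch or compute got faster; the schedule just stopped forcing them to wait on each other.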

Flow claims to have removed this limitation, turning the CPU from a one-lane street into a multi-lane highway. The processor is still limited to doing one task at a time, but Flow’s PPU, as the company calls it, essentially performs nanosecond-scale traffic management, moving tasks into and out of the processor faster than was previously possible.

Think of the CPU as a chef working in a kitchen. The chef can only work so fast, but what if that person had a superhuman assistant swapping knives and tools in and out of the chef’s hands, clearing away the prepared food and putting in new ingredients, removing all the tasks that aren’t actual chef stuff? The chef still only has two hands, but now that chef can work 10 times as fast.

Chart showing improvements of an FPGA-enhanced PPU compared with unmodified Intel chips. Increasing the number of PPU cores continually improves performance.
Image credits: Flow Computing

It’s not a perfect analogy, but it gives you an idea of what’s happening here, at least according to Flow’s internal tests and demos with industry (and they are talking with everyone). The PPU doesn’t increase the clock frequency or push the system in any other way that would lead to extra heat or power draw; in other words, the chef is not being asked to chop twice as fast. It just uses the CPU cycles that are already taking place more efficiently.

This type of thing isn’t entirely new, says Valtonen. “This has been studied and discussed in high-level academia. You can already do parallelization, but it breaks legacy code, and then it’s useless.”

So it could be done. It just couldn’t be done without rewriting all the code in the world from the ground up, which makes it something of a non-starter. A similar problem was solved by another Nordic compute company, ZeroPoint, which achieved high levels of memory compression while keeping data transparency with the rest of the system.

In other words, Flow’s big achievement isn’t high-speed traffic management, but rather doing it without having to modify any code on any processor or architecture it has tested. It sounds a bit unhinged to say that any code can be executed twice as fast on any chip, with no modification beyond integrating the PPU into the die.

Therein lies the primary challenge to Flow’s success as a company: unlike a software product, Flow’s tech must be included at the chip-design level, meaning it doesn’t work retroactively, and the first PPU-equipped chip would necessarily be quite a ways down the road. Flow has shown that the tech works in FPGA-based test setups, but chipmakers would have to commit considerable resources to see the gains in question.

Flow’s founding team, from left: Jussi Roivainen, Martti Forsell and Timo Valtonen.
Image credits: Flow Computing

The scale of those gains, and the fact that CPU improvements over the past few years have been iterative and fractional, may nevertheless have chipmakers knocking on Flow’s door rather urgently. If you can really double performance in a single generation with one change to the die, that’s a no-brainer.

Further performance gains come from refactoring and recompiling software to work better with the PPU-CPU combination. Flow says it has seen increases of up to 100x with code that has been modified (though not necessarily fully rewritten) to take advantage of its technology. The company is working on offering recompilation tools to make the job easier for software makers who want to optimize for Flow-enabled chips.

Kevin Krewell, an analyst at Tirias Research who was briefed on Flow’s tech and offered an outside perspective on these matters, was more concerned about industry uptake than the fundamentals.

He quite rightly noted that AI acceleration is currently the biggest market, one that can be targeted with special silicon such as Nvidia’s popular H100. While a PPU-accelerated CPU would bring gains across the board, chipmakers may not want to rock the boat too much. The question is simply whether those companies are willing to invest significant resources in a largely unproven technology when they likely have a five-year plan that such a choice would upset.

Will Flow’s tech become a must-have for every chipmaker, catapulting the company to fortune and prominence? Or will penny-pinching chipmakers decide to stay the course and keep collecting rent from the ever-growing compute market? Probably somewhere in between. But it is telling that, even though Flow has achieved a major engineering feat here, like all startups, the company’s future depends on its customers.

Flow is just now emerging from stealth, with €4 million (about $4.3 million) in seed funding led by Butterfly Ventures, with participation from FOV Ventures, Sarsia, Stephen Industries, Superhero Capital and Business Finland.
