
Pruna AI, a European startup working on compression algorithms for AI models, is making its optimization framework open source on Thursday.
Pruna AI has built a framework that applies several efficiency methods, such as caching, pruning, quantization and distillation, to a given AI model.
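To make one of those methods concrete, here is a minimal, generic sketch of post-training quantization using NumPy. This is an illustration of the general technique, not Pruna AI's actual implementation: it maps float32 weights onto 8-bit integers plus a scale factor, shrinking storage fourfold at the cost of a small, bounded rounding error.

```python
import numpy as np

# Toy weight matrix in float32, standing in for one layer of a model.
weights = np.random.randn(256, 256).astype(np.float32)

# Symmetric 8-bit quantization: map floats onto the int8 range [-127, 127].
scale = np.abs(weights).max() / 127.0
q_weights = np.round(weights / scale).astype(np.int8)

# At inference time, dequantize back to approximate float values.
deq_weights = q_weights.astype(np.float32) * scale

# int8 storage is 4x smaller than float32.
print(weights.nbytes // q_weights.nbytes)  # 4

# The rounding error is bounded by half a quantization step.
max_err = np.abs(weights - deq_weights).max()
print(max_err <= scale / 2 + 1e-6)  # True
```

Real frameworks refine this basic idea with per-channel scales, calibration data, and quantization-aware fine-tuning, but the size-versus-accuracy trade-off is the same.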
“We also standardize saving and loading compressed models, applying combinations of these compression methods, and evaluating the compressed model after compression,” Pruna AI co-founder and CTO John Rachwan told TechCrunch.
In particular, Pruna AI can evaluate whether there is significant quality loss after compressing a model, and what performance gains you get in return.
“If I were to use a metaphor, we are similar to how Hugging Face standardized transformers and diffusers: how to call them, how to save them, load them, etc. We are doing the same, but for efficiency methods,” he added.
Large AI labs already use various compression methods. For example, OpenAI has relied on distillation to create faster versions of its flagship models.
This is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4. Similarly, the Flux.1-schnell image generation model is a distilled version of the Flux.1 model from Black Forest Labs.
Distillation is a technique used to extract knowledge from a large AI model with a “teacher-student” setup. Developers send requests to the teacher model and record its outputs. The answers are sometimes compared against a dataset to check their accuracy. These outputs are then used to train the student model, which learns to approximate the teacher's behavior.
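The teacher-student loop described above can be sketched in a few lines of NumPy. This is a deliberately stripped-down toy, not any lab's actual training code: the "teacher" is reduced to fixed logits recorded for a batch of inputs, the "student" to trainable logits for the same batch, and the student is fitted to the teacher's temperature-softened outputs by gradient descent.

```python
import numpy as np

def softmax(z, temperature=1.0):
    z = z / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q):
    """KL divergence between two batches of distributions."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)

# "Teacher": fixed logits standing in for a large model's recorded outputs.
teacher_logits = rng.normal(size=(32, 10)) * 3.0
T = 2.0  # a higher temperature softens the teacher's distribution
teacher_probs = softmax(teacher_logits, T)

# "Student": a smaller model, reduced here to trainable logits for the batch.
student_logits = np.zeros((32, 10))
kl_init = kl(teacher_probs, softmax(student_logits, T))

# Train the student on the teacher's soft targets. For softmax cross-entropy,
# the gradient w.r.t. the logits is (student_probs - teacher_probs).
for _ in range(2000):
    student_probs = softmax(student_logits, T)
    student_logits -= 10.0 * (student_probs - teacher_probs)

kl_final = kl(teacher_probs, softmax(student_logits, T))
print(kl_final < kl_init)  # True: the student now tracks the teacher
```

In practice the student is a genuinely smaller network run on fresh inputs, and the soft-target loss is usually blended with a standard loss on ground-truth labels, but the core mechanism of matching the teacher's output distribution is the same.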
“For big companies, what they usually do is build this stuff in-house. And what you can find in the open source world is usually based on single methods. For example, one quantization method for LLMs, or one caching method for diffusion models,” Rachwan said. “But you cannot find a tool that aggregates them all, makes them easy to use and combine together. And this is the big value that Pruna is bringing right now.”
While Pruna AI supports any kind of model, from large language models to diffusion models, speech-to-text models and computer vision models, the company is focusing more specifically on image and video generation models.
Some of Pruna AI's existing users include Scenario and PhotoRoom. In addition to the open source edition, Pruna AI offers an enterprise version with advanced optimization features, including an optimization agent.
“The most exciting feature that we are releasing soon will be a compression agent,” Rachwan said. “You basically give it your model and say: ‘I want more speed, but don't drop my accuracy by more than 2%.’ Then the agent will just work its magic for you.”
Pruna AI charges by the hour for its pro version. “It's similar to how you would think of a GPU when you rent a GPU on AWS or any cloud service,” Rachwan said.
And if your model is a critical part of your AI infrastructure, you will end up saving a lot of money on inference with an optimized model. For example, Pruna AI has made a Llama model eight times smaller without too much quality loss using its compression framework. Pruna AI hopes its customers will think of compression as an investment that pays for itself.
Pruna AI raised $6.5 million in seed funding a few months ago. Investors in the startup include EQT Ventures, Daphni, Motier Ventures and Kima Ventures.