Sakana AI’s CycleQD outperforms traditional fine-tuning methods for multi-skill language models

Researchers at Sakana AI have developed a resource-efficient framework that can create many language models, each specializing in a different task. Called CycleQD, the method uses evolutionary algorithms to combine the skills of different models without expensive and slow training processes.

CycleQD can create swarms of task-specific agents, offering a more sustainable alternative to the current paradigm of ever-larger models.


A new approach to model training

Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, training an LLM to master many skills remains a challenge. When fine-tuning models, engineers must balance data across skills and ensure that one skill does not dominate the others. Current approaches often involve training ever-larger models, which drives up compute and resource requirements.

“We believe that rather than aiming to develop one large model that does all tasks well, a population-based approach that allows the evolution of a diverse swarm of niche models may offer an alternative, more sustainable path to scale up the development of AI agents with advanced capabilities,” write the researchers at Sakana in a blog post.

To create model populations, the researchers drew on quality diversity (QD), an evolutionary computing paradigm focused on discovering a diverse set of solutions from an initial population sample. The goal of QD is to create specimens with different “behavior characteristics” (BCs) that represent different skill domains. It achieves this through evolutionary algorithms (EAs) that select parent examples and apply crossover and mutation operations to create new samples.

Quality diversity (Source: Sakana AI)
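To make the QD loop concrete, here is a minimal, illustrative MAP-Elites-style sketch in Python. The evaluate, crossover, and mutate functions are toy stand-ins rather than Sakana's actual operators; the sketch only shows how an archive keeps one high-quality elite per behavior niche.

```python
# Minimal quality-diversity (MAP-Elites-style) loop, for illustration only.
# evaluate(), crossover(), and mutate() are hypothetical stand-ins for the
# task-specific operators described in the article.
import random

def evaluate(candidate):
    # Hypothetical: return (quality, behavior characteristics) for a candidate.
    quality = sum(candidate)
    bcs = (round(candidate[0], 1), round(candidate[1], 1))
    return quality, bcs

def crossover(a, b):
    # Mix two parents element-wise.
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(x, scale=0.1):
    # Small random perturbation to explore new behaviors.
    return [v + random.gauss(0, scale) for v in x]

archive = {}  # behavior cell -> (quality, candidate): one elite per niche

# Seed the archive with random candidates.
for _ in range(20):
    c = [random.random() for _ in range(4)]
    q, bcs = evaluate(c)
    if bcs not in archive or q > archive[bcs][0]:
        archive[bcs] = (q, c)

# Evolutionary loop: sample parents from the archive, create offspring,
# and keep them only if they improve their behavior niche.
for _ in range(200):
    (qa, pa), (qb, pb) = random.sample(list(archive.values()), 2)
    child = mutate(crossover(pa, pb))
    q, bcs = evaluate(child)
    if bcs not in archive or q > archive[bcs][0]:
        archive[bcs] = (q, child)

print(f"{len(archive)} behavior niches filled; best quality per niche kept.")
```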

CycleQD

CycleQD incorporates QD into the post-training pipeline of LLMs to help them acquire new, complex skills. CycleQD is useful when you have many small models fine-tuned for very specific skills, such as coding or performing database and operating system operations, and you want to create new variants that combine those skills in different ways.

In CycleQD, each of these skills is treated either as a behavior characteristic or as the quality metric for which the next generation of models is optimized. In each generation, the algorithm focuses on one specific skill as the quality metric while using the other skills as BCs.

“This will ensure that each skill gets its moment in the spotlight, making LLMs more balanced and overall capable,” the researchers explain.
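The cycling itself can be sketched as a simple rotation of which skill serves as the quality metric. The skill names below mirror those in the article, while qd_generation() and the seed population are hypothetical placeholders, not Sakana's API, standing in for one full QD generation over real models.

```python
# A minimal sketch of how CycleQD rotates the quality metric each generation.
skills = ["coding", "database_ops", "os_ops"]

def qd_generation(population, quality_skill, bc_skills):
    # Placeholder: a real implementation would evaluate models on quality_skill,
    # bin them by their bc_skills scores, and keep the best model per bin.
    print(f"optimize {quality_skill:>12} | diversity over {bc_skills}")
    return population

population = ["coding_expert", "db_expert", "os_expert"]  # illustrative seeds

for generation in range(6):
    # Each skill takes a turn as the quality metric; the others act as BCs.
    quality_skill = skills[generation % len(skills)]
    bc_skills = [s for s in skills if s != quality_skill]
    population = qd_generation(population, quality_skill, bc_skills)
```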


CycleQD starts with a set of expert LLMs, each specializing in one skill. The algorithm then applies crossover and mutation operations to add new, higher-quality models to the population. Crossover combines the features of two parent models to create a new model, while mutation makes random changes to a model to explore new possibilities.

The crossover operation is based on model merging, a technique that mixes the parameters of two LLMs to create a new model with combined skills. This is a fast, cost-effective way to create capable models without additional fine-tuning.
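As a rough illustration, parameter-space crossover can be as simple as a weighted blend of two models' weights. The exact mixing scheme CycleQD uses may differ; the tensors below are toy stand-ins for LLM parameters.

```python
# Illustrative parameter-space crossover: a weighted merge of two models'
# weights. A sketch of the general idea of model merging, not Sakana's
# exact crossover operator.
import torch

def merge_models(state_dict_a, state_dict_b, alpha=0.5):
    """Blend two models with the same architecture, parameter by parameter."""
    merged = {}
    for name, param_a in state_dict_a.items():
        param_b = state_dict_b[name]
        merged[name] = alpha * param_a + (1.0 - alpha) * param_b
    return merged

# Toy example: random tensors standing in for the weights of two expert LLMs.
a = {"layer.weight": torch.randn(4, 4), "layer.bias": torch.randn(4)}
b = {"layer.weight": torch.randn(4, 4), "layer.bias": torch.randn(4)}
child = merge_models(a, b, alpha=0.7)
print(child["layer.weight"].shape)  # same shape, blended parameters
```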

The mutation operation uses singular value decomposition (SVD), a factorization method that breaks any matrix down into simpler components, making them easier to understand and manipulate. CycleQD uses SVD to break a model's skills into core components, or sub-skills. By adjusting these sub-skills, the mutation process creates models that explore new possibilities beyond their parent models. This helps models avoid getting stuck in predictable patterns and reduces the risk of overfitting.
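A hedged sketch of what an SVD-based mutation of a single weight matrix might look like: decompose, perturb the singular values (standing in for the "sub-skill" components), and reconstruct. This illustrates the general idea rather than Sakana's exact mutation operator.

```python
# Illustrative SVD-based mutation of one weight matrix.
import numpy as np

def svd_mutate(weight, noise_scale=0.05, rng=None):
    rng = rng or np.random.default_rng()
    # Decompose the weight matrix into simpler components.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Randomly rescale each singular value to nudge individual components.
    s_mutated = s * (1.0 + noise_scale * rng.standard_normal(s.shape))
    # Reconstruct a mutated weight matrix from the perturbed components.
    return (u * s_mutated) @ vt

weight = np.random.randn(8, 16)  # toy stand-in for an LLM weight matrix
mutated = svd_mutate(weight)
print(np.abs(mutated - weight).mean())  # small, structured change
```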

CycleQD performance evaluation

The researchers applied CycleQD to a set of Llama 3-8B expert models fine-tuned for coding, database operations, and operating system operations. The goal was to see whether the evolutionary method could combine the skills of the three models to create a better one.

The results showed that CycleQD outperforms traditional fine-tuning and merging methods on the evaluated tasks. It is worth noting that a model fine-tuned on all the datasets combined performed only slightly better than the single-skill expert models, despite being trained on much more data. Moreover, the traditional training process is far slower and more expensive. CycleQD was also able to create different models with different performance levels on the target tasks.

“These results clearly demonstrate that CycleQD outperforms traditional methods, proving its effectiveness in training LLM agents to excel in multiple skills,” the researchers write.

CycleQD vs other methods

The researchers believe that CycleQD could enable lifelong learning in AI systems, allowing them to continuously develop, adapt, and accumulate knowledge over time. This has direct implications for real-world applications: for example, CycleQD could be used to continually combine the skills of expert models rather than training a large model from scratch.

Another exciting direction is the development of multi-agent systems, in which swarms of specialized agents evolved with CycleQD can cooperate, compete, and learn from one another.

“From scientific discoveries to solving real-world problems, swarms of specialized agents could redefine the boundaries of artificial intelligence,” the researchers write.
