One effective way to make a large language model (LLM) fit for purpose and grounded in an organization's data is fine-tuning. But companies often report that refining a model comes at a cost: once tuned, some models "forget" how to perform certain tasks, or other tasks they had already learned.
Research from the University of Illinois Urbana-Champaign proposes a new method for retraining models that avoids "catastrophic forgetting," in which a model loses some of its prior knowledge. The work focuses on two vision-language models that generate responses from images: LLaVA and Qwen 2.5-VL.
The approach encourages enterprises to train only narrow parts of the model, avoiding the significant computational cost of retraining the entire model. The team argues that catastrophic forgetting is not true memory loss, but rather a side effect of bias drift.
“Training a new LMM can cost millions of dollars and weeks of time and emit hundreds of tons of CO2, so it is an urgent problem to find ways to update existing models more efficiently and effectively,” the team wrote in their paper. “Guided by this result, we are investigating tuning formulations that preserve learning while limiting shifts in the output distribution.”
The researchers focused on the multi-layer perceptron (MLP), the internal decision-making component of the model.
Catastrophic forgetting
The researchers first wanted to confirm the existence, and identify the cause, of catastrophic forgetting in the models.
To do this, they created a set of target tasks for the models to perform, then fine-tuned and evaluated the models to determine whether significant forgetting occurred. As the process continued, however, the researchers found that the models regained some of their abilities.
“We also noticed a surprising result: model performance decreased significantly on held-out benchmarks after training on the counting task, and mostly returned to normal after training on PathVQA, another specialized task that is not well represented in the benchmarks,” they said. “Meanwhile, when performing forgetting mitigation experiments, we also tried to separately tune only the self-attention projection (SA Proj) or MLP layers, motivated by the finding that tuning only the LLM was generally better than tuning the full model. This led to another very surprising result – that tuning only the self-attention projection layers led to very good learning of the target tasks without any drop in held-out task performance, even after training on all five target tasks sequentially.”
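In practice, "tuning only the self-attention projection layers" means freezing every parameter except the attention projections. The sketch below illustrates one way to select those parameters by name; the parameter names follow common Hugging Face conventions for LLaVA/Qwen-style decoder layers and are illustrative, not taken from the paper's code.

```python
# Sketch: selecting only self-attention projection (SA Proj) parameters
# to leave trainable, freezing everything else. Names below follow common
# Hugging Face decoder-layer conventions and are hypothetical examples.

SA_PROJ_KEYS = ("q_proj", "k_proj", "v_proj", "o_proj")

def trainable_sa_proj(param_names):
    """Return the parameter names that stay trainable when tuning only
    the self-attention projections; all other parameters are frozen."""
    return [n for n in param_names
            if any(key in n for key in SA_PROJ_KEYS)]

param_names = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight",
    "model.layers.0.self_attn.v_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.layers.0.mlp.up_proj.weight",
    "model.layers.0.mlp.down_proj.weight",
]

trainable = trainable_sa_proj(param_names)
print(trainable)  # only the four self_attn projection weights
```

With a real framework, the same filter would typically set `requires_grad = False` on every parameter whose name does not match, so the optimizer only updates the attention projections.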
The researchers said they believe that "what appears to be forgetting or interference after fine-tuning on a narrow target task is actually a bias in the output distribution resulting from a shift in the task distribution."
Narrow retraining
This discovery turned out to be the key to the experiment. The researchers noted that tuning the MLP increases the likelihood of "generating numerical tokens and a highly correlated decline in task accuracy." This showed that a model forgetting some of its knowledge is a temporary issue, not a long-term one.
“To avoid biasing the output signal distribution, we tune the MLP gating/up projections while keeping the down projection frozen, and find that this allows for learning similar to full MLP tuning, with little forgetting,” the researchers said.
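The recipe described above splits the MLP into parts that are tuned and a part that stays frozen. A minimal sketch of that split, again using hypothetical Hugging Face-style parameter names (the paper's actual code may differ):

```python
# Sketch: tune the MLP gate/up projections while keeping the down
# projection frozen, per the recipe described in the quote above.
# Parameter names are hypothetical Hugging Face-style examples.

def mlp_freeze_plan(param_names):
    """Map each MLP parameter name to whether it stays trainable:
    gate/up projections are tuned, the down projection is frozen."""
    plan = {}
    for name in param_names:
        if "down_proj" in name:
            plan[name] = False   # frozen: limits output-distribution drift
        elif "gate_proj" in name or "up_proj" in name:
            plan[name] = True    # tuned: carries the new task learning
    return plan

mlp_params = [
    "model.layers.0.mlp.gate_proj.weight",
    "model.layers.0.mlp.up_proj.weight",
    "model.layers.0.mlp.down_proj.weight",
]

plan = mlp_freeze_plan(mlp_params)
for name, trainable in plan.items():
    print(name, "-> trainable" if trainable else "-> frozen")
```

The design intuition from the paper is that the down projection writes directly into the residual stream, so freezing it keeps the model's output token distribution stable while the gate/up projections still absorb the new task.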
This allows for a simpler and more repeatable method of tuning the model.
By tuning a narrow segment of the model rather than retraining it wholesale, enterprises can reduce computational costs. It also gives them greater control over output drift.
However, the research covered only two models, both of them vision-language models. The researchers noted that, due to limited resources, they were unable to run the experiment on other models.
Even so, they suggest their findings may extend to other LLMs, including those working across different modalities.
