Researchers at Google have developed a new artificial intelligence paradigm that aims to solve one of the biggest limitations of today’s large language models: the inability to learn or update knowledge after training. The paradigm, called Nested Learning, treats the model and its training not as a single process, but as a system of nested, multi-level optimization problems. The researchers argue that this approach can unlock more expressive learning algorithms, leading to better in-context learning and better memory.
To prove their concept, the researchers used Nested Learning to develop a new model called Hope. Preliminary experiments show strong performance on language modeling, continual learning, and long-context reasoning tasks, potentially paving the way for efficient artificial intelligence systems that can adapt to real-world environments.
The memory problem of large language models
Deep learning algorithms helped eliminate the need for the careful feature design and specialized knowledge required in traditional machine learning. By feeding models huge amounts of data, they can learn the necessary representations on their own. However, this approach presented its own set of challenges that could not be solved by simply stacking more layers or creating larger networks, such as generalizing to new data, continually learning new tasks, and avoiding suboptimal solutions during training.
Efforts to overcome these challenges led to innovations such as the Transformer, now the basis of modern large language models (LLMs). These models initiated a “paradigm shift from task-based models to more general-purpose systems with a variety of emergent capabilities as a result of scaling ‘right’ architectures,” the researchers write. Still, a fundamental limitation remains: once trained, LLMs are largely static and cannot update their core knowledge or acquire new skills through new interactions.
The only flexible element of an LLM is in-context learning, the ability to perform tasks based on information provided in its immediate prompt. This makes current LLMs analogous to a person who cannot form new long-term memories. Their knowledge is limited to what they learned during initial training (the distant past) and what is in their current context window (the immediate present). Once the conversation moves beyond the context window, that information is lost forever.
The problem is that today’s Transformer-based LLMs have no “online” consolidation mechanism. The information in the context window never updates the model’s long-term parameters, the weights stored in its feed-forward layers. As a result, the model cannot durably acquire new knowledge or skills through interaction; everything it learns disappears as soon as the context window moves on.
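This limitation can be made concrete with a toy sketch (not Google’s code, and a deliberate simplification of a real LLM): at inference time the model’s parameters are frozen, so the prompt shapes one answer but leaves the weights untouched.

```python
# Toy illustration of the "no online consolidation" problem: the context
# influences a single generation, but the trained weights never change.
class FrozenLLM:
    def __init__(self):
        # Fixed after training; stands in for the feed-forward weights.
        self.weights = {"fact": "trained knowledge"}

    def generate(self, context):
        # The context is used for this one answer only.
        return f"answer using {self.weights['fact']} and context '{context}'"

model = FrozenLLM()
snapshot = dict(model.weights)
model.generate("a new fact the user just explained")
assert model.weights == snapshot  # nothing was consolidated into the weights
```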
A nested approach to learning
Nested Learning (NL) is designed to enable computational models to learn from data at different levels of abstraction and time scales, much like the brain. It treats a single machine learning model not as one continuous process, but as a system of interconnected learning problems that are optimized concurrently at different speeds. This is a departure from the classical view that treats the model architecture and its optimization algorithm as two separate components.
In this paradigm, the training process is viewed as developing “associative memory,” the ability to connect and recall related information. The model learns to map a data point to its local error, which measures how “surprising” that data point was. Even key architectural elements such as the attention mechanism in Transformers can be viewed as simple associative memory modules that learn mappings between tokens. By defining an update frequency for each component, these nested optimization problems can be organized into different “levels,” forming the core of the NL paradigm.
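A minimal sketch can show the core idea of levels defined by update frequency. The `Level` class below is hypothetical (not from the paper): each level holds one weight, measures the local “surprise” (prediction error) of a data point, and updates only every `period` steps, so fast levels track recent data while slow levels change little.

```python
import random

class Level:
    """One optimization level: updates only every `period` steps."""
    def __init__(self, period, lr):
        self.period, self.lr, self.w = period, lr, 0.0

    def surprise(self, x, target):
        # Local error: how "surprising" this data point is to the level.
        return target - self.w * x

    def maybe_update(self, step, x, target):
        if step % self.period == 0:  # slower levels update less often
            self.w += self.lr * self.surprise(x, target) * x

# Three nested levels with fast, medium, and slow update frequencies.
levels = [Level(period=1, lr=0.1),
          Level(period=8, lr=0.05),
          Level(period=64, lr=0.01)]

random.seed(0)
for step in range(1, 257):
    x = random.uniform(-1.0, 1.0)
    target = 0.5 * x  # the underlying relation all levels try to capture
    for level in levels:
        level.maybe_update(step, x, target)

# The fast level has converged near 0.5; the slow level has barely moved.
```
After 256 steps the fastest level closely approximates the true coefficient while the slowest level retains mostly its initial state, which is the intended division of labor: rapid adaptation at the fine time scale, slow consolidation at the coarse one.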
Hope for continual learning
The researchers put these principles into practice in Hope, an architecture designed to embody Nested Learning. Hope is a modified version of Titans, another architecture introduced by Google in January to address the memory limitations of the Transformer. Although Titans had a powerful memory system, its parameters updated at only two speeds: a long-term memory module and a short-term memory mechanism.
Hope is a self-modifying architecture enhanced with a “Continuum Memory System” (CMS) that allows unlimited levels of in-context learning and scaling to larger context windows. The CMS works like a series of memory banks, each updating at a different frequency. Faster-updating banks process immediate information, while slower ones consolidate more abstract knowledge over longer periods. This allows the model to optimize its own memory in a self-referential loop, creating an architecture with theoretically infinite levels of learning.
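The memory-bank idea can be sketched as follows. This is an illustrative toy, not Hope’s actual CMS: the class names and the consolidation rule (a slower bank periodically storing a chunk of the faster bank’s recent contents) are assumptions chosen to show the multi-frequency structure.

```python
class MemoryBank:
    def __init__(self, period):
        self.period = period  # this bank updates every `period` steps
        self.state = []

    def write(self, item):
        self.state.append(item)

class ContinuumMemory:
    """Toy CMS-like chain: fast banks feed summaries to slower banks."""
    def __init__(self, periods=(1, 4, 16)):
        self.banks = [MemoryBank(p) for p in periods]

    def step(self, t, token):
        self.banks[0].write(token)  # the fastest bank sees every token
        # Each slower bank periodically consolidates the faster bank's
        # recent contents into a single, more abstract chunk.
        for fast, slow in zip(self.banks, self.banks[1:]):
            if t % slow.period == 0 and fast.state:
                slow.write(tuple(fast.state[-slow.period:]))

cms = ContinuumMemory()
for t, token in enumerate(range(32), start=1):
    cms.step(t, token)

# After 32 tokens: bank 0 holds 32 raw items, bank 1 holds 8 chunks,
# bank 2 holds 2 chunks -- the same information at coarser time scales.
```
The design choice to pass summaries up the chain, rather than giving every bank the raw stream, is what lets slow banks represent longer spans of context at lower cost.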
Across a diverse set of language modeling and common-sense reasoning tasks, Hope demonstrated lower perplexity (a measure of how well the model predicts the next word in a sequence and maintains consistency in the generated text) and greater accuracy compared with both standard Transformers and other modern recurrent models. Hope also performed better on long-context “needle in a haystack” tasks, in which the model must find and use specific information hidden in a large volume of text. This suggests that the CMS offers a more efficient way to handle long sequences of information.
This is one of several attempts to create AI systems that process information at multiple levels. The Hierarchical Reasoning Model (HRM) from Sapient Intelligence uses a hierarchical architecture to make models more effective at learning reasoning tasks. The Tiny Recursive Model (TRM) from Samsung builds on HRM with architectural changes that improve its performance while increasing its efficiency.
While promising, Nested Learning faces some of the same challenges as other new paradigms in realizing its full potential. Current AI hardware and software stacks are largely optimized for classic deep learning architectures, and Transformer models in particular, so adopting Nested Learning at scale may require fundamental changes. However, if it gains momentum, it could lead to much more efficient LLMs that can continually learn, a critical capability for real-world enterprise applications where environments, data, and user needs are constantly changing.
