The introduction of ChatGPT led to the widespread use of large language models (LLMs) in both technology and non-technology industries. This popularity is mainly due to two factors:
- LLMs as a knowledge store: LLMs are trained on vast amounts of web data and are updated at regular intervals (e.g., GPT-3, GPT-3.5, GPT-4, GPT-4o and others).
- Emergent capabilities: as LLMs grow, they exhibit abilities not found in smaller models.
Does this mean that we have already achieved human-level intelligence, which we call artificial general intelligence (AGI)? Gartner defines AGI as a form of AI that possesses the ability to understand, learn and apply knowledge across a wide range of tasks and domains. The road to AGI is long, and one key obstacle is the autoregressive nature of LLM training, which predicts words based on past sequences. As one of the pioneers of AI research, Yann LeCun points out that LLMs can drift away from accurate answers due to their autoregressive nature. Consequently, LLMs have several limitations:
- Limited knowledge: While trained on vast data, LLMs lack up-to-date world knowledge.
- Limited reasoning: LLMs have limited reasoning capability. As Subbarao Kambhampati points out, LLMs are good knowledge retrievers but not good reasoners.
- No dynamicity: LLMs are static and unable to access real-time information.
To overcome these limitations, a more advanced approach is required. This is where agents become crucial.
Agents to the rescue
The concept of an intelligent agent in AI has evolved over two decades, and its implementations have changed over time. Today, agents are discussed in the context of LLMs. Simply put, an agent is like a Swiss Army knife for LLM challenges: it can help us reason, provide a means to get up-to-date information from the Internet (solving the dynamicity problem of LLMs) and accomplish a task autonomously. With an LLM as its backbone, an agent formally comprises tools, memory, reasoning (or planning) and action components.
Components of AI agents
- Tools enable agents to access external information, whether from the Internet, databases or APIs, allowing them to gather the necessary data.
- Memory can be short-term or long-term. Agents use scratchpad memory to temporarily hold results from various sources, while chat history is an example of long-term memory.
- Reasoning allows agents to think methodically, breaking complex tasks into manageable subtasks for effective processing.
- Actions: Agents perform actions based on their environment and reasoning, adapting and solving tasks iteratively through feedback. ReAct is a common method for iteratively performing reasoning and action.
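The reason-act-observe cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference ReAct implementation or any framework's API: `scripted_llm` is a hypothetical stand-in that returns canned replies instead of calling a real model, and `lookup_capital` is a toy tool. It shows how reasoning, tool use and feedback alternate, with the transcript serving as short-term scratchpad memory.

```python
from typing import Callable, Dict, List

# Hypothetical tool registry: stand-ins for web search, database or API tools.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_capital": lambda country: {"France": "Paris", "Japan": "Tokyo"}.get(country, "unknown"),
}


def scripted_llm(transcript: List[str]) -> str:
    """Stand-in for a real LLM call (an assumption for this sketch).

    It requests a tool first, then answers once an observation is available.
    """
    observations = [line for line in transcript if line.startswith("Observation: ")]
    if not observations:
        return "Thought: I should look this up.\nAction: lookup_capital[France]"
    return "Final Answer: " + observations[-1][len("Observation: "):]


def react(question: str, llm: Callable[[List[str]], str], max_steps: int = 5) -> str:
    """Alternate reasoning (model output) and acting (tool calls) until an answer appears."""
    transcript = [f"Question: {question}"]  # scratchpad (short-term memory)
    for _ in range(max_steps):
        step = llm(transcript)
        transcript.append(step)
        if step.startswith("Final Answer: "):
            return step[len("Final Answer: "):]
        # Parse a line of the form "Action: tool_name[argument]".
        action = next(line for line in step.splitlines() if line.startswith("Action: "))
        name, arg = action[len("Action: "):].rstrip("]").split("[", 1)
        # Feed the tool result back to the model as an observation.
        transcript.append(f"Observation: {TOOLS[name](arg)}")
    return "No answer within the step budget"


print(react("What is the capital of France?", scripted_llm))  # Paris
```

The `max_steps` budget matters in practice: because each iteration is another model call, an agent that never converges would otherwise loop indefinitely.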
What are agents good at?
Agents excel at complex tasks, especially in a role-playing setup, leveraging the enhanced performance of LLMs. For instance, when writing a blog, one agent may focus on research while another handles writing, each tackling a specific sub-goal. This multi-agent approach applies to numerous real-life problems.
Role-playing helps agents stay focused on specific tasks to achieve larger goals, reducing hallucinations by clearly defining parts of a prompt, such as role, instruction and context. Since LLM performance depends on well-structured prompts, various frameworks formalize this process. One such framework, CrewAI, provides a structured approach to defining role-playing, as we'll discuss next.
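To make the idea of structured prompt parts concrete, here is a minimal sketch of a role-playing prompt split into role, instruction and context. `RolePrompt` and its `render` method are hypothetical names for illustration only; this is not CrewAI's API (CrewAI's own `Agent` class is configured with fields such as `role`, `goal` and `backstory`). The two instances mirror the blog-writing example, with a research agent and a writing agent each given a narrow sub-goal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolePrompt:
    """Hypothetical container for the structured parts of a role-playing prompt."""
    role: str
    instruction: str
    context: str

    def render(self) -> str:
        # Each part gets its own labeled line so every agent's prompt has the same layout.
        return (
            f"You are {self.role}.\n"
            f"Instruction: {self.instruction}\n"
            f"Context: {self.context}"
        )


researcher = RolePrompt(
    role="a research analyst",
    instruction="Collect three key facts about multi-agent systems.",
    context="The facts will be used by a separate writer agent for a blog post.",
)
writer = RolePrompt(
    role="a technical blog writer",
    instruction="Turn the research notes into a 300-word blog section.",
    context="Research notes are supplied by the research agent.",
)

print(researcher.render())
```

Keeping the parts as separate fields, rather than one free-form string, makes it easy to vary a single part (say, the context) while holding the role fixed.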
Multiple agents vs. single agent
Let's take the example of retrieval-augmented generation (RAG) using a single agent. It's an effective way to empower LLMs to handle domain-specific queries by leveraging information from indexed documents. However, single-agent RAG comes with its own limitations, such as retrieval performance or document ranking. Multi-agent RAG overcomes these limitations by employing specialized agents for document understanding, retrieval and ranking.
In a multi-agent scenario, agents collaborate in different ways, similar to distributed computing patterns: sequential, centralized, decentralized or shared message pools. Frameworks like CrewAI, AutoGen and LangGraph+LangChain enable complex problem-solving with multi-agent approaches. In this article, I have used CrewAI as the reference framework to explore autonomous workflow management.
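A toy version of the multi-agent RAG split might look like the following. Each "agent" here is a plain function using keyword overlap, an assumption made to keep the sketch self-contained; in a real system each would wrap an LLM call or a vector search. The point is the division of labor: one agent interprets the query, one retrieves broadly (recall), and one ranks narrowly (precision) before the top documents would be handed to the generating LLM. The documents and stopword list are invented for the example.

```python
import re
from typing import List, Set

# Toy document index (invented content for the sketch).
DOCS = {
    "loan-policy": "Loan applicants must provide proof of income and a valid ID.",
    "kyc-guide": "Identity checks use a driving license or passport as proof of ID.",
    "holiday-memo": "The office is closed on public holidays.",
}
STOPWORDS = {"what", "is", "a", "the", "for", "of", "do", "i", "need", "to"}


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def understanding_agent(query: str) -> Set[str]:
    """Reduce the user query to its content keywords."""
    return tokenize(query) - STOPWORDS


def retrieval_agent(keywords: Set[str]) -> List[str]:
    """Return ids of all documents sharing at least one keyword (recall-oriented)."""
    return [doc_id for doc_id, text in DOCS.items() if keywords & tokenize(text)]


def ranking_agent(keywords: Set[str], candidates: List[str], top_k: int = 2) -> List[str]:
    """Order candidates by keyword overlap and keep the best few (precision-oriented)."""
    ranked = sorted(candidates, key=lambda d: len(keywords & tokenize(DOCS[d])), reverse=True)
    return ranked[:top_k]


def multi_agent_rag(query: str) -> List[str]:
    keywords = understanding_agent(query)
    candidates = retrieval_agent(keywords)
    return ranking_agent(keywords, candidates)


print(multi_agent_rag("What proof of ID do I need for a loan?"))  # ['loan-policy', 'kyc-guide']
```

Because each stage is isolated, a weak retriever or ranker can be swapped out without touching the others, which is the practical argument for splitting a single RAG agent into several.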
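Of the collaboration patterns listed above, the shared message pool is perhaps the least familiar. The sketch below is a hypothetical, framework-free illustration (the `MessagePool` class, the topic names and the agent functions are invented here, not taken from CrewAI, AutoGen or LangGraph): agents publish messages to a common pool, every message is kept in a shared history, and agents subscribed to a topic react to it. Unlike a sequential chain, no agent calls another directly.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, List


@dataclass
class Message:
    topic: str
    sender: str
    content: str


@dataclass
class MessagePool:
    """Shared pool: all messages are recorded; subscribers to a topic are notified."""
    history: List[Message] = field(default_factory=list)
    subscribers: DefaultDict[str, List[Callable[[Message], None]]] = field(
        default_factory=lambda: defaultdict(list))

    def subscribe(self, topic: str, handler: Callable[[Message], None]) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, message: Message) -> None:
        self.history.append(message)
        for handler in list(self.subscribers[message.topic]):
            handler(message)


pool = MessagePool()
final_posts: List[str] = []


def writer_agent(msg: Message) -> None:
    # Reacts to research notes by publishing a draft (stand-in for an LLM call).
    pool.publish(Message("draft", "writer", f"Draft based on: {msg.content}"))


def editor_agent(msg: Message) -> None:
    # Reacts to drafts by producing the final post.
    final_posts.append(f"Edited: {msg.content}")


pool.subscribe("research", writer_agent)
pool.subscribe("draft", editor_agent)
pool.publish(Message("research", "researcher", "agents combine tools and memory"))
print(final_posts[0])  # Edited: Draft based on: agents combine tools and memory
```

The shared `history` is what distinguishes this pattern: any agent can inspect everything published so far, at the cost of a context that grows with the number of agents.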
Workflow management: a use case for multi-agent systems
Most industrial processes are about managing workflows, be it loan processing, marketing campaign management or even DevOps. Steps, either sequential or cyclic, are required to achieve a particular goal. In a traditional approach, each step (say, loan application verification) requires a human to perform the tedious and mundane task of manually processing each application and verifying it before moving to the next step.
Each step requires input from a subject matter expert. In a multi-agent setup using CrewAI, each step is handled by a crew consisting of multiple agents. For instance, in loan application verification, one agent may verify the user's identity through background checks on documents like a driver's license, while another agent verifies the user's financial details.
This raises the question: Can a single crew (with multiple agents in sequence or hierarchy) handle all loan processing steps? While possible, it complicates the crew, requiring extensive temporary memory and increasing the risk of goal deviation and hallucination. A more effective approach is to treat each loan processing step as a separate crew and view the entire workflow as a graph of crew nodes (using tools like LangGraph) operating sequentially or cyclically.
Since LLMs are still in their early stages of intelligence, full workflow management cannot be entirely autonomous. A human in the loop is needed at key stages for end-user verification. For instance, after the crew completes the loan application verification step, human oversight is necessary to validate the results. Over time, as trust in AI grows, some steps may become fully autonomous. Currently, AI-based workflow management plays an assistive role, streamlining tedious tasks and reducing overall processing time.
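The idea of treating each stage as a separate crew and wiring the stages into a graph can be sketched without any framework. This is a hypothetical illustration, not LangGraph's API: each "crew" is a plain function that takes and returns a shared state dictionary, and a `route` function picks the next node. The edge from `request_docs` back to `verify` makes the workflow cyclic. The crew names, state keys and the affordability rule are all invented for the example.

```python
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]
Crew = Callable[[State], State]


def verification_crew(state: State) -> State:
    # Stand-in for a crew whose agents check identity and financial documents.
    verified = {"id", "income_proof"} <= state["documents"]
    return {**state, "verified": verified, "log": state["log"] + ["verify"]}


def document_request_crew(state: State) -> State:
    # Stand-in for a crew that asks the applicant for the missing documents.
    return {**state, "documents": state["documents"] | {"id", "income_proof"},
            "log": state["log"] + ["request_docs"]}


def risk_crew(state: State) -> State:
    # Hypothetical affordability rule: income must be at least 3x the payment.
    risk = "low" if state["monthly_income"] >= 3 * state["monthly_payment"] else "high"
    return {**state, "risk": risk, "log": state["log"] + ["assess_risk"]}


NODES: Dict[str, Crew] = {
    "verify": verification_crew,
    "request_docs": document_request_crew,
    "assess_risk": risk_crew,
}


def route(node: str, state: State) -> Optional[str]:
    """Edges of the graph; returning None ends the workflow."""
    if node == "verify":
        return "assess_risk" if state["verified"] else "request_docs"
    if node == "request_docs":
        return "verify"  # cycle back for re-verification
    return None


def run_workflow(state: State, start: str = "verify", max_steps: int = 10) -> State:
    node: Optional[str] = start
    for _ in range(max_steps):
        state = NODES[node](state)
        node = route(node, state)
        if node is None:
            return state
    raise RuntimeError("Workflow did not finish within max_steps")


result = run_workflow({"documents": {"id"}, "monthly_income": 6000,
                       "monthly_payment": 1500, "log": []})
print(result["log"], result["risk"])
# ['verify', 'request_docs', 'verify', 'assess_risk'] low
```

Each crew only sees the state it needs and returns an updated copy, so no single crew has to carry the memory of the entire loan process.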
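A human-in-the-loop checkpoint can be expressed as a thin wrapper around a workflow step. In the minimal sketch below, `verification_crew`, `run_step_with_review` and `simulated_human` are hypothetical names; the reviewer is a plain callable standing in for whatever review interface a real deployment would use. The `autonomous` flag models the gradual shift described above, where a step that has earned trust skips human review.

```python
from typing import Any, Callable, Dict

Result = Dict[str, Any]
Reviewer = Callable[[str, Result], bool]


def verification_crew(application: Result) -> Result:
    # Hypothetical stand-in for the loan-verification crew's output.
    return {
        "identity_ok": application.get("license_number") is not None,
        "financials_ok": application.get("monthly_income", 0) > 0,
    }


def run_step_with_review(step: str, crew: Callable[[Result], Result],
                         application: Result, reviewer: Reviewer,
                         autonomous: bool = False) -> Result:
    """Run one workflow step, then pause for a human decision unless the
    step has been marked autonomous."""
    output = crew(application)
    if autonomous:
        status = "auto-approved"
    elif reviewer(step, output):
        status = "approved"
    else:
        status = "returned for rework"
    return {"step": step, "output": output, "status": status}


def simulated_human(step: str, output: Result) -> bool:
    # Stands in for a real review UI; approves only when every check passed.
    return all(output.values())


app = {"license_number": "D1234567", "monthly_income": 5000}
print(run_step_with_review("verify", verification_crew, app, simulated_human)["status"])  # approved
```

Keeping the review decision outside the crew means the same crew can later run autonomously by flipping a flag, without changing its logic.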
Production challenges
Bringing multi-agent solutions into production can pose several challenges.
- Scale: As the number of agents grows, collaboration and management become challenging. Various frameworks offer scalable solutions; for example, LlamaIndex takes an event-driven workflow approach to managing multiple agents at scale.
- Latency: Agent performance often suffers from latency because tasks are performed iteratively, requiring multiple LLM calls. Managed LLMs (like GPT-4o) are slow because of implicit guardrails and network delays. Self-hosted LLMs (with GPU control) can help address latency issues.
- Performance and hallucination issues: Due to the probabilistic nature of LLMs, agent performance can vary with each execution. Techniques like output templating (for instance, in JSON format) and providing ample examples in prompts can help reduce response variability. The problem of hallucination can be further reduced by training agents.
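Output templating pairs naturally with validation and a retry, as in the minimal sketch below. The field names, template wording and retry message are assumptions for illustration, and the model is a fake callable returning canned replies rather than a real LLM client. Asking for a fixed JSON shape lets the agent reject off-template answers programmatically instead of passing variable free text to the next step.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical output contract for a loan-verification agent.
REQUIRED_FIELDS = {"applicant": str, "identity_verified": bool, "notes": str}
TEMPLATE = ('Reply only with JSON of the form {"applicant": "<name>", '
            '"identity_verified": true or false, "notes": "<text>"}.')


def parse_reply(raw: str) -> Optional[Dict[str, Any]]:
    """Return the parsed reply if it matches the template, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if not all(isinstance(data.get(key), typ) for key, typ in REQUIRED_FIELDS.items()):
        return None
    return data


def call_with_template(llm: Callable[[str], str], task: str,
                       max_attempts: int = 3) -> Dict[str, Any]:
    prompt = f"{task}\n{TEMPLATE}"
    for _ in range(max_attempts):
        parsed = parse_reply(llm(prompt))
        if parsed is not None:
            return parsed
        # Re-ask with an explicit correction instead of accepting free text.
        prompt += "\nYour previous reply did not match the JSON template. Try again."
    raise ValueError(f"No valid reply after {max_attempts} attempts")


# A fake model that rambles once, then follows the template.
replies = iter([
    "Sure! The applicant looks fine to me.",
    '{"applicant": "A. Smith", "identity_verified": true, "notes": "License checked"}',
])
print(call_with_template(lambda prompt: next(replies), "Verify applicant A. Smith."))
```

Each retry is another model call, so the attempt limit is also a latency and cost budget.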
Final thoughts
As Andrew Ng emphasizes, agents are the future of AI and will continue to evolve alongside LLMs. Multi-agent systems will advance in processing multimodal data (text, images, video, audio) and tackling increasingly complex tasks. While AGI and fully autonomous systems are still on the horizon, multi-agent systems will bridge the current gap between LLMs and AGI.
DataDecisionMakers
Welcome to the VentureBeat community!
DataDecisionMakers is a place where experts, including data scientists, can share data-related insights and innovations.
If you want to read about cutting-edge ideas and up-to-date information, best practices, and the future of data and data tech, join us at DataDecisionMakers.
You might even consider contributing an article of your own!