Getting Started with AI Agents (Part 1): Capturing Processes, Roles, and Connections

At a minimum, a modern AI agent consists of a large language model (LLM) that has been enabled to call some tools. Given the right set of coding tools, it would start by generating code, run it in a container, observe the results, modify the code, and thus have a better chance of producing usable code.

By contrast, a generative AI model on its own takes some input and, by predicting what is expected, produces an output. For example, we give it a coding task, it produces some code and, depending on the complexity of the task, the code may be usable as is.

Because they take on different tasks, agents should be able to talk to one another. For example, imagine your company intranet with its useful search box directing you to the apps and resources you need. If your company is large enough, the apps owned by different departments each have their own search boxes. It makes a lot of sense to create agents, perhaps using techniques like retrieval-augmented generation (RAG), to augment those search boxes. What does not make sense is forcing the user to repeat a query once the search box has identified another app as useful for the initial query. Instead, we would prefer the top agent to coordinate with other agents representing the various apps and present you, the user, with a consolidated and unified chat interface.

A multi-agent system representing the software or various workflows in an organization can have several interesting benefits, including increased productivity and reliability, operational resilience, and the ability to update individual modules more quickly. We hope this article helps you see how to achieve this.

But first, how should we go about creating multi-agent systems?

Capturing organization and roles

First, we should capture the processes, roles, responsible nodes and connections of the various actors in the organization. By actors, I mean people and/or software applications that act as knowledge workers in the organization.

An org chart may be a good place to start, but I suggest starting with workflows, because the same people in an organization tend to work with different processes and different people depending on the workflow.

There are tools available that use AI to discover workflows, but you can also build your own gen AI model. I built one as a GPT that takes a description of a domain or a company name and produces an agent network definition. Because I use a multi-agent framework built in-house at my company, the GPT produces the network as a HOCON file, but whatever the format, the generated files should make clear what the roles and responsibilities of each agent are and which other agents it is connected to.

Note that we want to make sure the agent network is a directed acyclic graph (DAG). This means that no agent can be both downstream and upstream of another agent, directly or indirectly. This greatly reduces the risk of queries in the agent network going into a tailspin.
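As a minimal sketch of how the DAG requirement could be checked before a network is deployed, the following uses Python's standard `graphlib` module. The `NETWORK` dictionary and the agent names in it are hypothetical; they are not taken from any real agent framework or from the HOCON format mentioned above.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical agent network: each agent maps to the agents directly connected below it.
NETWORK = {
    "search_box": ["hr", "legal", "it_support"],
    "hr": ["benefits", "payroll"],
    "legal": ["legal_advice"],
    "it_support": [],
    "benefits": [],
    "payroll": [],
    "legal_advice": [],
}


def find_cycle(network):
    """Return the agents forming a cycle, or None if the network is a DAG."""
    try:
        # prepare() raises CycleError when some agent ends up both upstream
        # and downstream of another agent, directly or indirectly.
        TopologicalSorter(network).prepare()
    except CycleError as err:
        # The second element of args is the detected cycle as a list of nodes.
        return list(err.args[1])
    return None
```

Calling `find_cycle(NETWORK)` returns `None` for the network above. Adding an edge from `payroll` back to `search_box` would make it return the offending loop, which can then be reported to whoever maintains the network definition.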

In the examples presented here, all agents are LLM-based. If a node in the multi-agent organization has zero autonomy, then that agent, paired with its human counterpart, should run everything by the human. We need all processing nodes, whether apps, humans or existing agents, to be represented as agents.

Recently, there have been many announcements from companies offering specialized agents. We would, of course, like to make use of such agents where they are available. We can take an existing agent and wrap its API in one of our own agents so that it takes part in our inter-agent communication protocols. This means that such third-party agents need to expose an API for us to use.
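One way to picture such a wrapper is sketched below. It assumes the third-party agent can be reached through a single function that takes text and returns text; `WrappedExternalAgent`, `handle` and `fake_vendor_api` are hypothetical names used only for illustration, not any vendor's actual API.

```python
from typing import Callable


class WrappedExternalAgent:
    """Presents a third-party agent through the same interface as in-house agents."""

    def __init__(self, name: str, description: str, call_api: Callable[[str], str]):
        self.name = name
        self.description = description  # role text that other agents can read
        self._call_api = call_api       # hypothetical vendor client function

    def handle(self, inquiry: str) -> dict:
        # Forward the inquiry to the vendor and normalise the reply, so that
        # upstream agents never deal with vendor-specific errors or formats.
        try:
            reply = self._call_api(inquiry)
            return {"agent": self.name, "ok": True, "response": reply}
        except Exception as err:
            return {"agent": self.name, "ok": False, "response": str(err)}


def fake_vendor_api(text: str) -> str:
    # Stand-in for a real network call to the vendor's service.
    return f"vendor answer to: {text}"


travel_agent = WrappedExternalAgent(
    "travel", "Books business travel via an external provider.", fake_vendor_api
)
result = travel_agent.handle("Book a flight to Berlin")
```

The point of the design is that the rest of the network only sees `name`, `description` and `handle`, so an external agent can be swapped out without touching its upstream agents.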

How to define agents

Various agent architectures have been proposed in the past. For example, a blackboard architecture requires a centralized point of communication where the various agents declare their roles and capabilities, and the blackboard calls them depending on how it plans to fulfill a request (see OAA).

I prefer a more distributed architecture that respects the encapsulation of responsibilities. Each agent, upon receiving a request, decides whether it can process it and what it needs in order to do so, and then returns its list of requirements to the upstream agent that sent the request. If the agent has downstream agents, it asks them whether they can help fulfill all or part of the request. If it receives any requirements from the downstream agents it contacted, it checks with other agents to see if they can meet them; if not, it sends them upstream so that the human user can be asked. This architecture is called the AAOSA architecture, and (fun fact) it was the architecture used in early versions of Siri.

Here is an example system prompt that can be used to turn an agent into an AAOSA agent.

When you receive an inquiry, you will:

  1. Call your tools to determine which downstream agents among your tools are responsible for all or part of the inquiry.
  2. Ask the downstream agents what they need in order to handle their part of the inquiry.
  3. Once the requirements are gathered, delegate the inquiry and the fulfilled requirements to the appropriate downstream agents.
  4. Once all downstream agents have responded, compile their responses and return the final response.
  5. You may, in turn, be called by other agents in the system and have to act as a downstream agent to them.
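The steps of this prompt can be sketched in plain Python to show how requirements flow through the chain. This is only an illustration under strong assumptions: simple keyword matching stands in for the LLM's decision about whether an agent is responsible, and the `Agent` class, its method names and the example agents are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    keywords: set = field(default_factory=set)        # stand-in for LLM judgement
    requirements: list = field(default_factory=list)  # what this agent needs to act
    down_chain: list = field(default_factory=list)    # agents it treats as tools

    def is_responsible(self, inquiry: str) -> bool:
        # True when this agent itself handles part of the inquiry.
        return bool(self.keywords & set(inquiry.lower().split()))

    def claims(self, inquiry: str) -> bool:
        # True when this agent, or anything downstream of it, is responsible.
        return self.is_responsible(inquiry) or any(
            agent.claims(inquiry) for agent in self.down_chain
        )

    def gather_requirements(self, inquiry: str) -> dict:
        """Steps 1 and 2: ask the responsible agents what they need."""
        needs = {}
        if self.is_responsible(inquiry):
            needs[self.name] = list(self.requirements)
        for agent in self.down_chain:
            if agent.claims(inquiry):
                needs.update(agent.gather_requirements(inquiry))
        return needs

    def fulfill(self, inquiry: str, provided: dict) -> list:
        """Steps 3 and 4: delegate with the provided details, compile responses."""
        responses = []
        if self.is_responsible(inquiry):
            missing = [r for r in self.requirements if r not in provided]
            status = f"still needs {missing}" if missing else "handled"
            responses.append(f"{self.name}: {status}")
        for agent in self.down_chain:
            if agent.claims(inquiry):
                responses.extend(agent.fulfill(inquiry, provided))
        return responses


benefits = Agent("benefits", {"passed"}, ["plan types", "dependent status"])
payroll = Agent("payroll", {"passed"}, ["bereavement policy"])
hr = Agent("hr", down_chain=[benefits, payroll])
it_support = Agent("it_support", {"network"}, ["network performance data"])
search_box = Agent("search_box", down_chain=[hr, it_support])

inquiry = "My significant other has passed away"
needs = search_box.gather_requirements(inquiry)
answers = search_box.fulfill(
    inquiry, {"plan types": "medical", "dependent status": "yes"}
)
```

Here `needs` lists requirements for the benefits and payroll agents only, because the IT agent does not claim the inquiry. Each agent exposes the same two operations to whatever sits above it, which is what step 5 of the prompt asks for.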

In addition to the set of roles and responsibilities defined in natural language in each agent's system prompt, agents may or may not include tools that they can call, passing various arguments to them. For example, a product manager agent may need to process tickets on a virtual Kanban board, or an alerts agent may need to call a tool to issue alerts in an alerting system.

Current multi-agent systems, such as Microsoft AutoGen, have elaborate and often hard-coded agent coordination mechanisms and architectures. I prefer a more robust setup in which agents treat their immediate downstream agents as tools, with loosely defined arguments that can be typed, and with the semantics decided by the agents at the time of need.

In this setup, a downstream agent can be defined as a function call.
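A sketch of what such a definition might look like follows, using the JSON Schema style that several LLM function-calling interfaces accept. The helper `agent_as_tool` and the argument names `inquiry`, `mode` and `relevant_context` are assumptions chosen for illustration; they are not the exact definition used by the author's framework.

```python
import json


def agent_as_tool(name: str, description: str) -> dict:
    """Describe a downstream agent as a callable tool with loosely typed arguments."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {
                "inquiry": {
                    "type": "string",
                    "description": "The inquiry, or the part of it this agent should handle.",
                },
                "mode": {
                    "type": "string",
                    "enum": ["determine", "fulfill"],
                    "description": "Ask what the agent needs, or ask it to do the work.",
                },
                "relevant_context": {
                    "type": "string",
                    "description": "Free-form details gathered so far; meaning is left to the agents.",
                },
            },
            "required": ["inquiry", "mode"],
        },
    }


payroll_tool = agent_as_tool(
    "payroll", "Handles tax deductions and bereavement leave pay."
)
print(json.dumps(payroll_tool, indent=2))
```

Only `inquiry` and `mode` are required, and `relevant_context` is deliberately an unstructured string, which keeps the arguments loosely defined and leaves the semantics to the calling and receiving agents.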

AI agents in practice

Let’s look at an HR agent scenario. A user might inform the main search box agent of a life-changing event, such as “my significant other has passed away,” which triggers the legal and HR agents that have a role to play in serving the user. They, in turn, receive information from some of their downstream agents, namely the legal advice, payroll and benefits agents. The resulting consolidated response that the user receives from the main search box agent will look like this:

    • The benefits agent can help you understand how the passing may affect your benefits, health insurance, and retirement plans. You will need to provide information about the types of plans involved and whether your significant other was a dependent.
    • The payroll agent can help manage tasks such as updating tax deductions or processing bereavement leave pay. This requires information about the company’s bereavement leave policy and the current payroll setup.
    • The legal agents can offer general guidance on how this event relates to company affairs.

Each department needs specific information in order to offer tailored support.

Meanwhile, in an IT scenario, the single point of entry is a customer support rep agent that can handle complex, multi-faceted requests such as: “Our network is slow. I think it’s overloaded because we recently added staff to our office. I will also need a quote for an upgrade.”

An example response outlines the steps:

Network performance assessment: A specialist from the network operations center can help you address the slowness and the potential overload. They need access to network performance data, details about the current network infrastructure, information about the number of staff added and network usage patterns.

Upgrade quote: A sales engineer and a field technician can help provide a quote for an upgrade. To do this, they need:

  • Detailed information about your current network setup, including hardware and bandwidth requirements.
  • The desired performance improvements.
  • Budget constraints and any specific preferences or requirements.
  • The scale of the upgrade and any specific performance goals.

I hope this has given you a good idea of what is required to set up a multi-agent network. In the second part, I’ll discuss the importance of implementing security when building multi-agent systems and outline how to build in controls that allow for human intervention and uncertainty checks. I’ll also detail the steps required to create a security agent that oversees a network of agents, and delve into the challenges of building multi-agent networks, such as congestion, and how to mitigate them with timeouts, task division, and redundancy.
