If 2023 was the year of generative chatbots and AI-powered search engines, 2024 was the year of AI agents. What began with Devin earlier this year has grown into a full-blown phenomenon, offering businesses and individuals a way to transform how they work at various levels, from programming and development to personal tasks such as planning and booking vacation tickets.
Among these wide-ranging applications, this year also saw the rise of data agents: AI-powered agents that handle various types of tasks across the data infrastructure stack. Some performed basic data integration work, while others handled downstream tasks such as analysis and management in the pipeline, making things simpler and easier for enterprise users.
The advantages included greater efficiency and cost savings, leading many to wonder: How will things change for data teams in the coming years?
Gen AI agents took over data-related tasks
While agent-based capabilities have been around for some time, allowing enterprises to automate certain basic tasks, the rise of generative AI has taken everything to the next level.
With natural language processing capabilities and the use of gen AI tools, agents can go beyond simple reasoning and responding to actually plan multi-step actions, independently interacting with digital systems to complete those actions while collaborating with other agents and people. They also learn to improve their performance over time.
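To make the idea of planning and executing multi-step actions concrete, here is a minimal sketch in Python. It is not how any product named in this article works: the `plan` function stands in for a gen AI model and simply returns a fixed list of steps, and the tool names (`dedupe`, `drop_missing`) and the sample rows are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def dedupe(rows: List[Row]) -> List[Row]:
    """Keep only the first row seen for each id."""
    seen = set()
    unique = []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(row)
    return unique


def drop_missing(rows: List[Row]) -> List[Row]:
    """Remove rows that have no amount."""
    return [row for row in rows if row["amount"] is not None]


# Hypothetical tool registry: the actions the agent is allowed to take.
TOOLS: Dict[str, Callable[[List[Row]], List[Row]]] = {
    "dedupe": dedupe,
    "drop_missing": drop_missing,
}


def plan(goal: str) -> List[str]:
    """Stand-in planner (assumption): a real agent would ask a model
    to turn the goal into an ordered list of tool names."""
    return ["dedupe", "drop_missing"]


def run_agent(goal: str, rows: List[Row]) -> Tuple[List[Row], List[str]]:
    """Run each planned step in order and log what every step changed."""
    log = []
    for tool_name in plan(goal):
        before = len(rows)
        rows = TOOLS[tool_name](rows)
        log.append(f"{tool_name}: {before} -> {len(rows)} rows")
    return rows, log


# Made-up sample data, for illustration only.
SAMPLE_ROWS: List[Row] = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 5.0},
]

if __name__ == "__main__":
    cleaned, log = run_agent("prepare orders for analysis", SAMPLE_ROWS)
    print(log)
    print(cleaned)
```

The step log is the part a human reviewer would care about: it records what the agent did at each step, which matters as long as people still need to check the work.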
Cognition AI’s Devin was the first major agent offering to enable large-scale engineering operations. Subsequently, larger players began to offer more targeted enterprise and personal agents powered by their models.
In an interview with VentureBeat earlier this year, Google Cloud’s Gerrit Kazmaier said he heard from customers that their data scientists continually face challenges such as automating the manual work of data teams, reducing the cycle time of data and analytics pipelines, and simplifying data management. Essentially, teams were not short of ideas for creating value from their data, but they lacked the time to implement those ideas.
To solve this problem, Kazmaier explained, Google upgraded BigQuery, its core data infrastructure offering, with Gemini AI. The resulting agentic capabilities not only give enterprises the ability to find, cleanse and prepare data for downstream use (breaking down data silos and ensuring quality and consistency) but also support pipeline management and analysis, freeing teams to focus on higher-value tasks.
Many enterprises today use Gemini’s agent capabilities in BigQuery, including fintech Above, which leveraged Gemini’s ability to understand complex data structures to automate its query generation process. Japanese IT company Unerry also uses Gemini’s SQL generation capabilities in BigQuery to help its data teams deliver insights faster.
But discovering, preparing and assisting with analysis is just the beginning. As the underlying models have evolved, even fine-grained data operations, pioneered by startups specializing in their respective fields, have moved toward deeper agent-driven automation.
For example, Airbyte and Fastn made headlines in the data integration category. The former launched an assistant that created data connectors within seconds from a link to the API documentation. Meanwhile, the latter expanded its broader application development offering with agents that generated enterprise-grade APIs, whether for reading or writing information on any topic, using only a natural language description.
For its part, San Francisco-based Altimate AI targeted various data operations, including documentation, testing and transformations, with its new DataMates technology, which used agent-based AI to pull context from across the data stack. Other startups, including Redbird and RapidCanvas, have moved in the same direction, claiming to offer AI agents that can handle up to 90% of the data tasks required in AI and analytics pipelines.
Agents serving RAG and more
Beyond these wide-ranging data operations, agent capabilities were also explored in areas such as retrieval-augmented generation (RAG) and downstream workflow automation. For example, the team behind the vector database Weaviate recently discussed the concept of agentic RAG, a process that allows AI agents to access a wide range of tools, such as a web search engine, a calculator or a software API (such as Slack, Gmail or a CRM), to retrieve and validate data from multiple sources and improve the accuracy of responses.
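The retrieve-and-validate step of agentic RAG can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation discussed by any vendor: the three tools are hypothetical stand-ins that return canned answers instead of calling a real CRM, Slack or email API, and the "at least two sources must agree" rule is just one simple validation policy an agent could apply before passing an answer on to the model.

```python
from collections import Counter
from typing import Callable, Dict, Optional

# Hypothetical tools (assumption): each returns an answer string, or None
# if it knows nothing about the question. Real tools would call external APIs.
CRM_DATA = {"acme renewal month": "March"}
SLACK_DATA = {"acme renewal month": "March"}
EMAIL_DATA = {"acme renewal month": "April"}

TOOLS: Dict[str, Callable[[str], Optional[str]]] = {
    "crm": CRM_DATA.get,
    "slack": SLACK_DATA.get,
    "email": EMAIL_DATA.get,
}


def retrieve_and_validate(
    question: str,
    tools: Dict[str, Callable[[str], Optional[str]]],
    min_agreement: int = 2,
) -> Optional[dict]:
    """Query every tool, then keep the answer only if enough sources agree."""
    answers = {}
    for name, tool in tools.items():
        result = tool(question)
        if result is not None:
            answers[name] = result
    if not answers:
        return None  # nothing retrieved, so nothing to ground a response on

    # Pick the most common answer and count how many sources back it.
    value, votes = Counter(answers.values()).most_common(1)[0]
    if votes < min_agreement:
        return None  # sources disagree too much to trust any single answer

    supporting = sorted(name for name, ans in answers.items() if ans == value)
    return {"answer": value, "sources": supporting}


if __name__ == "__main__":
    print(retrieve_and_validate("acme renewal month", TOOLS))
```

In a full pipeline, the validated answer and its list of sources would be added to the prompt as context, so the generated response can cite where the information came from.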
Moreover, Snowflake Intelligence arrived late in the year, giving enterprises the ability to set up data agents that could draw not only on business intelligence data stored in their Snowflake instance, but also on structured and unstructured data from siloed third-party tools, such as databases, documents in knowledge bases such as SharePoint, and information in productivity tools such as Slack, Salesforce and Google Workspace.
With this extra context, the agents surface relevant insights in response to natural language questions and take specific actions based on the insights generated. For example, a user can ask their data agent to enter the surfaced insights into an editable form and upload the file to Google Drive. The agent can even be asked to write to Snowflake tables and modify the data as needed.
There’s still a lot ahead of us
While we may not have covered every application of data agents seen or announced this year, one thing is quite clear: the technology is here to stay. As gen AI models evolve, the adoption of AI agents will accelerate, and most organizations, regardless of sector or size, will choose to delegate repetitive tasks to specialized agents. This will translate directly into efficiency gains.
A recent Capgemini survey of 1,100 technology managers supports this point: 82% of respondents said they intend to integrate AI-based agents into their stack within the next three years, up from 10% today. More importantly, as many as 70-75% of respondents said they would trust an AI agent to analyze and synthesize data on their behalf, as well as to handle tasks such as generating and iteratively improving code.
This agent-driven shift would also mean significant changes in how data teams function. Currently, agent output is not production-grade, which means that at some point a human must take over and adapt the work to their needs. However, with a few more advances in the coming years, this gap will likely close, giving teams AI agents that are faster, more accurate and less prone to the errors humans typically make.
In summary, the data scientist and analyst roles we see today will likely change, with practitioners moving into AI governance (where they would keep an eye on what the AI is doing) or into higher-value tasks that the system may have difficulty performing.