Beyond Bots: How AI Agents Are Driving the Next Wave of Enterprise Automation

September 26, 2024

Every job in the economy can be thought of as a bundle of tasks, shared between humans and machines. Over the years, software has picked off more and more of these tasks, but even today, humans still own the vast majority of business processes. In every function, headcount costs dwarf software expenditures by orders of magnitude.

AI agents promise to shift this balance of work decisively. Unlike previous generations of software that primarily addressed low-level, sequential tasks that could be robotically executed, new cognitive architectures enable agents to dynamically automate end-to-end processes. These are not just AI systems that can read and write, but ones that can decide the flow of your application logic and take actions on your behalf.

And they represent the biggest opportunity for LLMs in the enterprise today. In another article, we wrote about what these new “agents” are and the design patterns that made them possible. Here, we’ll explore how they’re being applied in the enterprise to bring about a new era for enterprise automation.

RPA Redux?

If it feels like we’ve heard this story before, it’s because, for the past decade, companies like UiPath and Zapier have been selling rhyming visions under the banner of “bot automation.”

UiPath was first. At the core of the robotic process automation (RPA) giant’s business is screen scraping and GUI automation to enable “bots” to record what users are doing and then mimic the sequential steps to automate processes such as extracting information from documents, moving folders, filling in forms, and updating databases.

Later, iPaaS providers like Zapier emerged with a more lightweight “API automation” approach for productivity use cases. The platform used pre-built API integrations and webhooks to provide more stable automations, though the approach limited the company’s scope to web app automations vs. UiPath’s ability to automate processes across different software, including those that might not support APIs.

UiPath and Zapier proved the market for composable, rules-based horizontal automation platforms to address the long tail of enterprise processes that exist in and between department- or industry-specific software systems. But as enterprises scaled their bot-based automations, the gap between the capabilities of these traditional architectures and their promised autonomy started to show, especially around:

  • (Still) a lot of humans and manual work. For all the talk about bots and automation, the process of standing up and maintaining automations is still painfully manual. In fact, for every dollar that UiPath makes, $7 goes to implementation and consulting partners like EY, resulting in lengthy, expensive deployment and maintenance cycles.
  • Brittle UI automations or limited API integrations. UI automations break often when software UI is changed, while APIs are more stable but offer far fewer integrations, especially with legacy or on-prem software.
  • Inability to process unstructured data. Unstructured and semi-structured data make up 80% of enterprise data, but sequence-based automations cannot work intelligently with almost any of it. Intelligent document processing (IDP) solutions like Hyperscience and Ocrolus attempted to make progress here, but struggled with edge cases and exception handling even for simple “extract and transform” document processing use cases.

Moreover, legacy RPA and iPaaS solutions continue to be handcuffed to their deterministic architectures—even when they try to incorporate LLMs. Today, both UiPath’s AI solution Autopilot and Zapier’s AI Actions only offer LLMs for sub-agentic design patterns like (1) text-to-action, or (2) nodes for semantic search, synthesis, or one-shot generation.

These AI features can certainly be powerful. They let business functions, rather than IT, own automation rulebooks; enable more powerful object detection and recognition via vision transformers rather than OCR; and offer more robust data extraction and transformation via RAG. But they still miss the more transformative use cases for LLMs in process automation, which we’ll explore next.

AI Agents as Decision Engines

Agents are fundamentally different. They sit as decision engines at the center of the control flow for an application, in contrast to the hard-coded logic of today’s RPA bots, or even the RAG apps that defined the first wave of the generative AI revolution. For the first time, they are enabling adaptability, multi-step actions, complex reasoning, and robust exception handling.
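To make the "decision engine" idea concrete, here is a minimal sketch, assuming a toy invoice-matching task, of a loop in which a decision function picks the next action at runtime instead of following hard-coded branches. `stub_decide` is written as plain rules only so the example runs offline; in an actual agent, that choice would come from an LLM call. The tool names, the allowlist guardrail, and `max_steps` are all hypothetical illustrations, not any vendor's API.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Tool = Callable[[State], None]

# Hypothetical tools; a real system would call ledger, ERP, or email APIs here.
def lookup_ledger(state: State) -> None:
    state["ledger_amount"] = 1000.00  # hypothetical ledger entry

def search_vendor_email(state: State) -> None:
    state["vendor_note"] = "price increase effective this month"  # hypothetical

# Allowlisted action set: the agent may only invoke these.
TOOLS: Dict[str, Tool] = {
    "lookup_ledger": lookup_ledger,
    "search_vendor_email": search_vendor_email,
}

def stub_decide(state: State) -> str:
    """Stand-in for an LLM call that chooses the next action from the current state."""
    if "ledger_amount" not in state:
        return "lookup_ledger"
    if state["ledger_amount"] != state["invoice_amount"] and "vendor_note" not in state:
        return "search_vendor_email"  # investigate the mismatch before deciding
    return "done"

def run_agent(state: State, decide: Callable[[State], str],
              tools: Dict[str, Tool], max_steps: int = 5) -> State:
    """Control loop: the decision function, not fixed branches, selects each step."""
    for _ in range(max_steps):
        action = decide(state)
        if action == "done":
            state["status"] = "resolved"
            return state
        if action not in tools:
            # Guardrail: reject any action outside the allowlist and hand off to a human.
            state["log"].append(f"rejected:{action}")
            state["status"] = "escalate"
            return state
        tools[action](state)
        state["log"].append(action)
    state["status"] = "escalate"  # step budget exhausted without resolution
    return state
```

Running `run_agent({"invoice_amount": 1050.0, "log": []}, stub_decide, TOOLS)` looks up the ledger, sees the mismatch, searches vendor email, and then stops. The sequence of steps is chosen inside the loop rather than drawn in advance as a flowchart.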

Let’s illustrate the implications in an invoice reconciliation example. Below is a simplified process diagram for matching a new invoice PDF to a company’s general ledger (similar to the ones implementation engineers must visually model for RPA):

Example: RPA for Invoice Reconciliation Workflow

Clearly, the workflow quickly grows unwieldy, and it becomes nearly impossible to account for all the relevant edge cases and exceptions, even within the first three decision sets. More often than not, an RPA bot tasked with robotically executing this workflow will error out and escalate a partially matched or missing line item to a human. That may explain why most enterprises today still staff hundreds of employees on the task each month rather than automating this highly manual process.
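A rough sketch of the kind of hard-coded rule chain such a diagram encodes, assuming a hypothetical invoice format where each line item carries a `po_number` and an `amount`, and a ledger keyed by PO number. Every rule demands an exact match, so anything unanticipated, even a one-cent rounding difference, falls through to escalation.

```python
from decimal import Decimal
from typing import Dict, List

def rpa_match(invoice_lines: List[Dict], ledger: Dict[str, Decimal]) -> str:
    """Deterministic rule chain: exact PO and exact amount, otherwise escalate."""
    for line in invoice_lines:
        po = line.get("po_number")
        # Rule 1: the PO must already exist in the ledger.
        if po not in ledger:
            return f"ESCALATE: missing PO {po}"
        # Rule 2: the amount must match to the cent; no notion of tolerance or context.
        if line["amount"] != ledger[po]:
            return f"ESCALATE: amount mismatch on {po}"
    return "MATCHED"
```

Each new exception (a renamed PO format, a partial shipment, a currency conversion) would need another explicit branch, which is why diagrams like the one above keep growing.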

When applied to the same workflow, however, agents perform far better, with the ability to:

  • Adapt to new situations. Agents can intelligently recognize and adapt to new data sources, invoice formats, naming conventions, account numbers, and even policy changes based on basic reasoning and relevant business context, all without reprogramming or an SOP explicitly specifying so.
  • Enable multi-step actions. In cases of mismatched invoice amounts, an agent can execute multi-step investigations, including scanning recent emails from the vendor for notice of a possible price change. 
  • Demonstrate complex reasoning. Let’s say a company needs to reconcile an invoice from an international supplier against its ledger. This process involves multiple considerations, including invoice currency, ledger currency, transaction date, exchange rate fluctuations, cross-border fees, and bank fees, all of which must be retrieved and calculated together to reconcile payments. Agents are capable of this type of reasoning, whereas an RPA bot might simply escalate the case to a human.
  • Account for uncertainty. Agents are robust to exceptions like rounding errors or unreadable numbers for individual line items based on context clues like matching total order values and historical invoicing timing and frequency.
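The international-supplier case ultimately reduces to a calculation once the inputs are gathered; the agent's contribution is retrieving the right exchange rate and fees and deciding what counts as "close enough." Below is a minimal sketch of that final check, assuming the ledger records the total paid in ledger currency including fees, and using a hypothetical $0.05 tolerance to absorb rounding. The function name and parameters are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List, Tuple

def reconcile_fx(invoice_amount: Decimal, fx_rate: Decimal, fees: List[Decimal],
                 ledger_amount: Decimal,
                 tolerance: Decimal = Decimal("0.05")) -> Tuple[bool, Decimal, Decimal]:
    """Convert the invoice to ledger currency, add fees, and compare within a tolerance.

    Assumes fx_rate is quoted as ledger-currency units per invoice-currency unit
    and that the ledger entry includes all fees (both hypothetical conventions).
    """
    # Convert and round to cents, as a payment system typically would.
    converted = (invoice_amount * fx_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    expected = converted + sum(fees, Decimal("0"))
    diff = (ledger_amount - expected).copy_abs()
    return diff <= tolerance, expected, diff
```

For a EUR 1,000.00 invoice at a hypothetical rate of 1.1123 USD/EUR plus $15.00 cross-border and $10.00 bank fees, the expected ledger entry is $1,137.30, so a $1,137.32 entry falls within tolerance, while forgetting the bank fee leaves a $10.02 gap that should be investigated.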

The AI Agent Market Landscape

Agents aren’t just science fiction, either. Although the category is still emerging, companies ranging from startups to the Fortune 500 are already buying and using these systems at scale.

The current agent landscape can be visualized using two key dimensions:

  • Domain specificity: This ranges from highly specialized agents for verticals like healthcare or departments like customer support to horizontal agent platforms with broad, general capabilities.
  • LLM autonomy: This indicates the extent to which the language model can independently plan and direct application logic.

These two factors form the axes of our working AI agent market map, which is below.

At the top right of the market map, the most horizontal and generalizable agents include:

  • Enterprise agents. Extensible agent platforms enable enterprises to build and manage agents across multiple functions and workflows via natural language SOPs or rulebooks like those you’d give to new employees. These platforms appeal especially to centralized IT buyers seeking broadly applicable agent capabilities rather than separate point solutions for each business unit. The core processing capability of Sema4’s invoice reconciliation agent, for instance, can be applied to various data validation tasks across finance, procurement, and operations.

That being said, most enterprise agents use an “agents on rails” architecture, which requires agents to be grounded in a workflow-specific set of predefined actions, business context, and guardrails for each new process. And although some of this data infrastructure can be shared across workflows, these platforms’ horizontal nature comes more from stacking use cases than from human-like generalizability. Consequently, some players in this space have already started to gravitate toward specific domains for greater product and GTM leverage (e.g., Brevian to customer support and security, Ema to sales and support).

  • Browser agents. Web agents like MultiOn, Induced, and Twin represent another type of horizontal, generalizable agent. Most follow a “general AI agent” design, leveraging vision transformers trained on diverse software interfaces and their underlying codebases. This allows the agents to “understand” web components, their functionality, and interactions, in order to automate web browsing, visual UI actions, and text entry.

While these agents gain generalizability, they often sacrifice consistency. Currently, most target simpler productivity or e-commerce use cases as they work toward enterprise-grade performance. Without the benefit of a more constrained problem space with appropriate data scaffolding and guardrails, browser agents that aim to be more dependable must overcome key challenges, including managing complex action and observation spaces, maintaining context across multiple pages, and interpreting diverse web interfaces.

  • AI-enabled services. Enterprise demand for agentic capabilities currently outstrips customers’ ability to productionize their own agents, especially as extensive data infrastructure and guardrails are needed to make “agents on rails” designs work in practice. This is where companies like Distyl and Agnetic come in, offering forward-deployed engineering services in a “Palantir for AI” model to close the gap. As with Palantir’s Foundry, these companies can then reuse modular systems infrastructure across customers to rebalance the platform-to-services ratio over time.

But not all agents aim to be both horizontal and generalizable. Increasingly, we’re seeing domain- and workflow-specific agents emerge that can increase reliability by constraining the types of problems they’re trying to solve:

  • Vertical agents. The most promising opportunities for vertical agents exist in manual, procedure-driven processes currently handled by humans following SOPs or rulebooks. Many enterprises already outsource these functions to Business Process Outsourcing (BPO) firms or contractors. These tasks are often too complex for rules-based automation, yet not challenging or differentiating enough to justify in-house knowledge workers. Top categories include customer support; recruiting; certain software development tasks like code review, testing, and maintenance; cold sales outbounding; and security operations.
  • AI assistants. The other way to narrow agent focus is via task specificity as opposed to domain specificity. AI assistants execute simpler, more productivity-oriented tasks versus the more complex end-to-end processes that enterprise and vertical agents take on. Common primitives include few-step web research, knowledge extraction, summarization, and unstructured data transformations for ad hoc tasks like chat-your-PDF or extracting feature requests from Gong transcripts.

Finally, it’s worth noting that there are broad categories of generative AI solutions that, while not agents themselves, compete for the same budgets and, at times, even the same workflows as agent-based solutions. Because they are primarily built around RAG architectures and don’t sit in application control flows, these solutions cannot fully replicate the human-like reasoning of agents. However, their capabilities still enable significant services automation while offering enterprises control:

  • Vertical AI. Semantic search and unstructured data transformation are powerful primitives in vertical workflows. The healthcare AI automation platform Tennr, for instance, extracts unstructured data from faxes, PDFs, phone calls, and other messy sources and enters it into clinics’ EHRs to unblock referral processing and eliminate the need for staff to input data by hand. Industrial AI, as another example, leverages a similar approach to automate quoting workflows for manufacturers.
  • RAG-as-a-Service. RAG-as-a-Service companies like Danswer and Gradient are the horizontal equivalents to vertical semantic search and unstructured data transformation companies, offering customers the ability to query unstructured data sources like PDFs, extract data, and input results into a more structured database or system of record.
  • Enterprise search. Glean, Perplexity, and Sana* offer semantic querying for another purpose—indexing and retrieving conceptually relevant documents in order to better manage organization-wide knowledge and break down enterprise data silos.

The Future of Enterprise Automation

Generative AI’s second wave will be defined by agents that can think and act on your behalf, rather than just read and write. As these architectures mature, they will be powerful catalysts for AI’s takeover of the services economy. At Menlo, we’re excited to meet teams that are building this future. If you’re building in the agents space, we’d love to connect.

JP Sanday (jp@menlovc.com)
Steve Sloane (steve@menlovc.com)
Naomi Ionita (naomi@menlovc.com)  
Derek Xiao (derek@menlovc.com)


*Backed by Menlo Ventures