EAAPL: Enterprise AI Architecture Pattern Library

[EAAPL-RAG007] Agentic Retrieval-Augmented Generation

Category: Artificial Intelligence / Retrieval-Augmented Generation
Sub-category: Agentic and Multi-Hop RAG
Version: 1.2
Maturity: Proven
Tags: rag, agentic, multi-hop, iterative-retrieval, query-planning, self-critique, tool-use, reasoning
Regulatory Relevance: EU AI Act Article 14 (Human oversight for autonomous systems), ISO/IEC 42001 Section 8.5 (AI system operation), NIST AI RMF (Govern 1.7: autonomous decision-making)


1. Executive Summary

Agentic RAG places a reasoning AI agent in the orchestration loop of the retrieval-generation pipeline, enabling it to plan multi-step retrieval strategies, evaluate the quality of retrieved evidence, execute iterative searches to fill gaps, and self-critique its draft answers before returning a final response. Unlike standard RAG, which executes a single fixed retrieval cycle per query, Agentic RAG can break a complex question into sub-questions, retrieve evidence for each, synthesise intermediate answers, identify what it still does not know, and retrieve again. It does this autonomously until it has sufficient grounded evidence to answer with confidence or reaches a configured iteration limit.

For enterprise leaders, Agentic RAG addresses a category of knowledge queries that single-cycle RAG fundamentally cannot handle: multi-hop questions that require chaining across multiple documents ("Which of our suppliers are located in regions affected by the sanctions announced in the regulatory alert published this week, and what are our contractual obligations for those suppliers?"), questions requiring synthesis across multiple evidence sources, and research-style queries that require iterative exploration of a knowledge domain. The business value is the automation of knowledge work that previously required a skilled analyst to retrieve, read, synthesise, and reason across multiple documents — turning a 2-hour research task into a 30-second AI-assisted workflow.


2. Problem Statement

Business Problem

Single-cycle RAG systems retrieve a fixed set of candidate documents and generate an answer from them in one pass. This architecture is adequate for factual lookup questions but insufficient for analytical questions that require multi-document synthesis, causal reasoning, or iterative hypothesis refinement. Analysts who use RAG systems for complex research tasks frequently report that the system misses relevant documents that would only be found if the initial answer had been used to formulate a better search query.

Technical Problem

Standard RAG has no mechanism for the system to recognise that its retrieved context is insufficient to answer the question, no ability to plan a multi-step retrieval strategy before retrieving, and no capacity to iteratively refine its retrieval based on partial answers. The single retrieval cycle architecture is fundamentally limited for complex, multi-hop knowledge questions.

Symptoms

  • RAG system answers complex analytical questions with shallow, incomplete responses that reference only one or two source documents
  • Users manually perform "follow-up searches" after receiving an initial RAG answer, indicating the system did not exhaust the relevant knowledge
  • High analyst override rate: AI-generated answers are frequently edited with additional context that the analyst had to find separately
  • The system answers "I don't have enough information" for questions that are answerable from the corpus but require multi-hop reasoning

Cost of Inaction

  • Complex knowledge work remains manual and unautomated, despite a capable knowledge corpus
  • Analysts treat the RAG system as a first-pass tool only, not as a research assistant capable of deep synthesis
  • Competitive disadvantage versus AI deployments that automate complex analytical workflows end-to-end

3. Context

When to Apply

  • Complex multi-hop questions requiring evidence from multiple, sequentially discovered documents
  • Research synthesis tasks: competitive analysis, regulatory impact assessment, risk analysis across multiple source documents
  • Question answering where the answer to one sub-question determines which documents to retrieve next (dependent retrieval chains)
  • Use cases where the appropriate retrieval strategy varies per question (some questions need one retrieval; others need five)
  • Scenarios where the agent should be able to answer "I cannot find sufficient evidence" after exhausting retrieval strategies, rather than hallucinating

When NOT to Apply

  • Simple factual lookups answerable in a single retrieval cycle (adds unnecessary latency and cost)
  • Latency-critical applications (agentic loops add 1–10 seconds of reasoning time per iteration)
  • Autonomous action-taking scenarios where the agent's retrieval findings would directly trigger writes or external API calls without human approval — this requires explicit Human-in-the-Loop governance (EAAPL-RAG007 combined with HITL patterns)
  • Corpora that are too small to benefit from multi-hop retrieval (< 10,000 documents)

Prerequisites

  • An underlying RAG retrieval capability (EAAPL-RAG001 or EAAPL-RAG005)
  • An LLM with reliable tool-calling / function-calling capability (for example GPT-4o, Claude 3.5, or Gemini 1.5 Pro)
  • A defined set of retrieval tools the agent can invoke (search, filter, summarise, compare)
  • A maximum iteration limit to prevent infinite loops
  • Observability instrumentation that traces each agent decision, tool call, and its output

Industry Applicability

| Industry | Complex Query Example | Multi-Hop Depth |
|---|---|---|
| Financial Services | "Which loan products in our portfolio have covenant conditions that may be triggered by the RBA rate change announced today?" | 3–4 hops |
| Legal | "Summarise all cases in our case database where a precedent from Smith v Jones was applied in an employment context" | 3–5 hops |
| Procurement | "Identify all active suppliers in our approved supplier list that are headquartered in sanctioned jurisdictions per this week's OFAC update" | 2–3 hops |
| Healthcare | "Are there any patients in our clinical notes who have been prescribed Drug X and also have contraindicated Condition Y per this month's formulary update?" | 2–4 hops |
| Strategy | "Summarise all internal research reports that reference our top 5 competitors and were published in the last 12 months" | 2–3 hops |

4. Architecture Overview

Agentic RAG wraps the retrieval-generation pipeline in a Reason-Act-Observe loop (a specialisation of the ReAct framework). The agent is an LLM with access to a defined set of retrieval tools and a scratchpad for recording its reasoning steps.

Query Planning

When a complex question is received, the agent's first step is query planning: it reasons about what information is needed to answer the question and decomposes the original query into a set of sub-queries with dependencies. For example, "Which of our suppliers are in sanctioned regions per this week's alert?" decomposes into: (1) retrieve this week's sanctions alert, (2) extract the list of sanctioned regions, (3) retrieve the supplier list, (4) cross-reference supplier regions against the sanctioned list. The query plan is recorded in the agent scratchpad and drives the retrieval sequence.

Query planning can be explicit (the agent writes a multi-step plan before executing any retrieval) or implicit (the agent uses tool calling in sequence, with each tool output informing the next call). Explicit planning produces more predictable and auditable behaviour; implicit sequential tool calling is more flexible but harder to trace.
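
As an illustration of explicit planning, the sanctions example above can be written as a small dependency graph and ordered so that no sub-query runs before the ones it depends on. This is a minimal sketch: the `SubQuery` class, the `execution_order` helper, and the `q1`..`q4` identifiers are hypothetical names introduced here, not part of the pattern specification.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class SubQuery:
    """One step of an explicit query plan (hypothetical structure)."""
    id: str
    text: str
    depends_on: list[str] = field(default_factory=list)


def execution_order(plan: list[SubQuery]) -> list[str]:
    """Return sub-query IDs in an order that respects dependencies.

    graphlib raises CycleError if the plan contains a dependency cycle.
    """
    graph = {sq.id: set(sq.depends_on) for sq in plan}
    return list(TopologicalSorter(graph).static_order())


# The four-step decomposition from the sanctions example.
plan = [
    SubQuery("q1", "Retrieve this week's sanctions alert"),
    SubQuery("q2", "Extract the list of sanctioned regions", ["q1"]),
    SubQuery("q3", "Retrieve the supplier list"),
    SubQuery("q4", "Cross-reference supplier regions against the sanctioned list", ["q2", "q3"]),
]
order = execution_order(plan)
```

Recording the plan in this form before any retrieval runs is what makes the explicit strategy auditable: the plan can be logged, reviewed, and compared with the tool calls that were actually made.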

Retrieval Tools

The agent is given a defined set of retrieval tools as callable functions:

  • search(query: str, filters: dict) → List[Chunk]: standard RAG retrieval
  • get_document(doc_id: str) → Document: retrieve a full document by ID (for follow-up reading)
  • summarise_results(chunks: List[Chunk]) → str: summarise a set of retrieved chunks
  • compare_documents(doc_ids: List[str], aspect: str) → str: compare specific aspects across multiple documents
  • extract_entities(chunks: List[Chunk], entity_type: str) → List[str]: extract named entities for use as next-query parameters

Tools are strictly typed and validated — the agent cannot invoke arbitrary code, only the defined tool set. Tool definitions include descriptions that guide the agent's tool selection.
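
One way to enforce the "defined tool set only" rule is a registry that the orchestrator consults before executing anything the model asks for. The sketch below is an assumption about how such a dispatcher might look; the stub tool bodies, the `TOOLS` registry, `dispatch`, and `ToolCallRejected` are hypothetical, and a production system would more likely validate against a full JSON schema.

```python
from typing import Any, Callable


# Hypothetical read-only tool stubs standing in for real retrieval calls.
def search(query: str, filters: dict) -> list:
    return [f"chunk for {query!r}"]


def get_document(doc_id: str) -> dict:
    return {"id": doc_id, "text": "..."}


# Registry: tool name -> (callable, required parameter names and types).
TOOLS: dict[str, tuple[Callable[..., Any], dict[str, type]]] = {
    "search": (search, {"query": str, "filters": dict}),
    "get_document": (get_document, {"doc_id": str}),
}


class ToolCallRejected(Exception):
    """Raised when a requested call does not match a registered tool."""


def dispatch(name: str, args: dict) -> Any:
    """Execute a tool call only if the name and arguments match the registry."""
    if name not in TOOLS:
        raise ToolCallRejected(f"unknown tool: {name}")
    fn, schema = TOOLS[name]
    if set(args) != set(schema):
        raise ToolCallRejected(f"{name}: expected parameters {sorted(schema)}")
    for param, expected in schema.items():
        if not isinstance(args[param], expected):
            raise ToolCallRejected(f"{name}: {param} must be {expected.__name__}")
    return fn(**args)
```

Because the model can only name a tool and supply arguments, and the dispatcher owns execution, a hallucinated or out-of-scope tool name is rejected rather than run.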

Iterative Retrieval and Self-Critique

After each retrieval cycle, the agent evaluates its current evidence against the original question using a self-critique prompt: "Given the question and the evidence retrieved so far, what information is still missing? What additional searches would improve the answer?" If the self-critique identifies gaps, the agent formulates and executes additional retrieval calls. The loop continues until one of three stopping conditions is met: (1) the agent's self-critique determines the evidence is sufficient, (2) the maximum iteration limit is reached, or (3) additional retrieval is producing diminishing returns (identical or near-identical results to previous retrieval steps).
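
The three stopping conditions can be sketched as a bounded loop. This is illustrative only: `retrieve` and `critique` are hypothetical callables standing in for the search tool and the self-critique LLM call, and "diminishing returns" is approximated here by exact-duplicate detection, whereas a real implementation would likely use a similarity threshold to catch near-identical results.

```python
from typing import Callable


def run_loop(
    question: str,
    retrieve: Callable[[str], list[str]],
    critique: Callable[[str, list[str]], tuple[bool, str]],
    max_iterations: int = 5,
) -> tuple[list[str], str]:
    """Iterate retrieval until one of the three stopping conditions is met.

    `critique` returns (sufficient, next_query). The second element of the
    result names the condition that ended the loop.
    """
    evidence: list[str] = []
    query = question
    for _ in range(max_iterations):
        results = retrieve(query)
        new = [r for r in results if r not in evidence]
        if not new:
            # Condition 3: this retrieval added nothing we had not seen.
            return evidence, "diminishing_returns"
        evidence.extend(new)
        sufficient, next_query = critique(question, evidence)
        if sufficient:
            # Condition 1: self-critique judges the evidence sufficient.
            return evidence, "sufficient"
        query = next_query
    # Condition 2: hard iteration cap reached.
    return evidence, "iteration_limit"
```

Returning the stop reason alongside the evidence lets the synthesis step flag answers produced under the iteration cap as potentially incomplete.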

Grounded Final Synthesis

When the retrieval loop concludes, the agent synthesises a final answer from the accumulated evidence. The synthesis step is more complex than in standard RAG: the agent must integrate evidence from multiple retrieval rounds, handle potentially conflicting evidence, attribute each claim to its source, and structure the answer appropriately (summary, list, table, or prose depending on the question type). The synthesis prompt explicitly instructs the agent to cite each claim and to acknowledge when evidence is incomplete or conflicting.

Human Oversight Gate

For high-stakes agentic tasks (those whose answers will directly inform consequential decisions), a human oversight gate is inserted before final response delivery. The gate presents the agent's reasoning trace, the evidence sources used, and the draft answer to a human reviewer, who can approve, edit, or reject the answer. The gate is optional for low-stakes use, but for use cases classified as high-risk under the EU AI Act it is the mechanism this pattern uses to meet the Article 14 human oversight requirement.


5. Architecture Diagram

ARCHITECTURE DIAGRAM
flowchart TD
    subgraph Ingress["Query Ingress"]
        A[User Query]
        B[Query Planner]
    end
    subgraph AgentLoop["ReAct Agent Loop"]
        C{Reason + Act}
        D[Retrieval Tools]
        E[Self-Critique]
    end
    subgraph Backend["RAG Infrastructure"]
        F[Vector + BM25 Index]
        G[Document Store]
    end
    subgraph Output["Synthesis + Delivery"]
        H[Grounded Synthesis]
        I[Audit Log]
    end
    A --> B --> C
    C -->|select tool| D
    D --> F
    D --> G
    D -->|observations| E
    E -->|insufficient| C
    E -->|sufficient| H
    H --> A
    H --> I
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#f3e8ff,stroke:#a855f7
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#f3e8ff,stroke:#a855f7
    style F fill:#fef9c3,stroke:#eab308
    style G fill:#fef9c3,stroke:#eab308
    style H fill:#d1fae5,stroke:#10b981
    style I fill:#fef9c3,stroke:#eab308

6. Components

| Component | Type | Responsibility | Technology Options | Criticality |
|---|---|---|---|---|
| Query Planner | NLP / LLM | Decompose complex query into sub-queries with dependencies | LLM function calling; LangChain Plan-and-Execute; LlamaIndex ReAct | High |
| ReAct Agent Orchestrator | Orchestration | Drive the Reason-Act-Observe loop; manage scratchpad; enforce iteration limits | LangChain AgentExecutor; LlamaIndex ReActAgent; AutoGen; custom | Critical |
| Retrieval Tool: search | Retrieval | Execute RAG retrieval (delegates to EAAPL-RAG001/005) | LangChain retriever tool; custom tool wrapper | Critical |
| Retrieval Tool: get_document | Retrieval | Fetch full document for deep reading | Document store SDK as tool | High |
| Retrieval Tool: summarise | LLM | Summarise retrieved chunks into concise evidence | LLM call within tool | Medium |
| Retrieval Tool: compare | LLM | Compare specific aspects across multiple documents | LLM call within tool | Medium |
| Self-Critique Module | LLM | Evaluate current evidence sufficiency; identify gaps | Structured LLM prompt with stopping criteria | High |
| Scratchpad / Working Memory | Storage | Record agent reasoning, tool calls, and observations per session | In-memory dict (short sessions); Redis (long sessions) | High |
| Human Oversight Gate | Workflow | Present agent reasoning to human reviewer for high-stakes decisions | Custom UI; Slack workflow; ServiceNow integration | High (regulated use) |
| Iteration Limiter | Safety | Enforce maximum loop iterations; prevent infinite loops | Hard counter in orchestrator; token budget guard | Critical |
| Agentic Audit Logger | Compliance | Record full reasoning trace, all tool calls, and all sources used | Langfuse, Arize AI, custom structured logger | Critical |

7. Data Flow

Primary Flow

| Step | Actor | Action | Output |
|---|---|---|---|
| 1 | User | Submit complex multi-hop query | Query string |
| 2 | Query Planner | Decompose query into sub-queries; record in scratchpad | Sub-query list + dependency graph |
| 3 | Agent (Reason) | Select next action based on scratchpad state | Tool call specification |
| 4 | Tool Executor | Execute selected tool (search / get_document / summarise) | Tool output (chunks / document / summary) |
| 5 | Agent (Observe) | Record tool output in scratchpad | Updated scratchpad |
| 6 | Self-Critique | Evaluate: "Is the evidence sufficient? What is still missing?" | Continue signal OR Stop signal |
| 7 | Loop (if Continue) | Return to step 3 with updated scratchpad context | New tool call based on gaps identified |
| 8 | Synthesis (if Stop) | Integrate all scratchpad evidence into final answer with citations | Draft answer + full citation list |
| 9 | Human Oversight Gate (if required) | Present reasoning trace and draft to human reviewer | Approved / Edited / Rejected decision |
| 10 | Audit Logger | Record complete reasoning trace, tool call sequence, all source IDs | Immutable audit record |
| 11 | Response Delivery | Return final answer with citations and (optionally) reasoning trace | Final response |

Error Flow

| Error Condition | Detection | Recovery |
|---|---|---|
| Maximum iteration limit reached without sufficient evidence | Iteration counter | Return best-available answer with "Incomplete evidence" flag; log for quality review |
| Tool call returns no results (empty retrieval) | Tool output validation | Agent reasons about the empty result; may reformulate query or acknowledge knowledge gap |
| Agent enters circular retrieval loop (same query repeated) | Query deduplication in scratchpad | Detect repeated tool calls; break loop; proceed to synthesis with available evidence |
| LLM tool-calling error (malformed function call) | JSON schema validation | Retry with re-prompted clarification; max 3 retries; escalate to error state |
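
The recovery path for a malformed function call (validate, re-prompt, give up after three retries) might look like the sketch below. It assumes the model's tool call arrives as a JSON string with `tool` and `args` keys; that wire format, the `generate` callable, and all names here are hypothetical. "Max 3 retries" is read as one initial attempt plus up to three retries.

```python
import json
from typing import Callable

MAX_RETRIES = 3  # retries after the initial attempt


class ToolCallError(Exception):
    """Escalation: the model never produced a valid tool call."""


def parse_tool_call(raw: str) -> dict:
    """Parse and minimally validate a tool call; raise ValueError if malformed."""
    call = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if (
        not isinstance(call, dict)
        or not isinstance(call.get("tool"), str)
        or not isinstance(call.get("args"), dict)
    ):
        raise ValueError("expected an object with 'tool' (string) and 'args' (object)")
    return call


def call_with_retries(generate: Callable[[str], str], prompt: str) -> dict:
    """Ask the model for a tool call, re-prompting with the error on failure."""
    last_error = ""
    for _ in range(1 + MAX_RETRIES):
        hint = f"\nPrevious output was invalid: {last_error}" if last_error else ""
        raw = generate(prompt + hint)
        try:
            return parse_tool_call(raw)
        except ValueError as exc:
            last_error = str(exc)
    raise ToolCallError(f"no valid tool call after {MAX_RETRIES} retries: {last_error}")
```

Feeding the validation error back into the re-prompt gives the model a concrete correction target instead of repeating the same request unchanged.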

8. Security Considerations

Tool Boundary Enforcement

The most critical security control in Agentic RAG is ensuring the agent cannot invoke tools outside its defined tool set. All retrieval tools are read-only — the agent must have no access to write, update, or delete operations. Tool definitions must be validated against a schema; any tool invocation not matching a defined schema must be rejected. The agent runtime must run in a sandboxed environment with no outbound network access beyond the defined tool API endpoints.

Prompt Injection in Multi-Hop Context

Multi-hop retrieval creates a compounded prompt injection risk: an adversarial document retrieved in hop 1 could inject instructions into the agent's scratchpad that influence hop 2 onwards. The agent system prompt must explicitly instruct the model that retrieved content is data, not instructions, and the orchestrator must sanitise tool outputs before inserting them into the agent context window.
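
A minimal sketch of the sanitisation step follows. It wraps each tool output in an explicit untrusted-data envelope, neutralises angle brackets so retrieved text cannot close the envelope early, and flags a few instruction-like phrases for review. The `<tool_output>` envelope, the pattern list, and the function name are assumptions made for illustration; pattern matching of this kind is a weak signal and not a complete defence against prompt injection.

```python
import re

# Illustrative patterns only; a real deployment would need a broader, tested set.
SUSPICIOUS = re.compile(
    r"ignore (all )?(previous|prior) instructions|system prompt|you are now",
    re.IGNORECASE,
)


def wrap_tool_output(tool: str, output: str) -> tuple[str, bool]:
    """Return (wrapped_text, flagged) for insertion into the agent context.

    `tool` is assumed to come from the orchestrator's own registry, never
    from retrieved content or model output.
    """
    flagged = bool(SUSPICIOUS.search(output))
    # Escape angle brackets so the content cannot forge the closing delimiter.
    safe = output.replace("<", "&lt;").replace(">", "&gt;")
    wrapped = f'<tool_output tool="{tool}" trust="untrusted">\n{safe}\n</tool_output>'
    return wrapped, flagged
```

The system prompt would then state that anything inside the envelope is data to be quoted or summarised, and flagged outputs can be logged for the tool call audit.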

OWASP LLM Top 10 Mitigations

| OWASP LLM Risk | Agentic-Specific Concern | Mitigation |
|---|---|---|
| LLM01: Prompt Injection | Adversarial content retrieved in hop N influences agent reasoning for hop N+1 | Tool output sanitisation; treat all tool outputs as untrusted data in system prompt |
| LLM07: Insecure Plugin Design | Agentic tools are equivalent to plugins; must be strictly typed and scoped | Read-only tools only; strict JSON schema for tool calls; no shell or code execution tools |
| LLM08: Excessive Agency | Agent autonomously executes many retrieval steps; scope creep risk | Iteration limit; read-only tools; human oversight gate for consequential outputs |
| LLM04: Model Denial of Service | Runaway agent loops with expensive LLM calls | Hard iteration limit; token budget guard; per-user session cost limit |

9. Governance Considerations

Autonomous Decision Boundary

Agentic RAG must have a clearly defined boundary between autonomous retrieval (permitted) and autonomous action-taking (not permitted in this pattern). The agent may search, read, summarise, and synthesise — but the final answer must be delivered to a human, not used to trigger downstream automated actions without explicit human approval.

Reasoning Trace as Governance Artefact

Every agentic session must produce a complete, immutable reasoning trace: the initial query, each reasoning step, each tool call with its parameters, each tool output, and the final synthesis. This trace is the primary governance artefact for post-hoc review of agentic decisions. Regulators reviewing an AI-assisted compliance analysis must be able to trace every claim back to the retrieved document that supported it.
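
One way to make a reasoning trace tamper-evident is to chain entries by hash, so that editing any earlier step invalidates every later one. The sketch below is an assumption about how that could be done, with a hypothetical `ReasoningTrace` class. It detects modification but does not prevent it; true immutability would additionally need write-once storage.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    """Stable SHA-256 over a JSON-serialisable entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class ReasoningTrace:
    """Append-only, hash-chained record of one agentic session (hypothetical)."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.entries: list[dict] = []

    def append(self, kind: str, payload: dict) -> None:
        """Add a step (e.g. 'query', 'reasoning', 'tool_call', 'tool_output', 'synthesis')."""
        body = {
            "session": self.session_id,
            "seq": len(self.entries),
            "kind": kind,
            "payload": payload,
            "prev": self.entries[-1]["hash"] if self.entries else "",
        }
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute the chain; False if any entry was altered or reordered."""
        prev = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

A reviewer can then confirm that the trace presented for audit is the one produced at run time by calling `verify()` before reading it.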

Governance Artefacts

| Artefact | Owner | Frequency | Purpose |
|---|---|---|---|
| Agentic Session Trace | AI Operations | Per session | Full audit trail of all reasoning steps and tool calls |
| Tool Call Audit | Security | Weekly | Review tool call patterns for anomalies or scope creep |
| Human Override Rate Report | AI Governance | Monthly | Track rate at which human reviewers edit or reject agentic outputs |
| Iteration Distribution Report | AI Operations | Weekly | Monitor average and P99 iteration counts; identify expensive query types |

10. Operational Considerations

Monitoring

| Metric | Alert Threshold | Notes |
|---|---|---|
| Average iterations per session | > 8 (investigate) | May indicate query types too complex for available corpus |
| Max iteration cap hit rate | > 10% of sessions | Corpus coverage or agent capability issue |
| Agentic session P95 latency | > 30 seconds | Optimise tool call latency; increase parallelism |
| Tool call error rate | > 2% | Tool API health issue |
| Human override rate | > 30% | Agent quality degradation; review self-critique prompts |
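
The alert thresholds above can be evaluated over a batch of session records as sketched below. The record fields (`iterations`, `latency_s`, `tool_calls`, `tool_errors`, `overridden`) and the function names are hypothetical, and P95 is computed with the simple nearest-rank method; a monitoring platform would normally supply these aggregates.

```python
import math
from statistics import mean

# Alert thresholds taken from the monitoring table.
THRESHOLDS = {
    "avg_iterations": 8,
    "cap_hit_rate": 0.10,
    "p95_latency_s": 30,
    "tool_error_rate": 0.02,
    "override_rate": 0.30,
}


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return ordered[rank - 1]


def breached_alerts(sessions: list[dict], max_iterations: int) -> list[str]:
    """Return the names of metrics that exceed their alert threshold."""
    if not sessions:
        return []
    n = len(sessions)
    total_calls = sum(s["tool_calls"] for s in sessions)
    metrics = {
        "avg_iterations": mean(s["iterations"] for s in sessions),
        "cap_hit_rate": sum(s["iterations"] >= max_iterations for s in sessions) / n,
        "p95_latency_s": p95([s["latency_s"] for s in sessions]),
        "tool_error_rate": sum(s["tool_errors"] for s in sessions) / total_calls if total_calls else 0.0,
        "override_rate": sum(bool(s["overridden"]) for s in sessions) / n,
    }
    return [name for name, limit in THRESHOLDS.items() if metrics[name] > limit]
```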

Service Level Objectives

| SLO | Target | Notes |
|---|---|---|
| Agentic session completion | P95 ≤ 20 seconds | Depends on corpus and query complexity |
| Iteration cap hit rate | < 5% | Measure of query/corpus fit |
| Reasoning trace completeness | 100% | Every session must have a complete audit trace |

11. Cost Considerations

Cost Drivers

| Cost Driver | Notes | Optimisation |
|---|---|---|
| LLM inference per iteration | Each reasoning step is an LLM call; N iterations means at least N+1 LLM calls per session (N reasoning steps plus final synthesis), and more if self-critique or LLM-backed tools run as separate calls | Use smaller model for self-critique; use premium model only for final synthesis |
| Tool call overhead (embedding per search) | Each search tool call re-embeds a potentially modified sub-query | Cache embeddings for sub-queries that are identical to previous iterations |
| Context window growth | Scratchpad grows with each iteration; per-call input cost increases roughly linearly with iteration count, so cumulative input cost over a session grows roughly quadratically | Summarise and compress scratchpad after every 3 iterations |
| Human oversight gate | Human reviewer time for high-stakes queries | Reserve HITL for designated high-stakes query types only |
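
The scratchpad compression optimisation can be sketched as follows. Every third iteration, older entries are replaced by a single summary while the most recent entries are kept verbatim. The `summarise` callable stands in for an LLM summarisation call, and `keep_recent` is an assumed parameter that the pattern does not specify.

```python
from typing import Callable


def compress_scratchpad(
    entries: list[str],
    iteration: int,
    summarise: Callable[[list[str]], str],
    every: int = 3,
    keep_recent: int = 2,
) -> list[str]:
    """Replace older entries with one summary on every `every`-th iteration."""
    if iteration % every != 0 or len(entries) <= keep_recent:
        return entries
    older, recent = entries[:-keep_recent], entries[-keep_recent:]
    return [summarise(older)] + recent
```

Compression only affects what is sent back to the model. The uncompressed entries, including source IDs, would still need to be written to the audit trace before they are summarised, so that citations remain traceable.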

Indicative Cost Range

| Use Case | Sessions/Day | Average Iterations | Cost per Session | Monthly Cost |
|---|---|---|---|---|
| Research assistant (analysts) | 100 | 4 | $0.50–$2.00 | $1,500–$6,000 |
| Compliance Q&A | 500 | 3 | $0.30–$1.00 | $4,500–$15,000 |
| Complex legal research | 50 | 6 | $1.00–$4.00 | $1,500–$6,000 |
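
The monthly figures above are consistent with sessions per day multiplied by cost per session over a 30-day month. The 30-day basis is inferred from the arithmetic rather than stated in the table, and the helper name is hypothetical.

```python
def monthly_cost(sessions_per_day: int, cost_low: float, cost_high: float, days: int = 30) -> tuple[float, float]:
    """Return the (low, high) monthly cost range in dollars."""
    return (
        round(sessions_per_day * cost_low * days, 2),
        round(sessions_per_day * cost_high * days, 2),
    )


research = monthly_cost(100, 0.50, 2.00)    # (1500.0, 6000.0)
compliance = monthly_cost(500, 0.30, 1.00)  # (4500.0, 15000.0)
legal = monthly_cost(50, 1.00, 4.00)        # (1500.0, 6000.0)
```

The same helper can be used to re-estimate the range when session volume or per-session cost changes, for example after routing self-critique to a smaller model.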

12. Trade-Off Analysis

Agentic Orchestration Framework Comparison

| Framework | Flexibility | Observability | Production Readiness | Recommendation |
|---|---|---|---|---|
| LangChain AgentExecutor + ReAct | High | Good (LangSmith integration) | Proven | Strong choice; large ecosystem |
| LlamaIndex ReActAgent | High | Good (built-in trace logging) | Proven | Strong for document-heavy use cases |
| AutoGen (Microsoft) | Very High (multi-agent) | Moderate | Emerging | Complex multi-agent scenarios |
| Custom orchestrator | Maximum | Custom | Depends | When framework limitations are binding |

Query Planning Strategy

| Strategy | Predictability | Flexibility | Auditability |
|---|---|---|---|
| Explicit plan-then-execute | High | Low (plan may not adapt to unexpected retrieval results) | Highest |
| Implicit sequential tool calling | Medium | High | Medium |
| Hybrid (explicit plan, adaptive execution) | High | High | Highest |

Architectural Tensions

| Tension | Trade-off | Recommendation |
|---|---|---|
| Iteration depth vs. latency | Deep iteration: complete answer; shallow: fast | Configure max iterations per query type; user-selectable "quick" vs. "thorough" modes |
| Self-critique verbosity vs. cost | Verbose critique: better gap identification; concise: cheaper | Structured JSON self-critique with fixed fields; not free-form prose |

13. Failure Modes

| Failure Mode | Likelihood | Impact | Detection | Recovery |
|---|---|---|---|---|
| Circular retrieval loop (agent retrieves same content repeatedly) | Medium | High (cost + latency) | Query deduplication in scratchpad; loop detection | Detect repeated tool calls; break loop; proceed to synthesis |
| Hallucinated tool calls (agent invents tool that doesn't exist) | Low | High | Tool call schema validation | Reject invalid tool calls; re-prompt with valid tool list |
| Scratchpad context overflow (exceeds LLM context window) | Medium | High | Token count monitoring | Compress scratchpad after N iterations; summarise older evidence |
| Agent misattributes evidence (cites wrong source) | Medium | High | Citation validation post-synthesis | Cross-reference every cited source ID against tool call outputs in audit trace |
| Self-critique always returns "continue" (runaway optimism) | Low | High | Iteration cap hit rate monitoring | Hard iteration cap; self-critique must use structured stopping criteria |
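
Post-synthesis citation validation can be as simple as checking that every source ID cited in the answer was actually returned by a tool call in the session. The sketch below assumes citations appear as bracketed IDs such as `[DOC-12]`; that format, the regular expression, and the function name are assumptions for illustration. It catches citations of sources that were never retrieved, but not a claim attributed to the wrong retrieved source.

```python
import re

# Assumed citation format: a bracketed source ID, e.g. [DOC-12].
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def unsupported_citations(answer: str, retrieved_ids: set[str]) -> set[str]:
    """Return cited IDs that do not appear among the session's retrieved sources."""
    return set(CITATION.findall(answer)) - retrieved_ids


answer = "Supplier A is based in Region X [DOC-12]. Clause 4 applies [DOC-99]."
missing = unsupported_citations(answer, {"DOC-12", "DOC-7"})  # {"DOC-99"}
```

A non-empty result would block delivery or route the draft to the human oversight gate, depending on how the deployment handles high-stakes answers.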

14. Regulatory Considerations

| Regulation | Requirement | Agentic RAG Response |
|---|---|---|
| EU AI Act Article 14 | Human oversight capability for high-risk AI systems | Human oversight gate for agentic outputs used in consequential decisions |
| EU AI Act Article 13 | Transparency: users must understand how AI-generated outputs were produced | Reasoning trace available on request; session scratchpad as explainability artefact |
| ISO/IEC 42001 Section 8.5 | AI system operation includes monitoring autonomous behaviours | Iteration monitoring; tool call audit; human override rate tracking |
| NIST AI RMF Govern 1.7 | Document and manage AI system autonomy levels | Autonomy boundary documented: retrieval is autonomous; final answer requires human review for high-stakes use cases |

15. Reference Implementations

AWS

  • Agent: Bedrock Agents (native multi-hop retrieval) or LangChain on Lambda
  • Retrieval tool: Amazon Kendra or OpenSearch k-NN
  • Scratchpad: Amazon ElastiCache (Redis) for session state
  • Audit: CloudWatch Logs with structured JSON; X-Ray for trace

Azure

  • Agent: Azure AI Studio Prompt Flow with agent orchestration, or LangChain on Azure Functions
  • Retrieval tool: Azure AI Search
  • Scratchpad: Azure Cache for Redis
  • Audit: Azure Monitor + Application Insights

GCP

  • Agent: Vertex AI Agent Builder (Grounding with Search) or LlamaIndex on Cloud Run
  • Retrieval tool: Vertex AI Vector Search
  • Scratchpad: Cloud Memorystore (Redis)
  • Audit: Cloud Trace + Cloud Logging

16. Related Patterns

| Pattern ID | Pattern Name | Relationship |
|---|---|---|
| EAAPL-RAG001 | Enterprise RAG | Agentic RAG wraps and repeatedly invokes the RAG001 retrieval layer |
| EAAPL-RAG005 | Hybrid RAG | Recommended retrieval strategy for each agent search tool call |
| EAAPL-RAG009 | Graph RAG | Agent may invoke graph traversal as an additional tool alongside vector search |
| EAAPL-RAG003 | Secure RAG | ACL enforcement applies to every tool call within the agentic loop |

17. Maturity Assessment

Overall Maturity: Proven — Agentic RAG is deployed in production for research and compliance use cases; ReAct and Plan-and-Execute frameworks are mature; the primary ongoing challenges are iteration cost management and reasoning trace audit quality.

| Dimension | Score (1–5) | Rationale |
|---|---|---|
| Technology Readiness | 4 | LLM tool calling is GA and reliable; orchestration frameworks are production-grade |
| Tooling Ecosystem | 4 | LangChain, LlamaIndex, Bedrock Agents, Azure AI Studio support agentic patterns |
| Operational Guidance | 3 | Loop management and cost optimisation require tuning expertise |
| Security & Compliance | 3 | Prompt injection in multi-hop contexts and tool boundary enforcement require careful implementation |
| Scalability Evidence | 3 | Session-based; horizontal scaling straightforward; cost per session grows with complexity |
| Cost Predictability | 2 | Iteration count variability makes cost highly query-dependent; monitoring and alerting essential |

18. Revision History

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2024-07-01 | EAAPL Working Group | Initial publication |
| 1.1 | 2024-10-15 | EAAPL Working Group | Self-critique module formalised; circular loop detection added |
| 1.2 | 2025-04-01 | EAAPL Working Group | Human oversight gate added; EU AI Act Article 14 mapping; scratchpad compression strategy |