EAAPLEnterprise AI Architecture Pattern Library
EAAPLLibraryMulti-Agent Systems
Proven
⇄ Compare

EAAPL-MAG001 — Multi-Agent Orchestration

EAAPL-MAG001 — Multi-Agent Orchestration

Status: Proven Tags: agent orchestration high-availability high-complexity Version: 2.0.0 Last Updated: 2026-06-12


1. Pattern Identity

Field Value
Pattern ID EAAPL-MAG001
Name Multi-Agent Orchestration
Category Multi-Agent
Maturity Proven
Complexity High
Related Patterns EAAPL-MAG002 · EAAPL-MAG003 · EAAPL-MAG006 · EAAPL-INT007

2. Executive Summary

Multi-Agent Orchestration coordinates a cohort of specialised AI agents to complete tasks that exceed the capability, context window, or cost envelope of a single agent. The orchestrator pattern imposes a central coordinator that decomposes work, assigns subtasks to specialist agents, collects and validates results, and synthesises a final output. Parallel fan-out and pipeline variants extend this foundation for latency-sensitive and sequential workflows respectively. Enterprises adopt this pattern when a task requires heterogeneous expertise — legal reasoning, code generation, financial analysis, and summarisation simultaneously — or when a single large-context call breaches cost or safety governance thresholds. The pattern introduces distributed-systems complexity: partial failure, inter-agent communication, deadlock prevention, and aggregate observability must all be engineered deliberately.


3. Problem Statement

3.1 Context

Enterprise AI deployments increasingly encounter tasks that are too complex, too long, or too multi-domain for a single LLM invocation. A contract review task may require legal clause identification, risk scoring, negotiation strategy, and executive summary — each benefiting from a specialised prompt context, model selection, and toolset. A single agent handling all four roles produces mediocre results across all of them.

3.2 Forces in Tension

  • Specialisation vs. coordination overhead. Specialist agents produce higher-quality subtask outputs, but coordination introduces latency, token cost, and failure-mode complexity.
  • Parallelism vs. dependency management. Fan-out reduces wall-clock time but requires aggregation logic and partial-failure handling.
  • Autonomy vs. observability. Agents operating independently are scalable but opaque; centralised orchestration is observable but creates a single point of failure.
  • Cost vs. quality. More agents and rounds of synthesis improve quality but multiply token spend.

3.3 Failure Modes Without This Pattern

Without deliberate orchestration design, teams default to a single monolithic prompt that degrades in quality as task complexity grows, or ad-hoc chaining where agent outputs are passed forward without validation, allowing hallucinations to compound across the pipeline.


4. Solution

4.1 Orchestrator Pattern (Central Controller)

ARCHITECTURE DIAGRAM
flowchart TD subgraph Dispatch["Orchestration"] A[Task Request] B[Orchestrator Agent] C[Specialist Agents] end subgraph Synthesis["Result Handling"] D[Result Collector] E{Validate Results} end subgraph Outcome["Outcome"] F[Synthesised Output] G[Error / Retry] end A --> B --> C --> D --> E E -->|pass| F E -->|fail| G style A fill:#dbeafe,stroke:#3b82f6 style B fill:#f0fdf4,stroke:#22c55e style C fill:#f0fdf4,stroke:#22c55e style D fill:#f0fdf4,stroke:#22c55e style E fill:#f3e8ff,stroke:#a855f7 style F fill:#d1fae5,stroke:#10b981 style G fill:#fee2e2,stroke:#ef4444

4.2 Pipeline Pattern (Sequential)

ARCHITECTURE DIAGRAM
flowchart TD subgraph Stages["Sequential Pipeline"] A[Raw Input] B[Stage 1 Agent] C[Stage 2 Agent] D[Stage 3 Agent] end subgraph Control["Validation Gates"] E{Stage 1 Valid} F{Stage 2 Valid} end A --> B --> E E -->|yes| C --> F E -->|no| G[Stage 1 Error] F -->|yes| D --> H[Final Output] F -->|no| I[Stage 2 Error] style A fill:#dbeafe,stroke:#3b82f6 style B fill:#f0fdf4,stroke:#22c55e style C fill:#f0fdf4,stroke:#22c55e style D fill:#f0fdf4,stroke:#22c55e style E fill:#f3e8ff,stroke:#a855f7 style F fill:#f3e8ff,stroke:#a855f7 style G fill:#fee2e2,stroke:#ef4444 style H fill:#d1fae5,stroke:#10b981 style I fill:#fee2e2,stroke:#ef4444

4.3 Parallel Fan-Out Pattern

ARCHITECTURE DIAGRAM
flowchart TD subgraph FanOut["Fan-Out"] A[Task] B[Fan-Out Router] C[Agent Alpha] D[Agent Beta] E[Agent Gamma] end subgraph Merge["Aggregation"] F[Aggregator] G{Results Valid} end A --> B B --> C --> F B --> D --> F B --> E --> F F --> G G -->|all pass| H[Merged Output] G -->|none| I[Fan-Out Failure] style A fill:#dbeafe,stroke:#3b82f6 style B fill:#f0fdf4,stroke:#22c55e style C fill:#f0fdf4,stroke:#22c55e style D fill:#f0fdf4,stroke:#22c55e style E fill:#f0fdf4,stroke:#22c55e style F fill:#f0fdf4,stroke:#22c55e style G fill:#f3e8ff,stroke:#a855f7 style H fill:#d1fae5,stroke:#10b981 style I fill:#fee2e2,stroke:#ef4444

5. Structure

5.1 Component Catalogue

Component Responsibility Technology Options
Orchestrator Agent Task decomposition, agent selection, result synthesis LLM with planning prompt, LangGraph, AutoGen
Specialist Agents Domain-specific subtask execution LLM with focused system prompt + tools
Task Queue Durable subtask dispatch and ordering Redis Streams, AWS SQS, Kafka
Result Store Collect and hold subtask outputs pending aggregation Redis, DynamoDB, Postgres
Validator Check each subtask output against schema and quality rules JSON schema + LLM-as-judge
Aggregator Merge validated subtask outputs into final response LLM synthesis prompt
Trace Collector Record agent spans, costs, latencies, tool calls OpenTelemetry + Jaeger/Tempo

5.2 Inter-Agent Communication Schema

{
  "taskId": "uuid-v4",
  "parentTaskId": "uuid-v4-or-null",
  "fromAgent": "orchestrator",
  "toAgent": "legal-analysis-agent",
  "timestamp": "ISO-8601",
  "subtaskDescription": "Extract all limitation-of-liability clauses",
  "inputPayload": { "contractText": "..." },
  "context": {
    "originalTaskGoal": "...",
    "completedSubtasks": ["clause-extraction"],
    "constraints": ["maxTokens:4000", "returnJSON:true"]
  },
  "costBudgetRemaining": 0.15,
  "callbackEndpoint": "https://orchestrator/callback/subtask",
  "timeoutMs": 30000
}

6. Behaviour

6.1 Task Decomposition Strategies

LLM-Based Planning. The orchestrator issues a meta-prompt instructing the model to return a structured JSON plan of subtasks with dependency declarations. Handles novel task types but introduces planning latency and is non-deterministic.

Rule-Based Routing. A deterministic decision tree maps task type codes to subtask templates. Zero planning latency. Breaks on unfamiliar task shapes.

Hybrid (Recommended). Rule-based for known task types; LLM planning as fallback for novel tasks. Enforced by a task-type registry with a fallback flag per entry.

6.2 Deadlock Prevention

  • Per-agent timeout. Every subtask message carries a timeoutMs field. The orchestrator cancels and marks as failed any agent that does not respond within this window.
  • Circuit breaker per agent. See EAAPL-INT007. If an agent's error rate or latency exceeds threshold, the orchestrator routes around it.
  • DAG validation at plan time. Before dispatching, the orchestrator validates the subtask dependency graph is acyclic. Any cycle is rejected with CYCLE_DETECTED.
  • Global task timeout. A wall-clock deadline enforces the customer-visible SLA regardless of individual subtask states.

6.3 Partial Failure Handling

Scenario Behaviour
One of N parallel agents fails Continue with remaining. Aggregator marks missing domain. Return partial result with explicit warning.
Critical-path pipeline agent fails Halt pipeline. Return structured error with completed stages preserved for retry.
Subtask validation fails Retry agent with correction prompt. On second failure, escalate to human queue or return partial.
Orchestrator crashes mid-run Task queue message remains un-acked. New instance resumes. Idempotency keys prevent duplicate work.

7. Implementation Guide

7.1 Step-by-Step

Step 1 — Define the Agent Registry. Enumerate every specialist agent: name, capability description, accepted input/output schemas, cost per call estimate, average latency, and SLA. This is the orchestrator's routing table.

Step 2 — Implement Decomposition. For LLM-based planning, craft a system prompt that returns structured JSON with subtasks, agent types, and dependsOn arrays. Validate the result is a DAG before proceeding. Enforce a maximum of 10 subtasks per plan.

Step 3 — Implement Dispatch and Collection. Use a durable queue for subtask dispatch. Each message includes taskId, subtaskId, callback URL, and timeout. Agents post results to the callback endpoint. Result store keyed on taskId:subtaskId.

Step 4 — Implement the Aggregator. Once all expected subtask results are received or timed out, invoke the aggregator LLM. Aggregation prompt must flag contradictions between subtask outputs and note missing domains explicitly.

Step 5 — Wire Distributed Tracing. Inject a W3C traceparent header into every inter-agent message. Each agent creates a child span. Use this to reconstruct the full execution DAG in your observability platform.

7.2 Code Skeleton (TypeScript)

interface SubTask {
  id: string;
  agentType: string;
  description: string;
  dependsOn: string[];
}

interface OrchestrationState {
  taskId: string;
  taskDescription: string;
  plan: SubTask[] | null;
  results: Record<string, SubTaskResult>;
  finalOutput: string | null;
  errors: string[];
  costSpent: number;
  costCeiling: number;
}

async function orchestrate(task: string, costCeiling: number): Promise<OrchestrationState> {
  const state: OrchestrationState = {
    taskId: crypto.randomUUID(),
    taskDescription: task,
    plan: null,
    results: {},
    finalOutput: null,
    errors: [],
    costSpent: 0,
    costCeiling
  };

  // Step 1: Decompose
  const plan = await plannerLLM.invoke(task);
  validateDAG(plan.subtasks); // throws CYCLE_DETECTED if invalid
  state.plan = plan.subtasks;

  // Step 2: Execute with dependency ordering
  const readyQueue = getReadySubtasks(state.plan, state.results);
  while (readyQueue.length > 0) {
    if (state.costSpent >= state.costCeiling) {
      state.errors.push("BUDGET_EXCEEDED");
      break;
    }
    await Promise.all(readyQueue.map(st => dispatchSubtask(state, st)));
    readyQueue.push(...getReadySubtasks(state.plan, state.results));
  }

  // Step 3: Aggregate
  const validResults = Object.values(state.results).filter(r => validate(r));
  state.finalOutput = await synthesizerLLM.invoke(validResults);
  return state;
}

8. Observability

8.1 Distributed Tracing

Every orchestration run produces a trace spanning the full agent DAG. Key spans:

  • orchestrator.decompose — planning latency and model used
  • orchestrator.dispatch.<subtaskId> — queue enqueue time
  • agent.<agentType>.<subtaskId> — wall-clock time, input/output token counts, tool calls
  • orchestrator.aggregate — synthesis latency and model used

8.2 Cost Dashboard

Aggregate (promptTokens + completionTokens) × model_cost_per_token across all agent spans for a single taskId. Alert when a single orchestration exceeds the configured cost ceiling. Cost attribution by agentType surfaces which specialists are most expensive.

8.3 Key Metrics

Metric Alert Threshold
Orchestration p95 latency > 30s
Subtask failure rate > 5% over 5m
Cost per task > configured ceiling
Dead-letter queue depth > 0
Agent timeout rate > 2% per agent type
Plan cycle detection rate > 0 (any is an engineering defect)

9. Cost Governance

  • Cost budget per task. costBudgetRemaining field in every subtask message. Agents return BUDGET_EXCEEDED if their estimated cost exceeds the remainder.
  • Model tiering. Route subtasks to cheaper models unless criticality classification requires a frontier model.
  • Context compression. Before passing outputs between agents, run a summarisation step that strips verbose reasoning while preserving factual content.
  • Max subtasks enforcement. Hard maximum of 10 subtasks per plan enforced at the decomposition validation layer.
  • Cost anomaly detection. If orchestration cost for a task type exceeds 2× its 7-day moving average, emit a high-priority alert and pause that task type pending review.

10. Security Considerations

10.1 Prompt Injection via Agent Outputs

A malicious input could embed instructions that, when passed between agents, hijack downstream behaviour. Mitigations:

  • Validate all inter-agent payloads against a strict JSON schema before forwarding.
  • Never interpolate raw agent output directly into another agent's system prompt.
  • Deploy an input/output safety classifier at the orchestrator layer scanning all inter-agent messages for injection patterns.

10.2 Data Leakage Across Agent Boundaries

Ensure taskId and tenantId are propagated and that the result store enforces row-level isolation. An agent must verify the tenantId matches before reading or writing results.

10.3 Credential Minimisation

Each specialist agent holds only the credentials required for its specific tools. Use a vault with per-agent scoped tokens that expire after the task TTL.


11. Failure Modes and Mitigations

Failure Mode Detection Mitigation
Orchestrator crash mid-run Queue message un-acked after visibility timeout Re-queue; idempotency keys prevent duplicate work
Agent hallucination in subtask output Validator rejects output Retry with correction prompt; flag low-confidence in final output
Deadlock — agents waiting on each other Global task timeout exceeded Per-agent timeout + DAG cycle validation at plan time
Cost explosion from runaway planning Cost budget exceeded alert Per-task ceiling; model tiering; max-subtask cap
Plan cycle detected DAG validation at decompose step Return CYCLE_DETECTED, reject plan, log for investigation
Poison message in task queue Consumer fails repeatedly on same message Dead-letter queue after N retries; alert on DLQ depth
Inconsistent partial results Aggregator contradiction detection Flag contradictions in output; escalate high-severity to human review

12. Compliance and Governance

12.1 EU AI Act Relevance

For high-risk AI systems (Annex III), the full agent execution trace is required as evidence of genuine human oversight. Every orchestration run must produce a complete trace of which agents ran, in what order, with what inputs and outputs; a cost and latency record; validation pass/fail status per subtask; and any human escalation events. Records must be retained per your data retention policy (minimum 5 years for regulated use cases) and producible within 72 hours for a regulatory audit.

12.2 Model Risk Management (SR 11-7)

For financial services applications, the orchestration system must document each specialist agent's model, version, system prompt, and validation criteria; run periodic backtests against labelled data to detect model drift; and maintain a rollback mechanism to a prior agent version when quality degradation is detected.


13. Testing Strategy

13.1 Unit Tests

  • Decomposition logic: given a task description, assert the plan is a valid DAG with expected agent types.
  • Validator logic: given valid and invalid subtask outputs, assert correct pass/fail classification.
  • Aggregator: given a set of subtask results including one failure, assert the output correctly notes the missing domain.

13.2 Integration Tests

  • Full orchestration run against stub agents returning pre-canned outputs. Assert final output schema and content.
  • Timeout scenario: one stub agent delays beyond timeoutMs. Assert orchestrator handles gracefully and returns partial result.
  • Cost ceiling scenario: mock agents that report high token usage. Assert orchestrator emits cost alert and halts.

13.3 Chaos / Resilience Tests

  • Kill the orchestrator process mid-run. Assert task is resumed from queue by a new instance with no duplicate work.
  • Kill one of three parallel agents mid-run. Assert final output is partial with explicit warning.
  • Inject a cycle into the plan at decompose time. Assert CYCLE_DETECTED is raised and no agents are dispatched.

13.4 End-to-End Playwright Tests

For every supported task type (contract review, code audit, financial analysis), run a real end-to-end test with live model calls against staging. Assert: output schema valid; all subtask spans present in trace; total cost within ±30% of baseline; no partial failure warnings when all agents are healthy.


14. Variants and Extensions

14.1 Recursive Orchestration

An orchestrator may spawn a child orchestrator for a subtask that is itself too complex. Maximum recursion depth must be bounded (recommended: 3) and enforced at the decomposition layer.

14.2 Dynamic Agent Registration

Specialist agents self-register with the orchestrator's registry at startup, advertising capability description, input/output schema, and cost/latency profile. Enables hot-deployment of new specialists without redeploying the orchestrator.

14.3 Human-in-the-Loop Integration

Insert EAAPL-MAG003 as a specialist agent type for high-stakes subtasks (financial transactions, external communications). The orchestrator's dependency graph suspends at approval checkpoints.

14.4 Stateful Long-Running Orchestration

For tasks spanning hours or days, persist orchestration state in a durable store after each completed subtask. The orchestrator reconstitutes state from the store on restart, enabling interruption and resumption without losing work.


15. Trade-off Analysis

Dimension Orchestrator Pattern Pipeline Pattern Parallel Fan-Out
Latency Moderate High (sequential) Low
Cost Moderate Moderate High (all agents run)
Resilience High Low (failure halts) High
Complexity High Low Moderate
Best for Multi-domain complex tasks Sequential workflows Independent parallel analysis

When NOT to use multi-agent orchestration:

  • The task can be solved by a single well-crafted prompt — always prefer the simpler option.
  • Latency requirements preclude orchestration overhead (sub-1s response time targets).
  • Your team does not have distributed observability infrastructure — multi-agent failure without tracing is a debugging nightmare.

16. Known Implementations

Organisation Type Use Case Topology Reported Outcome
Global law firm Contract due diligence (100+ page agreements) Orchestrator with 5 specialists 70% reduction in associate review time
Investment bank Earnings report analysis + trade signal generation Pipeline (extract → analyse → score → route) P95 latency 45s; 94% analyst agreement rate
Healthcare system Clinical note summarisation across 12 specialties Parallel fan-out with aggregator 8× throughput vs single-agent
E-commerce platform Fraud detection + risk scoring + customer communication Orchestrator with circuit breaker 99.7% uptime across 18-month production run

Pattern ID Name Relationship
EAAPL-MAG002 Supervisor Agent Specialised orchestrator with worker pool management
EAAPL-MAG003 Human-in-the-Loop Agent Inserts human approval into orchestration pipelines
EAAPL-MAG006 Agent Handoff Protocol Defines the message schema used between orchestrated agents
EAAPL-INT007 AI Circuit Breaker Applied per specialist agent in the orchestrator's dispatch layer
EAAPL-MAG004 Agent Swarm Emergent alternative to central orchestration for resilience-first use cases

18. References

  1. Gartner, "Patterns for Agentic AI Architecture," 2025 (ID: G00815432)
  2. Microsoft AutoGen: Multi-Agent Conversation Framework — github.com/microsoft/autogen
  3. LangGraph: Building Stateful Multi-Actor Applications — langchain-ai.github.io/langgraph
  4. OpenAI, "A Practical Guide to Building Agents," 2025 — platform.openai.com/docs/guides/agents
  5. W3C Trace Context Specification — w3.org/TR/trace-context
  6. NIST AI RMF 1.0, Govern 1.5: Human Review Procedures — nist.gov/aiRMF
  7. EU AI Act (Regulation 2024/1689), Article 14: Human Oversight
  8. SR 11-7: Guidance on Model Risk Management — federalreserve.gov/supervisionreg/srletters/sr1107.htm
  9. AWS Well-Architected Framework — Machine Learning Lens, Agent Workloads chapter
  10. Anthropic, "Building Effective Agents," 2025 — anthropic.com/research/building-effective-agents
← Back to LibraryMore Multi-Agent Systems