Proven

EAAPL-MAG001 — Multi-Agent Orchestration

Status: Proven Tags: agent orchestration high-availability high-complexity Version: 2.0.0 Last Updated: 2026-06-12

1. Pattern Identity

Field	Value
Pattern ID	EAAPL-MAG001
Name	Multi-Agent Orchestration
Category	Multi-Agent
Maturity	Proven
Complexity	High
Related Patterns	EAAPL-MAG002 · EAAPL-MAG003 · EAAPL-MAG006 · EAAPL-INT007

2. Executive Summary

Multi-Agent Orchestration coordinates a cohort of specialised AI agents to complete tasks that exceed the capability, context window, or cost envelope of a single agent. The orchestrator pattern imposes a central coordinator that decomposes work, assigns subtasks to specialist agents, collects and validates results, and synthesises a final output. Parallel fan-out and pipeline variants extend this foundation for latency-sensitive and sequential workflows respectively. Enterprises adopt this pattern when a task requires heterogeneous expertise — legal reasoning, code generation, financial analysis, and summarisation simultaneously — or when a single large-context call breaches cost or safety governance thresholds. The pattern introduces distributed-systems complexity: partial failure, inter-agent communication, deadlock prevention, and aggregate observability must all be engineered deliberately.

3. Problem Statement

3.1 Context

Enterprise AI deployments increasingly encounter tasks that are too complex, too long, or too multi-domain for a single LLM invocation. A contract review task may require legal clause identification, risk scoring, negotiation strategy, and executive summary — each benefiting from a specialised prompt context, model selection, and toolset. A single agent handling all four roles produces mediocre results across all of them.

3.2 Forces in Tension

Specialisation vs. coordination overhead. Specialist agents produce higher-quality subtask outputs, but coordination introduces latency, token cost, and failure-mode complexity.
Parallelism vs. dependency management. Fan-out reduces wall-clock time but requires aggregation logic and partial-failure handling.
Autonomy vs. observability. Agents operating independently are scalable but opaque; centralised orchestration is observable but creates a single point of failure.
Cost vs. quality. More agents and rounds of synthesis improve quality but multiply token spend.

3.3 Failure Modes Without This Pattern

Without deliberate orchestration design, teams default to a single monolithic prompt that degrades in quality as task complexity grows, or ad-hoc chaining where agent outputs are passed forward without validation, allowing hallucinations to compound across the pipeline.

4. Solution

4.1 Orchestrator Pattern (Central Controller)

ARCHITECTURE DIAGRAM

flowchart TD subgraph Dispatch["Orchestration"] A[Task Request] B[Orchestrator Agent] C[Specialist Agents] end subgraph Synthesis["Result Handling"] D[Result Collector] E{Validate Results} end subgraph Outcome["Outcome"] F[Synthesised Output] G[Error / Retry] end A --> B --> C --> D --> E E -->|pass| F E -->|fail| G style A fill:#dbeafe,stroke:#3b82f6 style B fill:#f0fdf4,stroke:#22c55e style C fill:#f0fdf4,stroke:#22c55e style D fill:#f0fdf4,stroke:#22c55e style E fill:#f3e8ff,stroke:#a855f7 style F fill:#d1fae5,stroke:#10b981 style G fill:#fee2e2,stroke:#ef4444

4.2 Pipeline Pattern (Sequential)

ARCHITECTURE DIAGRAM

flowchart TD subgraph Stages["Sequential Pipeline"] A[Raw Input] B[Stage 1 Agent] C[Stage 2 Agent] D[Stage 3 Agent] end subgraph Control["Validation Gates"] E{Stage 1 Valid} F{Stage 2 Valid} end A --> B --> E E -->|yes| C --> F E -->|no| G[Stage 1 Error] F -->|yes| D --> H[Final Output] F -->|no| I[Stage 2 Error] style A fill:#dbeafe,stroke:#3b82f6 style B fill:#f0fdf4,stroke:#22c55e style C fill:#f0fdf4,stroke:#22c55e style D fill:#f0fdf4,stroke:#22c55e style E fill:#f3e8ff,stroke:#a855f7 style F fill:#f3e8ff,stroke:#a855f7 style G fill:#fee2e2,stroke:#ef4444 style H fill:#d1fae5,stroke:#10b981 style I fill:#fee2e2,stroke:#ef4444

4.3 Parallel Fan-Out Pattern

ARCHITECTURE DIAGRAM

flowchart TD subgraph FanOut["Fan-Out"] A[Task] B[Fan-Out Router] C[Agent Alpha] D[Agent Beta] E[Agent Gamma] end subgraph Merge["Aggregation"] F[Aggregator] G{Results Valid} end A --> B B --> C --> F B --> D --> F B --> E --> F F --> G G -->|all pass| H[Merged Output] G -->|none| I[Fan-Out Failure] style A fill:#dbeafe,stroke:#3b82f6 style B fill:#f0fdf4,stroke:#22c55e style C fill:#f0fdf4,stroke:#22c55e style D fill:#f0fdf4,stroke:#22c55e style E fill:#f0fdf4,stroke:#22c55e style F fill:#f0fdf4,stroke:#22c55e style G fill:#f3e8ff,stroke:#a855f7 style H fill:#d1fae5,stroke:#10b981 style I fill:#fee2e2,stroke:#ef4444

5. Structure

5.1 Component Catalogue

Component	Responsibility	Technology Options
Orchestrator Agent	Task decomposition, agent selection, result synthesis	LLM with planning prompt, LangGraph, AutoGen
Specialist Agents	Domain-specific subtask execution	LLM with focused system prompt + tools
Task Queue	Durable subtask dispatch and ordering	Redis Streams, AWS SQS, Kafka
Result Store	Collect and hold subtask outputs pending aggregation	Redis, DynamoDB, Postgres
Validator	Check each subtask output against schema and quality rules	JSON schema + LLM-as-judge
Aggregator	Merge validated subtask outputs into final response	LLM synthesis prompt
Trace Collector	Record agent spans, costs, latencies, tool calls	OpenTelemetry + Jaeger/Tempo

5.2 Inter-Agent Communication Schema

{
  "taskId": "uuid-v4",
  "parentTaskId": "uuid-v4-or-null",
  "fromAgent": "orchestrator",
  "toAgent": "legal-analysis-agent",
  "timestamp": "ISO-8601",
  "subtaskDescription": "Extract all limitation-of-liability clauses",
  "inputPayload": { "contractText": "..." },
  "context": {
    "originalTaskGoal": "...",
    "completedSubtasks": ["clause-extraction"],
    "constraints": ["maxTokens:4000", "returnJSON:true"]
  },
  "costBudgetRemaining": 0.15,
  "callbackEndpoint": "https://orchestrator/callback/subtask",
  "timeoutMs": 30000
}

6. Behaviour

6.1 Task Decomposition Strategies

LLM-Based Planning. The orchestrator issues a meta-prompt instructing the model to return a structured JSON plan of subtasks with dependency declarations. Handles novel task types but introduces planning latency and is non-deterministic.

Rule-Based Routing. A deterministic decision tree maps task type codes to subtask templates. Zero planning latency. Breaks on unfamiliar task shapes.

Hybrid (Recommended). Rule-based for known task types; LLM planning as fallback for novel tasks. Enforced by a task-type registry with a fallback flag per entry.

6.2 Deadlock Prevention

Per-agent timeout. Every subtask message carries a timeoutMs field. The orchestrator cancels and marks as failed any agent that does not respond within this window.
Circuit breaker per agent. See EAAPL-INT007. If an agent's error rate or latency exceeds threshold, the orchestrator routes around it.
DAG validation at plan time. Before dispatching, the orchestrator validates the subtask dependency graph is acyclic. Any cycle is rejected with CYCLE_DETECTED.
Global task timeout. A wall-clock deadline enforces the customer-visible SLA regardless of individual subtask states.

6.3 Partial Failure Handling

Scenario	Behaviour
One of N parallel agents fails	Continue with remaining. Aggregator marks missing domain. Return partial result with explicit warning.
Critical-path pipeline agent fails	Halt pipeline. Return structured error with completed stages preserved for retry.
Subtask validation fails	Retry agent with correction prompt. On second failure, escalate to human queue or return partial.
Orchestrator crashes mid-run	Task queue message remains un-acked. New instance resumes. Idempotency keys prevent duplicate work.

7. Implementation Guide

7.1 Step-by-Step

Step 1 — Define the Agent Registry. Enumerate every specialist agent: name, capability description, accepted input/output schemas, cost per call estimate, average latency, and SLA. This is the orchestrator's routing table.

Step 2 — Implement Decomposition. For LLM-based planning, craft a system prompt that returns structured JSON with subtasks, agent types, and dependsOn arrays. Validate the result is a DAG before proceeding. Enforce a maximum of 10 subtasks per plan.

Step 3 — Implement Dispatch and Collection. Use a durable queue for subtask dispatch. Each message includes taskId, subtaskId, callback URL, and timeout. Agents post results to the callback endpoint. Result store keyed on taskId:subtaskId.

Step 4 — Implement the Aggregator. Once all expected subtask results are received or timed out, invoke the aggregator LLM. Aggregation prompt must flag contradictions between subtask outputs and note missing domains explicitly.

Step 5 — Wire Distributed Tracing. Inject a W3C traceparent header into every inter-agent message. Each agent creates a child span. Use this to reconstruct the full execution DAG in your observability platform.

7.2 Code Skeleton (TypeScript)

interface SubTask {
  id: string;
  agentType: string;
  description: string;
  dependsOn: string[];
}

interface OrchestrationState {
  taskId: string;
  taskDescription: string;
  plan: SubTask[] | null;
  results: Record<string, SubTaskResult>;
  finalOutput: string | null;
  errors: string[];
  costSpent: number;
  costCeiling: number;
}

async function orchestrate(task: string, costCeiling: number): Promise<OrchestrationState> {
  const state: OrchestrationState = {
    taskId: crypto.randomUUID(),
    taskDescription: task,
    plan: null,
    results: {},
    finalOutput: null,
    errors: [],
    costSpent: 0,
    costCeiling
  };

  // Step 1: Decompose
  const plan = await plannerLLM.invoke(task);
  validateDAG(plan.subtasks); // throws CYCLE_DETECTED if invalid
  state.plan = plan.subtasks;

  // Step 2: Execute with dependency ordering
  const readyQueue = getReadySubtasks(state.plan, state.results);
  while (readyQueue.length > 0) {
    if (state.costSpent >= state.costCeiling) {
      state.errors.push("BUDGET_EXCEEDED");
      break;
    }
    await Promise.all(readyQueue.map(st => dispatchSubtask(state, st)));
    readyQueue.push(...getReadySubtasks(state.plan, state.results));
  }

  // Step 3: Aggregate
  const validResults = Object.values(state.results).filter(r => validate(r));
  state.finalOutput = await synthesizerLLM.invoke(validResults);
  return state;
}

8. Observability

8.1 Distributed Tracing

Every orchestration run produces a trace spanning the full agent DAG. Key spans:

orchestrator.decompose — planning latency and model used
orchestrator.dispatch.<subtaskId> — queue enqueue time
agent.<agentType>.<subtaskId> — wall-clock time, input/output token counts, tool calls
orchestrator.aggregate — synthesis latency and model used

8.2 Cost Dashboard

Aggregate (promptTokens + completionTokens) × model_cost_per_token across all agent spans for a single taskId. Alert when a single orchestration exceeds the configured cost ceiling. Cost attribution by agentType surfaces which specialists are most expensive.

8.3 Key Metrics

Metric	Alert Threshold
Orchestration p95 latency	> 30s
Subtask failure rate	> 5% over 5m
Cost per task	> configured ceiling
Dead-letter queue depth	> 0
Agent timeout rate	> 2% per agent type
Plan cycle detection rate	> 0 (any is an engineering defect)

9. Cost Governance

Cost budget per task. costBudgetRemaining field in every subtask message. Agents return BUDGET_EXCEEDED if their estimated cost exceeds the remainder.
Model tiering. Route subtasks to cheaper models unless criticality classification requires a frontier model.
Context compression. Before passing outputs between agents, run a summarisation step that strips verbose reasoning while preserving factual content.
Max subtasks enforcement. Hard maximum of 10 subtasks per plan enforced at the decomposition validation layer.
Cost anomaly detection. If orchestration cost for a task type exceeds 2× its 7-day moving average, emit a high-priority alert and pause that task type pending review.

10. Security Considerations

10.1 Prompt Injection via Agent Outputs

A malicious input could embed instructions that, when passed between agents, hijack downstream behaviour. Mitigations:

Validate all inter-agent payloads against a strict JSON schema before forwarding.
Never interpolate raw agent output directly into another agent's system prompt.
Deploy an input/output safety classifier at the orchestrator layer scanning all inter-agent messages for injection patterns.

10.2 Data Leakage Across Agent Boundaries

Ensure taskId and tenantId are propagated and that the result store enforces row-level isolation. An agent must verify the tenantId matches before reading or writing results.

10.3 Credential Minimisation

Each specialist agent holds only the credentials required for its specific tools. Use a vault with per-agent scoped tokens that expire after the task TTL.

11. Failure Modes and Mitigations

Failure Mode	Detection	Mitigation
Orchestrator crash mid-run	Queue message un-acked after visibility timeout	Re-queue; idempotency keys prevent duplicate work
Agent hallucination in subtask output	Validator rejects output	Retry with correction prompt; flag low-confidence in final output
Deadlock — agents waiting on each other	Global task timeout exceeded	Per-agent timeout + DAG cycle validation at plan time
Cost explosion from runaway planning	Cost budget exceeded alert	Per-task ceiling; model tiering; max-subtask cap
Plan cycle detected	DAG validation at decompose step	Return `CYCLE_DETECTED`, reject plan, log for investigation
Poison message in task queue	Consumer fails repeatedly on same message	Dead-letter queue after N retries; alert on DLQ depth
Inconsistent partial results	Aggregator contradiction detection	Flag contradictions in output; escalate high-severity to human review

12. Compliance and Governance

12.1 EU AI Act Relevance

For high-risk AI systems (Annex III), the full agent execution trace is required as evidence of genuine human oversight. Every orchestration run must produce a complete trace of which agents ran, in what order, with what inputs and outputs; a cost and latency record; validation pass/fail status per subtask; and any human escalation events. Records must be retained per your data retention policy (minimum 5 years for regulated use cases) and producible within 72 hours for a regulatory audit.

12.2 Model Risk Management (SR 11-7)

For financial services applications, the orchestration system must document each specialist agent's model, version, system prompt, and validation criteria; run periodic backtests against labelled data to detect model drift; and maintain a rollback mechanism to a prior agent version when quality degradation is detected.

13. Testing Strategy

13.1 Unit Tests

Decomposition logic: given a task description, assert the plan is a valid DAG with expected agent types.
Validator logic: given valid and invalid subtask outputs, assert correct pass/fail classification.
Aggregator: given a set of subtask results including one failure, assert the output correctly notes the missing domain.

13.2 Integration Tests

Full orchestration run against stub agents returning pre-canned outputs. Assert final output schema and content.
Timeout scenario: one stub agent delays beyond timeoutMs. Assert orchestrator handles gracefully and returns partial result.
Cost ceiling scenario: mock agents that report high token usage. Assert orchestrator emits cost alert and halts.

13.3 Chaos / Resilience Tests

Kill the orchestrator process mid-run. Assert task is resumed from queue by a new instance with no duplicate work.
Kill one of three parallel agents mid-run. Assert final output is partial with explicit warning.
Inject a cycle into the plan at decompose time. Assert CYCLE_DETECTED is raised and no agents are dispatched.

13.4 End-to-End Playwright Tests

For every supported task type (contract review, code audit, financial analysis), run a real end-to-end test with live model calls against staging. Assert: output schema valid; all subtask spans present in trace; total cost within ±30% of baseline; no partial failure warnings when all agents are healthy.

14. Variants and Extensions

14.1 Recursive Orchestration

An orchestrator may spawn a child orchestrator for a subtask that is itself too complex. Maximum recursion depth must be bounded (recommended: 3) and enforced at the decomposition layer.

14.2 Dynamic Agent Registration

Specialist agents self-register with the orchestrator's registry at startup, advertising capability description, input/output schema, and cost/latency profile. Enables hot-deployment of new specialists without redeploying the orchestrator.

14.3 Human-in-the-Loop Integration

Insert EAAPL-MAG003 as a specialist agent type for high-stakes subtasks (financial transactions, external communications). The orchestrator's dependency graph suspends at approval checkpoints.

14.4 Stateful Long-Running Orchestration

For tasks spanning hours or days, persist orchestration state in a durable store after each completed subtask. The orchestrator reconstitutes state from the store on restart, enabling interruption and resumption without losing work.

15. Trade-off Analysis

Dimension	Orchestrator Pattern	Pipeline Pattern	Parallel Fan-Out
Latency	Moderate	High (sequential)	Low
Cost	Moderate	Moderate	High (all agents run)
Resilience	High	Low (failure halts)	High
Complexity	High	Low	Moderate
Best for	Multi-domain complex tasks	Sequential workflows	Independent parallel analysis

When NOT to use multi-agent orchestration:

The task can be solved by a single well-crafted prompt — always prefer the simpler option.
Latency requirements preclude orchestration overhead (sub-1s response time targets).
Your team does not have distributed observability infrastructure — multi-agent failure without tracing is a debugging nightmare.

16. Known Implementations

Organisation Type	Use Case	Topology	Reported Outcome
Global law firm	Contract due diligence (100+ page agreements)	Orchestrator with 5 specialists	70% reduction in associate review time
Investment bank	Earnings report analysis + trade signal generation	Pipeline (extract → analyse → score → route)	P95 latency 45s; 94% analyst agreement rate
Healthcare system	Clinical note summarisation across 12 specialties	Parallel fan-out with aggregator	8× throughput vs single-agent
E-commerce platform	Fraud detection + risk scoring + customer communication	Orchestrator with circuit breaker	99.7% uptime across 18-month production run

Pattern ID	Name	Relationship
EAAPL-MAG002	Supervisor Agent	Specialised orchestrator with worker pool management
EAAPL-MAG003	Human-in-the-Loop Agent	Inserts human approval into orchestration pipelines
EAAPL-MAG006	Agent Handoff Protocol	Defines the message schema used between orchestrated agents
EAAPL-INT007	AI Circuit Breaker	Applied per specialist agent in the orchestrator's dispatch layer
EAAPL-MAG004	Agent Swarm	Emergent alternative to central orchestration for resilience-first use cases

18. References

Gartner, "Patterns for Agentic AI Architecture," 2025 (ID: G00815432)
Microsoft AutoGen: Multi-Agent Conversation Framework — github.com/microsoft/autogen
LangGraph: Building Stateful Multi-Actor Applications — langchain-ai.github.io/langgraph
OpenAI, "A Practical Guide to Building Agents," 2025 — platform.openai.com/docs/guides/agents
W3C Trace Context Specification — w3.org/TR/trace-context
NIST AI RMF 1.0, Govern 1.5: Human Review Procedures — nist.gov/aiRMF
EU AI Act (Regulation 2024/1689), Article 14: Human Oversight
SR 11-7: Guidance on Model Risk Management — federalreserve.gov/supervisionreg/srletters/sr1107.htm
AWS Well-Architected Framework — Machine Learning Lens, Agent Workloads chapter
Anthropic, "Building Effective Agents," 2025 — anthropic.com/research/building-effective-agents

← Back to Library More Multi-Agent Systems →

EAAPL-MAG001 — Multi-Agent Orchestration

EAAPL-MAG001 — Multi-Agent Orchestration

1. Pattern Identity

2. Executive Summary

3. Problem Statement

3.1 Context

3.2 Forces in Tension

3.3 Failure Modes Without This Pattern

4. Solution

4.1 Orchestrator Pattern (Central Controller)

4.2 Pipeline Pattern (Sequential)

4.3 Parallel Fan-Out Pattern

5. Structure

5.1 Component Catalogue

5.2 Inter-Agent Communication Schema

6. Behaviour

6.1 Task Decomposition Strategies

6.2 Deadlock Prevention

6.3 Partial Failure Handling

7. Implementation Guide

7.1 Step-by-Step

7.2 Code Skeleton (TypeScript)

8. Observability

8.1 Distributed Tracing

8.2 Cost Dashboard

8.3 Key Metrics

9. Cost Governance

10. Security Considerations

10.1 Prompt Injection via Agent Outputs

10.2 Data Leakage Across Agent Boundaries

10.3 Credential Minimisation

11. Failure Modes and Mitigations

12. Compliance and Governance

12.1 EU AI Act Relevance

12.2 Model Risk Management (SR 11-7)

13. Testing Strategy

13.1 Unit Tests

13.2 Integration Tests

13.3 Chaos / Resilience Tests

13.4 End-to-End Playwright Tests

14. Variants and Extensions

14.1 Recursive Orchestration

14.2 Dynamic Agent Registration

14.3 Human-in-the-Loop Integration

14.4 Stateful Long-Running Orchestration

15. Trade-off Analysis

16. Known Implementations

17. Related Patterns

18. References