EAAPL-MAG001 — Multi-Agent Orchestration
Status: Proven
Tags: agent orchestration high-availability high-complexity
Version: 2.0.0
Last Updated: 2026-06-12
1. Pattern Identity
| Field | Value |
|---|---|
| Pattern ID | EAAPL-MAG001 |
| Name | Multi-Agent Orchestration |
| Category | Multi-Agent |
| Maturity | Proven |
| Complexity | High |
| Related Patterns | EAAPL-MAG002 · EAAPL-MAG003 · EAAPL-MAG006 · EAAPL-INT007 |
2. Executive Summary
Multi-Agent Orchestration coordinates a cohort of specialised AI agents to complete tasks that exceed the capability, context window, or cost envelope of a single agent. The orchestrator pattern imposes a central coordinator that decomposes work, assigns subtasks to specialist agents, collects and validates results, and synthesises a final output. Parallel fan-out and pipeline variants extend this foundation for latency-sensitive and sequential workflows respectively. Enterprises adopt this pattern when a task requires heterogeneous expertise — legal reasoning, code generation, financial analysis, and summarisation simultaneously — or when a single large-context call breaches cost or safety governance thresholds. The pattern introduces distributed-systems complexity: partial failure, inter-agent communication, deadlock prevention, and aggregate observability must all be engineered deliberately.
3. Problem Statement
3.1 Context
Enterprise AI deployments increasingly encounter tasks that are too complex, too long, or too multi-domain for a single LLM invocation. A contract review task may require legal clause identification, risk scoring, negotiation strategy, and executive summary — each benefiting from a specialised prompt context, model selection, and toolset. A single agent handling all four roles produces mediocre results across all of them.
3.2 Forces in Tension
- Specialisation vs. coordination overhead. Specialist agents produce higher-quality subtask outputs, but coordination introduces latency, token cost, and failure-mode complexity.
- Parallelism vs. dependency management. Fan-out reduces wall-clock time but requires aggregation logic and partial-failure handling.
- Autonomy vs. observability. Agents operating independently are scalable but opaque; centralised orchestration is observable but creates a single point of failure.
- Cost vs. quality. More agents and rounds of synthesis improve quality but multiply token spend.
3.3 Failure Modes Without This Pattern
Without deliberate orchestration design, teams default to a single monolithic prompt that degrades in quality as task complexity grows, or ad-hoc chaining where agent outputs are passed forward without validation, allowing hallucinations to compound across the pipeline.
4. Solution
4.1 Orchestrator Pattern (Central Controller)
4.2 Pipeline Pattern (Sequential)
4.3 Parallel Fan-Out Pattern
5. Structure
5.1 Component Catalogue
| Component | Responsibility | Technology Options |
|---|---|---|
| Orchestrator Agent | Task decomposition, agent selection, result synthesis | LLM with planning prompt, LangGraph, AutoGen |
| Specialist Agents | Domain-specific subtask execution | LLM with focused system prompt + tools |
| Task Queue | Durable subtask dispatch and ordering | Redis Streams, AWS SQS, Kafka |
| Result Store | Collect and hold subtask outputs pending aggregation | Redis, DynamoDB, Postgres |
| Validator | Check each subtask output against schema and quality rules | JSON schema + LLM-as-judge |
| Aggregator | Merge validated subtask outputs into final response | LLM synthesis prompt |
| Trace Collector | Record agent spans, costs, latencies, tool calls | OpenTelemetry + Jaeger/Tempo |
5.2 Inter-Agent Communication Schema
{
"taskId": "uuid-v4",
"parentTaskId": "uuid-v4-or-null",
"fromAgent": "orchestrator",
"toAgent": "legal-analysis-agent",
"timestamp": "ISO-8601",
"subtaskDescription": "Extract all limitation-of-liability clauses",
"inputPayload": { "contractText": "..." },
"context": {
"originalTaskGoal": "...",
"completedSubtasks": ["clause-extraction"],
"constraints": ["maxTokens:4000", "returnJSON:true"]
},
"costBudgetRemaining": 0.15,
"callbackEndpoint": "https://orchestrator/callback/subtask",
"timeoutMs": 30000
}
6. Behaviour
6.1 Task Decomposition Strategies
LLM-Based Planning. The orchestrator issues a meta-prompt instructing the model to return a structured JSON plan of subtasks with dependency declarations. Handles novel task types but introduces planning latency and is non-deterministic.
Rule-Based Routing. A deterministic decision tree maps task type codes to subtask templates. Zero planning latency. Breaks on unfamiliar task shapes.
Hybrid (Recommended). Rule-based for known task types; LLM planning as fallback for novel tasks. Enforced by a task-type registry with a fallback flag per entry.
6.2 Deadlock Prevention
- Per-agent timeout. Every subtask message carries a
timeoutMsfield. The orchestrator cancels and marks as failed any agent that does not respond within this window. - Circuit breaker per agent. See EAAPL-INT007. If an agent's error rate or latency exceeds threshold, the orchestrator routes around it.
- DAG validation at plan time. Before dispatching, the orchestrator validates the subtask dependency graph is acyclic. Any cycle is rejected with
CYCLE_DETECTED. - Global task timeout. A wall-clock deadline enforces the customer-visible SLA regardless of individual subtask states.
6.3 Partial Failure Handling
| Scenario | Behaviour |
|---|---|
| One of N parallel agents fails | Continue with remaining. Aggregator marks missing domain. Return partial result with explicit warning. |
| Critical-path pipeline agent fails | Halt pipeline. Return structured error with completed stages preserved for retry. |
| Subtask validation fails | Retry agent with correction prompt. On second failure, escalate to human queue or return partial. |
| Orchestrator crashes mid-run | Task queue message remains un-acked. New instance resumes. Idempotency keys prevent duplicate work. |
7. Implementation Guide
7.1 Step-by-Step
Step 1 — Define the Agent Registry. Enumerate every specialist agent: name, capability description, accepted input/output schemas, cost per call estimate, average latency, and SLA. This is the orchestrator's routing table.
Step 2 — Implement Decomposition. For LLM-based planning, craft a system prompt that returns structured JSON with subtasks, agent types, and dependsOn arrays. Validate the result is a DAG before proceeding. Enforce a maximum of 10 subtasks per plan.
Step 3 — Implement Dispatch and Collection. Use a durable queue for subtask dispatch. Each message includes taskId, subtaskId, callback URL, and timeout. Agents post results to the callback endpoint. Result store keyed on taskId:subtaskId.
Step 4 — Implement the Aggregator. Once all expected subtask results are received or timed out, invoke the aggregator LLM. Aggregation prompt must flag contradictions between subtask outputs and note missing domains explicitly.
Step 5 — Wire Distributed Tracing. Inject a W3C traceparent header into every inter-agent message. Each agent creates a child span. Use this to reconstruct the full execution DAG in your observability platform.
7.2 Code Skeleton (TypeScript)
interface SubTask {
id: string;
agentType: string;
description: string;
dependsOn: string[];
}
interface OrchestrationState {
taskId: string;
taskDescription: string;
plan: SubTask[] | null;
results: Record<string, SubTaskResult>;
finalOutput: string | null;
errors: string[];
costSpent: number;
costCeiling: number;
}
async function orchestrate(task: string, costCeiling: number): Promise<OrchestrationState> {
const state: OrchestrationState = {
taskId: crypto.randomUUID(),
taskDescription: task,
plan: null,
results: {},
finalOutput: null,
errors: [],
costSpent: 0,
costCeiling
};
// Step 1: Decompose
const plan = await plannerLLM.invoke(task);
validateDAG(plan.subtasks); // throws CYCLE_DETECTED if invalid
state.plan = plan.subtasks;
// Step 2: Execute with dependency ordering
const readyQueue = getReadySubtasks(state.plan, state.results);
while (readyQueue.length > 0) {
if (state.costSpent >= state.costCeiling) {
state.errors.push("BUDGET_EXCEEDED");
break;
}
await Promise.all(readyQueue.map(st => dispatchSubtask(state, st)));
readyQueue.push(...getReadySubtasks(state.plan, state.results));
}
// Step 3: Aggregate
const validResults = Object.values(state.results).filter(r => validate(r));
state.finalOutput = await synthesizerLLM.invoke(validResults);
return state;
}
8. Observability
8.1 Distributed Tracing
Every orchestration run produces a trace spanning the full agent DAG. Key spans:
orchestrator.decompose— planning latency and model usedorchestrator.dispatch.<subtaskId>— queue enqueue timeagent.<agentType>.<subtaskId>— wall-clock time, input/output token counts, tool callsorchestrator.aggregate— synthesis latency and model used
8.2 Cost Dashboard
Aggregate (promptTokens + completionTokens) × model_cost_per_token across all agent spans for a single taskId. Alert when a single orchestration exceeds the configured cost ceiling. Cost attribution by agentType surfaces which specialists are most expensive.
8.3 Key Metrics
| Metric | Alert Threshold |
|---|---|
| Orchestration p95 latency | > 30s |
| Subtask failure rate | > 5% over 5m |
| Cost per task | > configured ceiling |
| Dead-letter queue depth | > 0 |
| Agent timeout rate | > 2% per agent type |
| Plan cycle detection rate | > 0 (any is an engineering defect) |
9. Cost Governance
- Cost budget per task.
costBudgetRemainingfield in every subtask message. Agents returnBUDGET_EXCEEDEDif their estimated cost exceeds the remainder. - Model tiering. Route subtasks to cheaper models unless criticality classification requires a frontier model.
- Context compression. Before passing outputs between agents, run a summarisation step that strips verbose reasoning while preserving factual content.
- Max subtasks enforcement. Hard maximum of 10 subtasks per plan enforced at the decomposition validation layer.
- Cost anomaly detection. If orchestration cost for a task type exceeds 2× its 7-day moving average, emit a high-priority alert and pause that task type pending review.
10. Security Considerations
10.1 Prompt Injection via Agent Outputs
A malicious input could embed instructions that, when passed between agents, hijack downstream behaviour. Mitigations:
- Validate all inter-agent payloads against a strict JSON schema before forwarding.
- Never interpolate raw agent output directly into another agent's system prompt.
- Deploy an input/output safety classifier at the orchestrator layer scanning all inter-agent messages for injection patterns.
10.2 Data Leakage Across Agent Boundaries
Ensure taskId and tenantId are propagated and that the result store enforces row-level isolation. An agent must verify the tenantId matches before reading or writing results.
10.3 Credential Minimisation
Each specialist agent holds only the credentials required for its specific tools. Use a vault with per-agent scoped tokens that expire after the task TTL.
11. Failure Modes and Mitigations
| Failure Mode | Detection | Mitigation |
|---|---|---|
| Orchestrator crash mid-run | Queue message un-acked after visibility timeout | Re-queue; idempotency keys prevent duplicate work |
| Agent hallucination in subtask output | Validator rejects output | Retry with correction prompt; flag low-confidence in final output |
| Deadlock — agents waiting on each other | Global task timeout exceeded | Per-agent timeout + DAG cycle validation at plan time |
| Cost explosion from runaway planning | Cost budget exceeded alert | Per-task ceiling; model tiering; max-subtask cap |
| Plan cycle detected | DAG validation at decompose step | Return CYCLE_DETECTED, reject plan, log for investigation |
| Poison message in task queue | Consumer fails repeatedly on same message | Dead-letter queue after N retries; alert on DLQ depth |
| Inconsistent partial results | Aggregator contradiction detection | Flag contradictions in output; escalate high-severity to human review |
12. Compliance and Governance
12.1 EU AI Act Relevance
For high-risk AI systems (Annex III), the full agent execution trace is required as evidence of genuine human oversight. Every orchestration run must produce a complete trace of which agents ran, in what order, with what inputs and outputs; a cost and latency record; validation pass/fail status per subtask; and any human escalation events. Records must be retained per your data retention policy (minimum 5 years for regulated use cases) and producible within 72 hours for a regulatory audit.
12.2 Model Risk Management (SR 11-7)
For financial services applications, the orchestration system must document each specialist agent's model, version, system prompt, and validation criteria; run periodic backtests against labelled data to detect model drift; and maintain a rollback mechanism to a prior agent version when quality degradation is detected.
13. Testing Strategy
13.1 Unit Tests
- Decomposition logic: given a task description, assert the plan is a valid DAG with expected agent types.
- Validator logic: given valid and invalid subtask outputs, assert correct pass/fail classification.
- Aggregator: given a set of subtask results including one failure, assert the output correctly notes the missing domain.
13.2 Integration Tests
- Full orchestration run against stub agents returning pre-canned outputs. Assert final output schema and content.
- Timeout scenario: one stub agent delays beyond
timeoutMs. Assert orchestrator handles gracefully and returns partial result. - Cost ceiling scenario: mock agents that report high token usage. Assert orchestrator emits cost alert and halts.
13.3 Chaos / Resilience Tests
- Kill the orchestrator process mid-run. Assert task is resumed from queue by a new instance with no duplicate work.
- Kill one of three parallel agents mid-run. Assert final output is partial with explicit warning.
- Inject a cycle into the plan at decompose time. Assert
CYCLE_DETECTEDis raised and no agents are dispatched.
13.4 End-to-End Playwright Tests
For every supported task type (contract review, code audit, financial analysis), run a real end-to-end test with live model calls against staging. Assert: output schema valid; all subtask spans present in trace; total cost within ±30% of baseline; no partial failure warnings when all agents are healthy.
14. Variants and Extensions
14.1 Recursive Orchestration
An orchestrator may spawn a child orchestrator for a subtask that is itself too complex. Maximum recursion depth must be bounded (recommended: 3) and enforced at the decomposition layer.
14.2 Dynamic Agent Registration
Specialist agents self-register with the orchestrator's registry at startup, advertising capability description, input/output schema, and cost/latency profile. Enables hot-deployment of new specialists without redeploying the orchestrator.
14.3 Human-in-the-Loop Integration
Insert EAAPL-MAG003 as a specialist agent type for high-stakes subtasks (financial transactions, external communications). The orchestrator's dependency graph suspends at approval checkpoints.
14.4 Stateful Long-Running Orchestration
For tasks spanning hours or days, persist orchestration state in a durable store after each completed subtask. The orchestrator reconstitutes state from the store on restart, enabling interruption and resumption without losing work.
15. Trade-off Analysis
| Dimension | Orchestrator Pattern | Pipeline Pattern | Parallel Fan-Out |
|---|---|---|---|
| Latency | Moderate | High (sequential) | Low |
| Cost | Moderate | Moderate | High (all agents run) |
| Resilience | High | Low (failure halts) | High |
| Complexity | High | Low | Moderate |
| Best for | Multi-domain complex tasks | Sequential workflows | Independent parallel analysis |
When NOT to use multi-agent orchestration:
- The task can be solved by a single well-crafted prompt — always prefer the simpler option.
- Latency requirements preclude orchestration overhead (sub-1s response time targets).
- Your team does not have distributed observability infrastructure — multi-agent failure without tracing is a debugging nightmare.
16. Known Implementations
| Organisation Type | Use Case | Topology | Reported Outcome |
|---|---|---|---|
| Global law firm | Contract due diligence (100+ page agreements) | Orchestrator with 5 specialists | 70% reduction in associate review time |
| Investment bank | Earnings report analysis + trade signal generation | Pipeline (extract → analyse → score → route) | P95 latency 45s; 94% analyst agreement rate |
| Healthcare system | Clinical note summarisation across 12 specialties | Parallel fan-out with aggregator | 8× throughput vs single-agent |
| E-commerce platform | Fraud detection + risk scoring + customer communication | Orchestrator with circuit breaker | 99.7% uptime across 18-month production run |
17. Related Patterns
| Pattern ID | Name | Relationship |
|---|---|---|
| EAAPL-MAG002 | Supervisor Agent | Specialised orchestrator with worker pool management |
| EAAPL-MAG003 | Human-in-the-Loop Agent | Inserts human approval into orchestration pipelines |
| EAAPL-MAG006 | Agent Handoff Protocol | Defines the message schema used between orchestrated agents |
| EAAPL-INT007 | AI Circuit Breaker | Applied per specialist agent in the orchestrator's dispatch layer |
| EAAPL-MAG004 | Agent Swarm | Emergent alternative to central orchestration for resilience-first use cases |
18. References
- Gartner, "Patterns for Agentic AI Architecture," 2025 (ID: G00815432)
- Microsoft AutoGen: Multi-Agent Conversation Framework — github.com/microsoft/autogen
- LangGraph: Building Stateful Multi-Actor Applications — langchain-ai.github.io/langgraph
- OpenAI, "A Practical Guide to Building Agents," 2025 — platform.openai.com/docs/guides/agents
- W3C Trace Context Specification — w3.org/TR/trace-context
- NIST AI RMF 1.0, Govern 1.5: Human Review Procedures — nist.gov/aiRMF
- EU AI Act (Regulation 2024/1689), Article 14: Human Oversight
- SR 11-7: Guidance on Model Risk Management — federalreserve.gov/supervisionreg/srletters/sr1107.htm
- AWS Well-Architected Framework — Machine Learning Lens, Agent Workloads chapter
- Anthropic, "Building Effective Agents," 2025 — anthropic.com/research/building-effective-agents