EAAPL-INT003 — AI-Powered API Composition
Tags: agent function-calling llm high-complexity
Status: Proven | Version: 1.0 | Domain: Integration
1. Executive Summary
AI-Powered API Composition uses large language models with function-calling capability to dynamically orchestrate sequences of API calls in response to natural language requests. Rather than hard-coding integration workflows, an LLM reads a structured catalogue of available APIs, selects appropriate endpoints, extracts parameters from natural language, constructs valid API calls, interprets responses, and chains multi-step operations to fulfil complex user intents.
This pattern dissolves a long-standing bottleneck in enterprise integration: the requirement for developers to pre-define every possible integration workflow. Natural language input replaces workflow configuration. Enterprises with dozens of internal APIs can expose them as a unified intelligent interface without building bespoke orchestration logic for each combination.
The security and governance challenges are significant. An LLM orchestrating real API calls on behalf of users must operate strictly within the caller's permission boundary, must never escalate privileges through creative API chaining, and must produce auditable records of every API call it initiates. For CIOs and CTOs, the pattern's value proposition is the elimination of integration backlog: business users can compose workflows that previously required weeks of developer time, within the constraints of the enterprise API catalogue and the user's existing access rights.
2. Problem Statement
Business Problem
Enterprise integration backlogs are structural. Business users need to combine data from multiple systems — CRM, ERP, HRIS, analytics — to complete a single task. Each combination requires a developer to build a bespoke integration. The backlog of requested integrations grows faster than delivery capacity. Meanwhile, users work around the gap by manually copying data between systems, introducing errors and latency.
Technical Problem
API catalogues in large enterprises contain hundreds of endpoints across dozens of systems. Building and maintaining static orchestration logic for every valid API composition is neither scalable nor sustainable. Schema changes in any component API can break dependent compositions, requiring coordinated fixes across all workflows that reference that API.
Symptoms
- Business analysts run manual multi-step copy-paste workflows between systems to produce outputs that could be automated.
- Integration backlog time-to-delivery exceeds 6 months for moderate-complexity compositions.
- API usage is heavily skewed to a small number of heavily-used endpoints; the long tail of potentially valuable APIs is never consumed because no one has built the integration.
- Every organisational restructure triggers a large wave of integration rebuild work because workflows are coded to specific system combinations.
Cost of Inaction
- Productivity: A knowledge worker spending 2 hours per day on manual cross-system data tasks represents $25,000–$50,000 per year in recoverable productivity per person at knowledge worker cost rates.
- Quality: Manual data transfer introduces error rates that automated API composition eliminates — typically 1–5% error rate in manual transcription.
- Velocity: Six-month integration delivery cycles mean business opportunities cannot be acted on at the speed of need.
- Talent: Senior developers spend 30–40% of time on integration plumbing that could be automated.
3. Context
When to Apply
- The enterprise has a curated internal API catalogue with machine-readable definitions (OpenAPI 3.x preferred).
- User requests are sufficiently complex and varied that pre-defining every workflow is impractical.
- A natural language interface is an appropriate UX pattern for the target users (knowledge workers, internal staff).
- The enterprise API catalogue exposes read operations and bounded write operations within clear permission scopes.
When NOT to Apply
- Simple, fixed workflows where a traditional integration approach is more predictable and auditable.
- High-frequency, low-latency automation (sub-second SLA) — LLM orchestration adds 1–10 seconds of planning latency.
- Workflows touching systems with no permission model — AI-initiated calls cannot be safely scoped.
- Public-facing user interactions where LLM API call errors could expose internal system structure to external users.
- Organisations without a managed API catalogue — deploying this pattern without API governance creates uncontrolled lateral movement risk.
Prerequisites
- Curated API catalogue with OpenAPI 3.x definitions for all APIs to be exposed to the LLM.
- API gateway enforcing per-user, per-scope permission model.
- Centralised secrets management; per-user OAuth tokens or API keys for downstream API calls.
- Observability platform capable of correlating LLM planning decisions with downstream API calls.
- LLM with reliable function-calling capability (tool use specification).
Industry Applicability
| Industry |
Applicability |
Use Case Examples |
Risk Level |
| Financial Services |
High |
"Show me all overdue invoices for clients in arrears over 90 days and draft a collection letter for the top 10" — composes AR + CRM + document generation APIs |
High — must enforce financial data access controls |
| Professional Services |
High |
"Get all timesheets for project X this month, calculate burn rate, and update the project dashboard" — composes HRIS + project management + BI APIs |
Medium |
| Government |
Medium |
"Find all open permits in postcode 3000 and check for safety inspection overdue" — composes spatial + licensing + inspection APIs |
High — privacy and access control critical |
| Healthcare |
Medium |
"List patients due for follow-up after discharge and prepare recall notices" — composes EMR + scheduling + communication APIs |
Very High — PHI controls mandatory |
| Retail / eCommerce |
Medium |
"Find all high-value customers with recent cart abandonment and prepare personalised offer codes" — composes CRM + cart + promotions APIs |
Low |
| Manufacturing |
Low |
Complex operational workflows are better served by deterministic RPA than probabilistic LLM orchestration |
Low |
4. Architecture Overview
AI-Powered API Composition is a request-time orchestration architecture with six logical stages: intent capture, API catalogue resolution, plan generation, parameter extraction and validation, plan execution, and result synthesis.
API Catalogue as LLM Context. The foundation of the pattern is a well-curated API catalogue presented to the LLM as tool definitions. Each API endpoint is described as a function: function name, description (written for LLM comprehension, not developer reference), parameter names and types, parameter descriptions, and example values. The OpenAPI specification is the source of truth; a catalogue compiler translates OpenAPI definitions into LLM tool format. The catalogue is not static — it is filtered at request time to expose only the APIs for which the calling user has permission. An LLM receiving a 300-tool catalogue performs significantly worse than one receiving a 20-tool filtered view; relevance filtering by intent category improves plan quality by 40–60% in empirical testing.
Intent Analysis and Catalogue Scoping. On receiving a natural language request, the composition engine first classifies the request intent against the API domain taxonomy. This narrows the tool set from the full catalogue to the relevant domain subset before the LLM sees it. Intent classification can be performed by a smaller, faster model to reduce planning latency.
Plan Generation. The LLM receives the filtered tool catalogue, the user's request, the user's identity context (name, role, permission scopes), and the conversation history. It produces a structured execution plan: an ordered list of tool calls with parameter values derived from the request. The plan is generated before any API call is made. The planning step is inspectable — the plan is logged and can be presented to the user for confirmation on high-stakes operations.
Parameter Extraction and Validation. LLM-extracted parameters are validated against the API schema before execution. This validation catches type errors (string where integer expected), range violations, and missing required parameters. Validation failures are returned to the LLM with the specific error, enabling re-planning. The LLM does not have direct access to execute API calls; all execution goes through the validated execution harness.
Dynamic API Chain Execution. The execution harness runs the plan step by step. Each step's output is available to subsequent steps as context. The LLM may be re-invoked between steps in complex chains where the next API selection depends on the previous step's output (agentic loop). Re-invocations are bounded by a maximum step count and a maximum wall-clock time to prevent unbounded execution. Each API call is executed with the calling user's scoped credentials — not system-level credentials.
Security Scoping. This is the highest-risk component of the pattern. LLM-initiated API calls must not exceed the calling user's permission boundary. The execution harness enforces this by binding each API call to a per-user OAuth token or scoped API key obtained at session start. The LLM cannot request credentials, cannot escalate scope, and cannot call APIs not in the filtered catalogue. A shadow-mode capability allows API calls to be simulated without execution — useful for testing and for high-stakes operation confirmation flows.
Result Synthesis. On plan completion, the LLM synthesises the collected API responses into a natural language result (summary, table, or structured output per the user's request). Raw API response data is not surfaced directly to users — the synthesis step transforms it into human-consumable form while the raw structured data is available in the audit log for verification.
5. Architecture Diagram
flowchart TD
subgraph Input["Input Layer"]
A[User Natural Language Request]
B[Intent Classifier]
C[Filtered API Catalogue]
end
subgraph Planning["Planning Layer"]
D[LLM Function Calling]
E[Parameter Validator]
end
subgraph Execution["Execution Layer"]
F[Execution Harness]
G[API Gateway]
H[(Audit Logger)]
end
A --> B
B --> C
C --> D
D -->|execution plan| E
E -->|validation fail| D
E -->|valid plan| F
F --> G
G -->|API response| F
F --> H
F -->|synthesised result| A
style A fill:#dbeafe,stroke:#3b82f6
style B fill:#f0fdf4,stroke:#22c55e
style C fill:#fef9c3,stroke:#eab308
style D fill:#f0fdf4,stroke:#22c55e
style E fill:#f3e8ff,stroke:#a855f7
style F fill:#f0fdf4,stroke:#22c55e
style G fill:#f0fdf4,stroke:#22c55e
style H fill:#fef9c3,stroke:#eab308
6. Components
| Component |
Type |
Responsibility |
Technology Options |
Criticality |
| Composition Engine |
Service |
Orchestrate the end-to-end flow: intent classification, catalogue filtering, LLM invocation, plan execution, synthesis |
Python FastAPI, Node.js, LangChain, LlamaIndex Agents |
Critical |
| Intent Classifier |
Service / Model |
Classify user request intent to narrow API catalogue before LLM invocation |
GPT-4o-mini, Claude Haiku, fine-tuned classifier |
High |
| API Catalogue Compiler |
Service |
Translate OpenAPI 3.x specs to LLM tool definitions; filter by user permission at request time |
Custom Python service, Speakeasy, custom APIM policy |
Critical |
| LLM (Function Calling) |
AI Service |
Generate execution plan from natural language request and filtered tool catalogue |
GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro |
Critical |
| Parameter Validator |
Library |
Validate LLM-extracted parameters against OpenAPI schema before execution |
Pydantic, OpenAPI-core, AJV (JSON Schema) |
Critical |
| Execution Harness |
Service |
Execute API calls with user-scoped credentials; enforce step count and time limits |
Custom service with per-user token injection |
Critical |
| API Gateway |
Infrastructure |
Enforce per-user authentication and authorisation on all API calls |
Kong, Azure APIM, AWS API Gateway, Apigee |
Critical |
| Audit Logger |
Service |
Log every plan, every step, every API call, and every synthesis event |
Structured logging → SIEM, OpenTelemetry traces |
Critical |
| Session Manager |
Service |
Manage per-session state: conversation history, in-progress plans, user credentials for session duration |
Redis, Upstash, DynamoDB sessions |
High |
| Shadow Mode Executor |
Service |
Simulate API call plan without executing — for testing and high-stakes confirmation |
Mock API layer with response fixtures |
Medium |
7. Data Flow
Primary Flow
| Step |
Actor |
Action |
Output |
| 1 |
User |
Submits natural language request via chat or API interface |
Request received by Composition Engine |
| 2 |
Intent Classifier |
Classifies request against API domain taxonomy |
Intent domain list (e.g., ["CRM", "Finance"]) |
| 3 |
API Catalogue |
Filters full catalogue to user-permitted APIs in identified domains |
Filtered tool list (typically 5–25 tools) |
| 4 |
LLM |
Receives request + user context + filtered tools; generates execution plan |
Ordered list of tool calls with extracted parameters |
| 5 |
Parameter Validator |
Validates each tool call's parameters against OpenAPI schema |
Validation pass or specific error per parameter |
| 6 |
Execution Harness |
Executes each step in order using user-scoped credentials via API gateway |
Step result (API response) per tool call |
| 7 |
LLM |
Receives collected step results; synthesises into natural language response |
Synthesised natural language result |
| 8 |
User |
Receives synthesised response |
Task completed |
| 9 |
Audit Logger |
Logs: request, plan, each step (tool, params, response, latency), synthesis, total cost |
Immutable audit record |
Error Flow
| Step |
Error Condition |
Detection |
Recovery |
| 4 |
LLM generates plan calling APIs outside filtered catalogue |
Plan validation — execution harness rejects tools not in approved list |
Return error to LLM for replan with catalogue constraint reinforced |
| 5 |
Parameter validation fails |
Schema validation returns specific error |
Return error to LLM with validation message; LLM re-plans with corrected parameters; max 3 re-plan attempts |
| 6 |
API call returns 4xx error |
HTTP error response from API gateway |
Return error to LLM; LLM may retry, use alternative API, or escalate to user |
| 6 |
API call returns 5xx or times out |
HTTP 5xx or connection timeout |
Retry with backoff (max 2 retries); if persistent, abort plan and report partial completion to user |
| 6 |
Step count limit reached |
Execution harness step counter exceeds limit |
Abort plan; return partial results with clear indication of what was and was not completed |
| 4–6 |
LLM generates plan involving privilege-escalating API sequence |
Execution harness rejects call — user token lacks required scope |
Return permission error to user; log potential privilege escalation attempt in security audit log |
8. Security Considerations
Authentication and Authorisation
- Every API call in the execution plan is made using the calling user's OAuth token or scoped API key — obtained at session start via the API gateway's token exchange.
- The execution harness does not hold system-level credentials for downstream APIs. It cannot execute calls that exceed the user's delegated authority.
- The LLM itself has no credentials and no network access — it is a planning function. All execution goes through the harness.
- The API catalogue exposed to the LLM is filtered to the user's permission set before LLM invocation — the LLM cannot even see APIs the user is not permitted to call.
Secrets Management
- LLM API keys stored in centralised secrets manager; injected into Composition Engine at startup.
- Per-user tokens obtained via OAuth 2.0 authorisation code flow at session start; stored in encrypted session state; revoked on session end.
- Never include API keys or tokens in LLM context — they are injected by the execution harness at call time, not passed to the LLM.
Data Classification
- User request text may contain sensitive data; requests are logged with data classification determined at session start.
- LLM-synthesised results may contain personal data drawn from API responses; output classification inherits the highest classification of any API response in the execution chain.
- Results are not retained beyond the session unless explicitly stored by the user.
Encryption
- All API calls in transit encrypted (TLS 1.3).
- Session state encrypted at rest in session manager.
- Audit log encrypted at rest; access restricted to security and compliance roles.
Auditability
- Every LLM-initiated API call is logged with: user identity, session ID, plan step number, API called, parameters (sanitised), response status, latency, user-scoped token scope used.
- Audit trail enables reconstruction of exactly what APIs were called, with what parameters, on whose behalf, as a result of what user request — critical for incident investigation and regulatory response.
OWASP LLM Top 10 Mitigations
| OWASP LLM Risk |
Relevance |
Mitigation in This Pattern |
| LLM01 — Prompt Injection |
Very High |
User input sanitised before inclusion in LLM context; indirect prompt injection via API response content isolated by structured response parsing, not direct string interpolation into LLM prompt |
| LLM02 — Insecure Output Handling |
High |
API call parameters extracted from LLM output are validated against OpenAPI schema before execution; LLM output not executed as code |
| LLM03 — Training Data Poisoning |
Low |
LLM is a consumed API service; training data is provider responsibility; fine-tuning not used in this pattern |
| LLM04 — Model Denial of Service |
Medium |
Request rate limiting per user; maximum plan step count and wall-clock time limits; maximum token budget per request |
| LLM05 — Supply Chain Vulnerabilities |
Medium |
LLM provider contracts include data handling SLA; SDK versions pinned; SBOM per composition engine release |
| LLM06 — Sensitive Information Disclosure |
Very High |
API responses containing PII not included verbatim in subsequent LLM invocations — summarised or keyed references only; session logs access-controlled |
| LLM07 — Insecure Plugin Design |
Very High |
Tool catalogue filtered to user permissions; execution harness enforces permission boundary; LLM cannot request credentials; tool definitions include explicit scope restrictions |
| LLM08 — Excessive Agency |
Very High |
Step count limits; wall-clock time limits; high-stakes operations require explicit user confirmation before execution; LLM cannot initiate write operations without user instruction in session |
| LLM09 — Overreliance |
High |
Synthesis responses include confidence indicators; users informed when LLM plan involved incomplete data; partial completion is clearly surfaced |
| LLM10 — Model Theft |
Low |
LLM accessed via provider API; no model weights in enterprise custody; provider contract governs usage |
9. Governance Considerations
Responsible AI
- LLM orchestration must not produce discriminatory API parameter selections (e.g., filtering customer cohorts by protected attributes without explicit user instruction). API catalogue definitions include field-level sensitivity annotations to constrain LLM parameter choices.
- Users must be able to review the execution plan before it runs for write-heavy operations — confirmation step is a governance control, not just a UX feature.
- LLM synthesis responses should include provenance: "This result was compiled from data returned by the CRM API and the Finance API as of [timestamp]."
Model Risk Management
- The LLM's plan generation is a model risk subject if its output influences financial, employment, or credit decisions.
- Model version tracked in every audit log record; enables retrospective analysis of plan quality by LLM version.
- Plan success rate, re-plan rate, and user correction rate are ongoing performance metrics that feed the model risk monitoring programme.
Human Approval Gates
- Write operations (create, update, delete across any API) require explicit user confirmation before execution harness proceeds.
- Bulk operations (> configurable threshold of records affected) always require human approval regardless of operation type.
- Multi-system write operations (affecting > 1 backend API) require confirmation.
Policy and Traceability
- AI API composition policy defines: approved API catalogue scope, maximum plan complexity (step count), prohibited API combinations (e.g., cannot compose data export + communication APIs without explicit approval), and data retention for execution logs.
- Full traceability: natural language request → LLM plan → each API call → each API response → synthesised result — all linked by session ID and plan ID.
Governance Artefacts
| Artefact |
Owner |
Update Frequency |
Storage Location |
| API Catalogue (OpenAPI definitions + LLM tool descriptions) |
API Platform Team |
Per API change |
API catalogue repository (Git-backed) |
| AI Composition Policy |
CISO + Chief AI Officer |
Quarterly |
Policy management system |
| Execution Audit Logs |
Platform Engineering |
Continuous |
Immutable audit log store (7-year retention for regulated use cases) |
| Model Risk Assessment for LLM Planner |
Model Risk Team |
Per LLM version upgrade |
MRM register |
| Prohibited API Combination Register |
Security Architecture |
Per risk assessment |
Security architecture repository |
| User Confirmation Threshold Configuration |
Product + Risk |
Per use case |
Feature flag / configuration management |
10. Operational Considerations
Monitoring and SLOs
| SLO |
Target |
Measurement |
Alert Threshold |
| Request completion rate |
> 95% |
Completed requests / total requests |
< 90% over 1 hour |
| Plan generation latency (p99) |
< 5s |
Time from request to first API call |
> 10s |
| End-to-end execution latency (p99) |
< 30s |
Time from request to synthesised result |
> 60s |
| Replan rate (schema validation failures) |
< 5% |
Replan events / total plan generation events |
> 15% sustained |
| API call error rate (from AI-initiated calls) |
< 2% |
API 4xx/5xx / total AI-initiated API calls |
> 5% sustained |
| Privilege escalation attempt rate |
0 |
Security audit log entries for blocked escalation |
Any occurrence — immediate alert |
Logging
- Every session: session ID, user identity, session start/end time, total cost (LLM + API calls where available).
- Every request: request text (subject to data classification masking), intent classification, filtered catalogue size.
- Every plan: plan ID, LLM model version, tools selected, plan generation latency, token usage.
- Every step: step number, tool name, parameters (sanitised), API response status, latency, retry count.
- Every synthesis: synthesis latency, token usage, output length.
Incident Response
- Privilege escalation attempt detected: immediate alert; session suspended; security review of audit log; block user pending investigation if repeated.
- LLM provider outage: Composition Engine returns "AI service temporarily unavailable" — no automated fallback to less capable model (plan quality cliff is a safety concern); users directed to manual process.
- API gateway outage: all API calls fail; plan execution halts with partial completion message; session state preserved for resume on recovery.
Disaster Recovery
| Scenario |
RTO |
RPO |
Recovery Procedure |
| Composition Engine failure |
3 minutes |
0 (stateless service; session state in Redis) |
Kubernetes restart; users re-submit requests |
| Session Manager (Redis) failure |
5 minutes |
Up to 5 minutes of session state |
Redis Sentinel failover; active sessions may need re-authentication |
| LLM provider outage |
N/A (dependency) |
N/A |
Fail-fast; user communication; no automated fallback |
| API Catalogue service failure |
10 minutes |
0 |
Restart; cached catalogue snapshot serves recent requests during brief outage |
Capacity Planning
- LLM token consumption: (average prompt tokens per request) × (1 + replan rate) × (average steps per plan) × (requests per day) = daily token budget.
- Composition Engine compute: scales with concurrent request count × average execution duration; typical 2–5 concurrent sessions per vCPU.
- Session Manager: size for (concurrent sessions) × (average session state size) × (session duration); typically low memory footprint.
11. Cost Considerations
Cost Drivers
| Cost Driver |
Description |
Typical Proportion |
| LLM API (plan generation + synthesis) |
Token-based; complex plans with large tool catalogues can consume 5,000–20,000 tokens per request |
50–70% |
| Downstream API Costs |
Charges from internal APIs or external SaaS APIs called as part of the plan |
10–25% |
| Composition Engine Compute |
Container runtime; relatively low — I/O-bound, not compute-bound |
5–10% |
| Session Manager |
Redis or equivalent; low cost |
1–3% |
| Audit Log Storage |
High write volume; 7-year retention for regulated use cases |
5–10% |
| LLM Re-plan Calls |
Additional token cost from validation failures triggering re-planning |
5–15% |
Scaling Risks
- Token costs are highly sensitive to tool catalogue size — a 200-tool catalogue sent in every request generates 10–20× more input tokens than a 20-tool filtered view. Intent-based filtering is critical for cost control.
- Replan rate amplifies LLM costs non-linearly; poor API definitions increase replan rate; investing in high-quality tool descriptions pays immediate ROI.
- Agentic loops (multi-step re-invocations) can consume unbounded tokens if step count limits are not enforced.
Cost Optimisations
- Intent-based catalogue filtering: reduce tool catalogue size per request from 200 to 20 tools — 10× LLM input token reduction.
- Use a cheap classifier model for intent classification; reserve expensive model for plan generation only.
- Response caching: identical or near-identical requests within a session window serve cached results.
- Shadow mode for testing: prevent test traffic from incurring downstream API costs.
Indicative Cost Range
| Scale |
LLM API Monthly |
Infrastructure Monthly |
Total Monthly |
| Small (1,000 requests/day, 10 steps avg) |
$2,000–$8,000 |
$500–$2,000 |
$2,500–$10,000 |
| Medium (10,000 requests/day, 10 steps avg) |
$15,000–$60,000 |
$3,000–$8,000 |
$18,000–$68,000 |
| Large (100,000 requests/day, 10 steps avg) |
$120,000–$500,000 |
$20,000–$50,000 |
$140,000–$550,000 |
12. Trade-Off Analysis
Architectural Options Comparison
| Option |
Flexibility |
Predictability |
Latency |
Security Risk |
Cost |
Recommended For |
| Option A — LLM-based API Composition (this pattern) |
Very High |
Medium |
5–30s |
High (requires rigorous scoping) |
High |
Complex, varied user intents; large API catalogue |
| Option B — Static Workflow Orchestration |
Low |
Very High |
< 1s |
Low |
Low |
Known, fixed integration patterns |
| Option C — RPA / Macro Recording |
Medium |
High |
Variable |
Medium |
Medium |
UI-level automation of legacy systems |
| Option D — No-Code Integration Platform |
Medium |
High |
Variable |
Low-Medium |
Medium |
Business user self-service with defined action vocabulary |
Architectural Tensions
| Tension |
Trade-Off |
Resolution |
| Plan flexibility vs. Security boundary |
More flexible LLM planning creates larger attack surface for privilege escalation |
Execution harness as hard permission boundary; LLM plan generation is advisory, never authoritative |
| Catalogue size vs. Plan quality vs. Cost |
Large catalogue = better coverage; also = more tokens, more noise, lower plan quality |
Intent-based filtering: give the LLM only the tools relevant to the detected intent |
| Agentic depth vs. Execution predictability |
Deeper agentic loops produce better results for complex requests; also harder to audit and bound |
Maximum step count limit; maximum wall-clock limit; intermediate result checkpointing for user review |
13. Failure Modes
| Failure |
Likelihood |
Impact |
Detection |
Recovery |
| LLM generates plan exceeding user permissions |
Medium |
High — attempted unauthorised data access |
Execution harness rejects call; security audit log entry |
Plan rejected; user receives permission error; repeated attempts trigger security review |
| LLM generates incorrect parameter extraction |
High |
Medium — API call fails or returns wrong data |
Parameter validation catches schema violations; LLM re-plans |
Max 3 re-plan attempts; escalate to user if unresolved |
| Indirect prompt injection via API response content |
Low |
High — LLM behaviour manipulated via malicious data in API response |
Structured response parsing; no free-text API response injected into LLM prompt |
Detection via unexpected tool call patterns in audit log; session termination on detection |
| Step count limit exceeded on complex request |
Medium |
Low — partial completion |
Execution harness step counter |
Return partial results with clear incomplete status to user |
| LLM provider rate limit during multi-step plan |
Medium |
Medium — plan execution interrupted |
HTTP 429 from LLM provider |
Retry with backoff; if plan cannot complete, save checkpoint and resume on retry |
| Stale API catalogue definition |
Medium |
Medium — LLM generates calls with incorrect parameters |
High replan rate as validation fails repeatedly |
API catalogue freshness monitoring; alert when OpenAPI spec changes without catalogue update |
Cascading Failure Scenarios
- Prompt injection in API response + no structured parsing: Malicious data in CRM API response reprograms LLM to call data export API with elevated scope → execution harness blocks the call (permission boundary) but attacker learns API catalogue from error responses. Mitigation: treat all API response content as untrusted; parse structured fields by name, never inject raw response into LLM context.
- High replan rate + no cost circuit breaker: Poorly-written API definition causes 100% replan rate on a class of requests → LLM costs spike 15× → monthly budget exceeded → service suspended. Mitigation: per-session and per-user daily token budget limits with hard stop.
14. Regulatory Considerations
APRA CPS 230 — Operational Risk
- Clause 36: Agentic AI orchestration of enterprise APIs is a novel operational risk; the enterprise must document the control framework (permission boundaries, step limits, audit logging) in the operational risk framework.
- Clause 49: LLM providers whose models orchestrate enterprise API calls are material service providers under CPS 230 third-party risk requirements.
APRA CPS 234 — Information Security
- Clause 15: Permission boundary enforcement by the execution harness, tool catalogue access control, and audit logging are the primary information security controls for this pattern.
- Clause 36 (Incident Notification): Privilege escalation attempts detected by the execution harness must be assessed as potential security incidents under CPS 234 notification obligations.
Australian Privacy Act 1988 (as amended)
- APP 6: LLM orchestration accessing personal data APIs must be within the scope of the use purpose for which data was collected; general-purpose "tell me anything about this customer" compositions may not satisfy APP 6.
- APP 11: Session logs containing personal data drawn from API responses are subject to security and retention obligations; default session log retention should be minimised.
EU AI Act (2024)
- Article 6 (High-Risk AI): If this pattern is used to automate or support decisions in high-risk categories (credit, employment, access to essential services), it is a high-risk AI system requiring conformity assessment.
- Article 14 (Human Oversight): Human approval gates for write operations and bulk operations directly implement the human oversight requirement.
- Article 12 (Record-keeping): Execution audit logs satisfy the logging requirement; retention periods must align with Art. 12(1)(a) for high-risk systems.
ISO 42001 — AI Management System
- Clause 8.4 (AI System Impact Assessment): Impact assessment required before deploying LLM orchestration in any regulated business process domain.
NIST AI RMF (2023)
- GOVERN 6.1: Roles and responsibilities for LLM-initiated API calls must be clearly assigned — who is accountable when an LLM-orchestrated action causes harm?
- MAP 5.1: LLM API composition in regulated domains is a high-risk deployment; risk treatment must include the human oversight and permission boundary controls described in this pattern.
15. Reference Implementations
AWS
- Composition Engine: AWS Lambda (function per request) or ECS Fargate
- LLM: Amazon Bedrock (Claude 3.5 Sonnet with tool use) or OpenAI API via direct call
- API Catalogue: AWS API Gateway with OpenAPI export; custom catalogue compiler Lambda
- Parameter Validation: Pydantic in Lambda runtime
- Execution Harness: Lambda with AWS STS AssumeRole for per-user credential scoping
- Session Manager: Amazon ElastiCache (Redis OSS)
- Audit Logger: CloudWatch Logs + S3 for long-term retention; Athena for query
Azure
- Composition Engine: Azure Functions (Flex Consumption) or Azure Container Apps
- LLM: Azure OpenAI Service (GPT-4o with function calling)
- API Catalogue: Azure API Management with OpenAPI export; custom compiler Function
- Parameter Validation: Pydantic or Azure API Management built-in schema validation
- Execution Harness: Function with Azure Managed Identity + per-user delegated access tokens
- Session Manager: Azure Cache for Redis
- Audit Logger: Application Insights + Azure Monitor + Log Analytics
GCP
- Composition Engine: Cloud Run (request-scoped scaling)
- LLM: Vertex AI (Gemini 1.5 Pro with function calling) or Anthropic Claude via Vertex AI
- API Catalogue: Apigee API Hub with OpenAPI export; custom compiler Cloud Run service
- Parameter Validation: Pydantic in Python runtime
- Execution Harness: Cloud Run with Workload Identity and per-user token exchange
- Session Manager: Memorystore (Redis)
- Audit Logger: Cloud Logging → BigQuery for compliance queries
On-Premises / Private Cloud
- Composition Engine: Python FastAPI on Kubernetes
- LLM: vLLM or Ollama serving Llama 3.1 70B with function calling, or self-hosted mistral-7B
- API Catalogue: Kong API Gateway with OpenAPI export; custom Python catalogue compiler
- Parameter Validation: Pydantic
- Execution Harness: Custom Python service with per-user credential injection from HashiCorp Vault
- Session Manager: Redis on Kubernetes via Bitnami Helm chart
- Audit Logger: Fluentd → Elasticsearch → Kibana; long-term archive to MinIO
| Pattern |
Relationship |
Notes |
| EAAPL-INT001 — Enterprise AI Service Bus |
Complementary |
Composition Engine publishes execution events to the AI Service Bus for enterprise-wide cost and audit visibility |
| EAAPL-INT002 — Legacy System AI Augmentation |
Complementary |
Legacy API adapters expose legacy systems as API catalogue entries for LLM composition |
| EAAPL-INT007 — AI Circuit Breaker |
Enables |
Circuit breaker wraps LLM provider calls and downstream API calls within the execution harness |
| EAAPL-INT008 — Bidirectional AI Sync |
Related |
Composition results may trigger sync events to update enterprise data stores |
17. Maturity Assessment
Overall Maturity: Proven
| Dimension |
Score (1–5) |
Justification |
| Architectural Completeness |
5 |
All stages — intent, catalogue, plan, validation, execution, synthesis — fully specified |
| Operational Readiness |
4 |
SLOs and monitoring defined; LLM provider dependency creates inherent availability limit |
| Security Coverage |
5 |
Permission boundary, prompt injection, privilege escalation, OWASP LLM Top 10 all addressed |
| Governance Coverage |
5 |
Human approval gates, audit trail, model risk, policy all included |
| Cost Predictability |
3 |
Token costs variable; agentic depth variability is inherent; budget controls required |
| Implementation Complexity |
2 |
High complexity — requires mature API catalogue, permission model, and LLM engineering capability |
| Industry Validation |
4 |
Deployed in production at financial services and professional services firms; healthcare implementations emerging |
18. Revision History
| Version |
Date |
Author |
Changes |
| 1.0 |
2026-06-12 |
EAAPL Working Group |
Initial publication — integration patterns series |