EAAPL-INT003 — AI-Powered API Composition

Tags: agent, function-calling, llm, high-complexity
Status: Proven | Version: 1.0 | Domain: Integration

1. Executive Summary

AI-Powered API Composition uses large language models with function-calling capability to dynamically orchestrate sequences of API calls in response to natural language requests. Rather than hard-coding integration workflows, an LLM reads a structured catalogue of available APIs, selects appropriate endpoints, extracts parameters from natural language, constructs valid API calls, interprets responses, and chains multi-step operations to fulfil complex user intents.

This pattern dissolves a long-standing bottleneck in enterprise integration: the requirement for developers to pre-define every possible integration workflow. Natural language input replaces workflow configuration. Enterprises with dozens of internal APIs can expose them as a unified intelligent interface without building bespoke orchestration logic for each combination.

The security and governance challenges are significant. An LLM orchestrating real API calls on behalf of users must operate strictly within the caller's permission boundary, must never escalate privileges through creative API chaining, and must produce auditable records of every API call it initiates. For CIOs and CTOs, the pattern's value proposition is the elimination of integration backlog: business users can compose workflows that previously required weeks of developer time, within the constraints of the enterprise API catalogue and the user's existing access rights.

2. Problem Statement

Business Problem

Enterprise integration backlogs are structural. Business users need to combine data from multiple systems — CRM, ERP, HRIS, analytics — to complete a single task. Each combination requires a developer to build a bespoke integration. The backlog of requested integrations grows faster than delivery capacity. Meanwhile, users work around the gap by manually copying data between systems, introducing errors and latency.

Technical Problem

API catalogues in large enterprises contain hundreds of endpoints across dozens of systems. Building and maintaining static orchestration logic for every valid API composition is neither scalable nor sustainable. Schema changes in any component API can break dependent compositions, requiring coordinated fixes across all workflows that reference that API.

Symptoms

Business analysts run manual multi-step copy-paste workflows between systems to produce outputs that could be automated.
Integration backlog time-to-delivery exceeds 6 months for moderate-complexity compositions.
API usage is skewed towards a small number of endpoints; the long tail of potentially valuable APIs is never consumed because no one has built the integration.
Every organisational restructure triggers a large wave of integration rebuild work because workflows are coded to specific system combinations.

Cost of Inaction

Productivity: A knowledge worker spending 2 hours per day on manual cross-system data tasks represents $25,000–$50,000 per year in recoverable productivity at typical knowledge-worker cost rates.
Quality: Manual data transfer typically carries a 1–5% transcription error rate; automated API composition removes the manual transcription step.
Velocity: Six-month integration delivery cycles mean business opportunities cannot be acted on at the speed of need.
Talent: Senior developers spend 30–40% of time on integration plumbing that could be automated.

3. Context

When to Apply

The enterprise has a curated internal API catalogue with machine-readable definitions (OpenAPI 3.x preferred).
User requests are sufficiently complex and varied that pre-defining every workflow is impractical.
A natural language interface is an appropriate UX pattern for the target users (knowledge workers, internal staff).
The enterprise API catalogue exposes read operations and bounded write operations within clear permission scopes.

When NOT to Apply

Simple, fixed workflows where a traditional integration approach is more predictable and auditable.
High-frequency, low-latency automation (sub-second SLA) — LLM orchestration adds 1–10 seconds of planning latency.
Workflows touching systems with no permission model — AI-initiated calls cannot be safely scoped.
Public-facing user interactions where LLM API call errors could expose internal system structure to external users.
Organisations without a managed API catalogue — deploying this pattern without API governance creates uncontrolled lateral movement risk.

Prerequisites

Curated API catalogue with OpenAPI 3.x definitions for all APIs to be exposed to the LLM.
API gateway enforcing per-user, per-scope permission model.
Centralised secrets management; per-user OAuth tokens or API keys for downstream API calls.
Observability platform capable of correlating LLM planning decisions with downstream API calls.
LLM with reliable function-calling capability (tool use specification).

Industry Applicability

Industry	Applicability	Use Case Examples	Risk Level
Financial Services	High	"Show me all overdue invoices for clients in arrears over 90 days and draft a collection letter for the top 10" — composes AR + CRM + document generation APIs	High — must enforce financial data access controls
Professional Services	High	"Get all timesheets for project X this month, calculate burn rate, and update the project dashboard" — composes HRIS + project management + BI APIs	Medium
Government	Medium	"Find all open permits in postcode 3000 and check for safety inspection overdue" — composes spatial + licensing + inspection APIs	High — privacy and access control critical
Healthcare	Medium	"List patients due for follow-up after discharge and prepare recall notices" — composes EMR + scheduling + communication APIs	Very High — PHI controls mandatory
Retail / eCommerce	Medium	"Find all high-value customers with recent cart abandonment and prepare personalised offer codes" — composes CRM + cart + promotions APIs	Low
Manufacturing	Low	Complex operational workflows are better served by deterministic RPA than probabilistic LLM orchestration	Low

4. Architecture Overview

AI-Powered API Composition is a request-time orchestration architecture with six logical stages: intent capture, API catalogue resolution, plan generation, parameter extraction and validation, plan execution, and result synthesis.

API Catalogue as LLM Context. The foundation of the pattern is a well-curated API catalogue presented to the LLM as tool definitions. Each API endpoint is described as a function: function name, description (written for LLM comprehension, not developer reference), parameter names and types, parameter descriptions, and example values. The OpenAPI specification is the source of truth; a catalogue compiler translates OpenAPI definitions into LLM tool format. The catalogue is not static — it is filtered at request time to expose only the APIs for which the calling user has permission. An LLM receiving a 300-tool catalogue performs significantly worse than one receiving a 20-tool filtered view; relevance filtering by intent category improves plan quality by 40–60% in empirical testing.
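The catalogue compiler described above can be sketched as follows. This is a minimal illustration, not the pattern's prescribed implementation: the `x-required-scopes` extension field, the function name `compile_tools`, and the example spec are assumptions made for this sketch, and the exact tool envelope differs between LLM providers.

```python
from typing import Any

HTTP_METHODS = {"get", "put", "post", "patch", "delete"}

def compile_tools(spec: dict[str, Any], user_scopes: set[str]) -> list[dict[str, Any]]:
    """Translate OpenAPI operations into tool definitions the user may call."""
    tools: list[dict[str, Any]] = []
    for path_item in spec.get("paths", {}).values():
        for method, op in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as path-level "parameters"
            # Assumption: required scopes live in a custom "x-required-scopes" field.
            if not set(op.get("x-required-scopes", [])) <= user_scopes:
                continue  # the LLM never sees tools the user cannot call
            properties: dict[str, Any] = {}
            required: list[str] = []
            for param in op.get("parameters", []):
                properties[param["name"]] = {
                    **param.get("schema", {}),
                    "description": param.get("description", ""),
                }
                if param.get("required", False):
                    required.append(param["name"])
            tools.append({
                "name": op["operationId"],
                "description": op.get("description") or op.get("summary", ""),
                "parameters": {"type": "object", "properties": properties,
                               "required": required},
            })
    return tools

# Hypothetical two-endpoint catalogue used only for illustration.
EXAMPLE_SPEC = {
    "paths": {
        "/invoices": {"get": {
            "operationId": "listOverdueInvoices",
            "summary": "List invoices that are past their due date.",
            "x-required-scopes": ["finance.read"],
            "parameters": [{"name": "min_days_overdue", "in": "query", "required": True,
                            "description": "Minimum number of days past due.",
                            "schema": {"type": "integer", "minimum": 1}}],
        }},
        "/payroll": {"get": {
            "operationId": "listSalaries",
            "summary": "List employee salaries.",
            "x-required-scopes": ["hr.payroll.read"],
        }},
    }
}

tools = compile_tools(EXAMPLE_SPEC, {"finance.read"})
print([tool["name"] for tool in tools])  # ['listOverdueInvoices']
```

A user holding only `finance.read` receives a one-tool catalogue; the payroll endpoint is filtered out before the LLM is invoked.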

Intent Analysis and Catalogue Scoping. On receiving a natural language request, the composition engine first classifies the request intent against the API domain taxonomy. This narrows the tool set from the full catalogue to the relevant domain subset before the LLM sees it. Intent classification can be performed by a smaller, faster model to reduce planning latency.
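As a rough sketch of the scoping step, the snippet below narrows a catalogue to the domains detected in a request. The keyword lookup is only a stand-in for the smaller classifier model mentioned above; `DOMAIN_KEYWORDS`, `CATALOGUE`, and both function names are hypothetical.

```python
import re

# Stand-in for a classifier model: a hypothetical keyword table per API domain.
DOMAIN_KEYWORDS = {
    "CRM": {"client", "clients", "customer", "customers"},
    "Finance": {"invoice", "invoices", "overdue", "arrears"},
}

def classify_intent(request: str) -> list[str]:
    """Return the API domains whose keywords appear in the request."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    return sorted(domain for domain, keywords in DOMAIN_KEYWORDS.items()
                  if words & keywords)

def scope_catalogue(tools: list[dict], domains: list[str]) -> list[dict]:
    """Keep only the tools tagged with one of the detected domains."""
    return [tool for tool in tools if tool["domain"] in domains]

CATALOGUE = [
    {"name": "listOverdueInvoices", "domain": "Finance"},
    {"name": "getClient", "domain": "CRM"},
    {"name": "listTimesheets", "domain": "HRIS"},
]

domains = classify_intent("Show me all overdue invoices for clients in arrears")
scoped = scope_catalogue(CATALOGUE, domains)
print(domains)                      # ['CRM', 'Finance']
print([t["name"] for t in scoped])  # ['listOverdueInvoices', 'getClient']
```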

Plan Generation. The LLM receives the filtered tool catalogue, the user's request, the user's identity context (name, role, permission scopes), and the conversation history. It produces a structured execution plan: an ordered list of tool calls with parameter values derived from the request. The plan is generated before any API call is made. The planning step is inspectable — the plan is logged and can be presented to the user for confirmation on high-stakes operations.

Parameter Extraction and Validation. LLM-extracted parameters are validated against the API schema before execution. This validation catches type errors (string where integer expected), range violations, and missing required parameters. Validation failures are returned to the LLM with the specific error, enabling re-planning. The LLM does not have direct access to execute API calls; all execution goes through the validated execution harness.
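The validation step might look like the sketch below. It uses a small hand-written checker so that it runs with the standard library alone; a production harness would more likely rely on Pydantic or a JSON Schema validator, as listed in the components table. The function name, schema, and error wording are assumptions for illustration.

```python
from typing import Any

JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def validate_params(params: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return specific, LLM-readable errors; an empty list means the call is valid."""
    errors: list[str] = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in params:
            errors.append(f"{name}: missing required parameter")
    for name, value in params.items():
        rule = properties.get(name)
        if rule is None:
            errors.append(f"{name}: not defined in the API schema")
            continue
        expected = JSON_TYPES.get(rule.get("type", ""))
        # bool is a subclass of int in Python, so reject it for numeric fields.
        bool_as_number = isinstance(value, bool) and rule.get("type") != "boolean"
        if expected is not None and (not isinstance(value, expected) or bool_as_number):
            errors.append(f"{name}: expected {rule['type']}, got {type(value).__name__}")
            continue
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            if "minimum" in rule and value < rule["minimum"]:
                errors.append(f"{name}: {value} is below minimum {rule['minimum']}")
            if "maximum" in rule and value > rule["maximum"]:
                errors.append(f"{name}: {value} is above maximum {rule['maximum']}")
    return errors

# Hypothetical parameter schema for a single tool.
SCHEMA = {
    "type": "object",
    "properties": {"min_days_overdue": {"type": "integer", "minimum": 1}},
    "required": ["min_days_overdue"],
}

print(validate_params({"min_days_overdue": "90"}, SCHEMA))
# ['min_days_overdue: expected integer, got str']
```

Returning the error list verbatim to the LLM gives it the specific feedback it needs to re-plan.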

Dynamic API Chain Execution. The execution harness runs the plan step by step. Each step's output is available to subsequent steps as context. The LLM may be re-invoked between steps in complex chains where the next API selection depends on the previous step's output (agentic loop). Re-invocations are bounded by a maximum step count and a maximum wall-clock time to prevent unbounded execution. Each API call is executed with the calling user's scoped credentials — not system-level credentials.
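A bounded execution loop could be sketched as below. It covers the step-count and wall-clock limits for a fixed plan only; the agentic re-invocation of the LLM between steps is left out. The default limits, status strings, and the `call_api` callback signature are assumptions for this sketch.

```python
import time
from typing import Any, Callable

def run_plan(
    steps: list[dict[str, Any]],
    call_api: Callable[[dict[str, Any], list[Any]], Any],
    max_steps: int = 10,
    max_seconds: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Run steps in order, stopping with partial results when a limit is hit."""
    started = clock()
    results: list[Any] = []
    for index, step in enumerate(steps):
        if index >= max_steps:
            return {"status": "step_limit_reached", "results": results}
        if clock() - started > max_seconds:
            return {"status": "time_limit_reached", "results": results}
        # Earlier results are passed along so a step can use prior outputs.
        results.append(call_api(step, results))
    return {"status": "completed", "results": results}

# Usage with a stub in place of the real gateway call.
outcome = run_plan(
    [{"tool": "a"}, {"tool": "b"}, {"tool": "c"}],
    call_api=lambda step, prior: f"ran {step['tool']}",
    max_steps=2,
)
print(outcome)  # {'status': 'step_limit_reached', 'results': ['ran a', 'ran b']}
```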

Security Scoping. This is the highest-risk component of the pattern. LLM-initiated API calls must not exceed the calling user's permission boundary. The execution harness enforces this by binding each API call to a per-user OAuth token or scoped API key obtained at session start. The LLM cannot request credentials, cannot escalate scope, and cannot call APIs not in the filtered catalogue. A shadow-mode capability allows API calls to be simulated without execution — useful for testing and for high-stakes operation confirmation flows.
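The permission boundary described above can be illustrated with a small harness sketch. The class and exception names, the `transport` callback, and the bearer-token header are assumptions; the point is only that the token lives in the harness, the tool allow-list is checked before any call, and shadow mode returns a simulated result without touching the network.

```python
from dataclasses import dataclass
from typing import Any, Callable

class ToolNotPermitted(Exception):
    """Raised when a plan names a tool outside the filtered catalogue."""

@dataclass
class ExecutionHarness:
    allowed_tools: frozenset[str]   # names from the user's filtered catalogue
    user_token: str                 # held here, never placed in LLM context
    shadow_mode: bool = False

    def execute(
        self,
        tool: str,
        params: dict[str, Any],
        transport: Callable[[str, dict[str, Any], dict[str, str]], Any],
    ) -> Any:
        if tool not in self.allowed_tools:
            raise ToolNotPermitted(tool)
        if self.shadow_mode:
            # Simulate the call so plans can be tested or previewed safely.
            return {"simulated": True, "tool": tool, "params": params}
        # Credentials are injected at call time by the harness.
        headers = {"Authorization": f"Bearer {self.user_token}"}
        return transport(tool, params, headers)

# Usage with a stub transport standing in for the API gateway client.
harness = ExecutionHarness(frozenset({"listOverdueInvoices"}), "user-token-123",
                           shadow_mode=True)
print(harness.execute("listOverdueInvoices", {"min_days_overdue": 90},
                      transport=lambda tool, params, headers: None))
```

The API gateway remains the authoritative enforcement point; the allow-list check here is an additional early rejection.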

Result Synthesis. On plan completion, the LLM synthesises the collected API responses into a natural language result (summary, table, or structured output per the user's request). Raw API response data is not surfaced directly to users — the synthesis step transforms it into human-consumable form while the raw structured data is available in the audit log for verification.

5. Architecture Diagram

flowchart TD
    subgraph Input["Input Layer"]
        A[User Natural Language Request]
        B[Intent Classifier]
        C[Filtered API Catalogue]
    end
    subgraph Planning["Planning Layer"]
        D[LLM Function Calling]
        E[Parameter Validator]
    end
    subgraph Execution["Execution Layer"]
        F[Execution Harness]
        G[API Gateway]
        H[(Audit Logger)]
    end
    A --> B
    B --> C
    C --> D
    D -->|execution plan| E
    E -->|validation fail| D
    E -->|valid plan| F
    F --> G
    G -->|API response| F
    F --> H
    F -->|synthesised result| A
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#fef9c3,stroke:#eab308
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#f3e8ff,stroke:#a855f7
    style F fill:#f0fdf4,stroke:#22c55e
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#fef9c3,stroke:#eab308

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Composition Engine	Service	Orchestrate the end-to-end flow: intent classification, catalogue filtering, LLM invocation, plan execution, synthesis	Python FastAPI, Node.js, LangChain, LlamaIndex Agents	Critical
Intent Classifier	Service / Model	Classify user request intent to narrow API catalogue before LLM invocation	GPT-4o-mini, Claude Haiku, fine-tuned classifier	High
API Catalogue Compiler	Service	Translate OpenAPI 3.x specs to LLM tool definitions; filter by user permission at request time	Custom Python service, Speakeasy, custom APIM policy	Critical
LLM (Function Calling)	AI Service	Generate execution plan from natural language request and filtered tool catalogue	GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro	Critical
Parameter Validator	Library	Validate LLM-extracted parameters against OpenAPI schema before execution	Pydantic, OpenAPI-core, AJV (JSON Schema)	Critical
Execution Harness	Service	Execute API calls with user-scoped credentials; enforce step count and time limits	Custom service with per-user token injection	Critical
API Gateway	Infrastructure	Enforce per-user authentication and authorisation on all API calls	Kong, Azure APIM, AWS API Gateway, Apigee	Critical
Audit Logger	Service	Log every plan, every step, every API call, and every synthesis event	Structured logging → SIEM, OpenTelemetry traces	Critical
Session Manager	Service	Manage per-session state: conversation history, in-progress plans, user credentials for session duration	Redis, Upstash, DynamoDB sessions	High
Shadow Mode Executor	Service	Simulate API call plan without executing — for testing and high-stakes confirmation	Mock API layer with response fixtures	Medium

7. Data Flow

Primary Flow

Step	Actor	Action	Output
1	User	Submits natural language request via chat or API interface	Request received by Composition Engine
2	Intent Classifier	Classifies request against API domain taxonomy	Intent domain list (e.g., ["CRM", "Finance"])
3	API Catalogue	Filters full catalogue to user-permitted APIs in identified domains	Filtered tool list (typically 5–25 tools)
4	LLM	Receives request + user context + filtered tools; generates execution plan	Ordered list of tool calls with extracted parameters
5	Parameter Validator	Validates each tool call's parameters against OpenAPI schema	Validation pass or specific error per parameter
6	Execution Harness	Executes each step in order using user-scoped credentials via API gateway	Step result (API response) per tool call
7	LLM	Receives collected step results; synthesises into natural language response	Synthesised natural language result
8	User	Receives synthesised response	Task completed
9	Audit Logger	Logs: request, plan, each step (tool, params, response, latency), synthesis, total cost	Immutable audit record

Error Flow

Step	Error Condition	Detection	Recovery
4	LLM generates plan calling APIs outside filtered catalogue	Plan validation — execution harness rejects tools not in approved list	Return error to LLM for replan with catalogue constraint reinforced
5	Parameter validation fails	Schema validation returns specific error	Return error to LLM with validation message; LLM re-plans with corrected parameters; max 3 re-plan attempts
6	API call returns 4xx error	HTTP error response from API gateway	Return error to LLM; LLM may retry, use alternative API, or escalate to user
6	API call returns 5xx or times out	HTTP 5xx or connection timeout	Retry with backoff (max 2 retries); if persistent, abort plan and report partial completion to user
6	Step count limit reached	Execution harness step counter exceeds limit	Abort plan; return partial results with clear indication of what was and was not completed
4–6	LLM generates plan involving privilege-escalating API sequence	Execution harness rejects call — user token lacks required scope	Return permission error to user; log potential privilege escalation attempt in security audit log
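The re-plan recovery for validation failures (maximum 3 re-plan attempts) could be sketched as a small loop. `generate_plan` and `validate` are hypothetical callbacks standing in for the LLM call and the parameter validator; the stub LLM below simply corrects its output once it is given feedback.

```python
from typing import Any, Callable

class ReplanLimitExceeded(Exception):
    """Raised when validation still fails after the allowed re-plan attempts."""

def plan_with_replans(
    generate_plan: Callable[[list[str]], Any],
    validate: Callable[[Any], list[str]],
    max_replans: int = 3,
) -> tuple[Any, int]:
    """Return (valid plan, number of re-plans used) or raise after the limit."""
    errors: list[str] = []
    for attempt in range(max_replans + 1):  # one initial plan plus re-plans
        plan = generate_plan(errors)        # previous errors are fed back in
        errors = validate(plan)
        if not errors:
            return plan, attempt
    raise ReplanLimitExceeded("; ".join(errors))

# Stub LLM: emits a string first, then an integer once it sees an error.
def stub_llm(feedback: list[str]) -> dict[str, Any]:
    return {"min_days_overdue": 90 if feedback else "90"}

def stub_validate(plan: dict[str, Any]) -> list[str]:
    ok = isinstance(plan["min_days_overdue"], int)
    return [] if ok else ["min_days_overdue: expected integer"]

print(plan_with_replans(stub_llm, stub_validate))  # ({'min_days_overdue': 90}, 1)
```

When the limit is exceeded, the exception message carries the last validation errors so they can be surfaced to the user.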

8. Security Considerations

Authentication and Authorisation

Every API call in the execution plan is made using the calling user's OAuth token or scoped API key — obtained at session start via the API gateway's token exchange.
The execution harness does not hold system-level credentials for downstream APIs. It cannot execute calls that exceed the user's delegated authority.
The LLM itself has no credentials and no network access — it is a planning function. All execution goes through the harness.
The API catalogue exposed to the LLM is filtered to the user's permission set before LLM invocation — the LLM cannot even see APIs the user is not permitted to call.

Secrets Management

LLM API keys stored in centralised secrets manager; injected into Composition Engine at startup.
Per-user tokens obtained via OAuth 2.0 authorisation code flow at session start; stored in encrypted session state; revoked on session end.
Never include API keys or tokens in LLM context — they are injected by the execution harness at call time, not passed to the LLM.

Data Classification

User request text may contain sensitive data; requests are logged with data classification determined at session start.
LLM-synthesised results may contain personal data drawn from API responses; output classification inherits the highest classification of any API response in the execution chain.
Results are not retained beyond the session unless explicitly stored by the user.

Encryption

All API calls in transit encrypted (TLS 1.3).
Session state encrypted at rest in session manager.
Audit log encrypted at rest; access restricted to security and compliance roles.

Auditability

Every LLM-initiated API call is logged with: user identity, session ID, plan step number, API called, parameters (sanitised), response status, latency, user-scoped token scope used.
Audit trail enables reconstruction of exactly what APIs were called, with what parameters, on whose behalf, as a result of what user request — critical for incident investigation and regulatory response.
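One possible shape for a per-call audit record is sketched below, using the fields listed above. The field names, the `SENSITIVE_KEYS` list, and the masking rule are assumptions; a real deployment would take its sanitisation rules from the data classification policy.

```python
import json
from datetime import datetime, timezone
from typing import Any

# Hypothetical list of parameter names whose values must never be logged.
SENSITIVE_KEYS = {"password", "token", "api_key", "ssn"}

def sanitise(params: dict[str, Any]) -> dict[str, Any]:
    """Mask values of sensitive parameters before they reach the log."""
    return {key: "***" if key.lower() in SENSITIVE_KEYS else value
            for key, value in params.items()}

def audit_record(user_id: str, session_id: str, plan_step: int, api: str,
                 params: dict[str, Any], status: int, latency_ms: float,
                 token_scope: str) -> str:
    """Serialise one LLM-initiated API call as a structured JSON log line."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "session_id": session_id,
        "plan_step": plan_step,
        "api": api,
        "params": sanitise(params),
        "response_status": status,
        "latency_ms": latency_ms,
        "token_scope": token_scope,
    }, sort_keys=True)

line = audit_record("u-17", "s-42", 1, "listOverdueInvoices",
                    {"min_days_overdue": 90, "api_key": "secret"},
                    200, 184.0, "finance.read")
print(line)
```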

OWASP LLM Top 10 Mitigations

OWASP LLM Risk	Relevance	Mitigation in This Pattern
LLM01 — Prompt Injection	Very High	User input sanitised before inclusion in LLM context; indirect prompt injection via API response content isolated by structured response parsing, not direct string interpolation into LLM prompt
LLM02 — Insecure Output Handling	High	API call parameters extracted from LLM output are validated against OpenAPI schema before execution; LLM output not executed as code
LLM03 — Training Data Poisoning	Low	LLM is a consumed API service; training data is provider responsibility; fine-tuning not used in this pattern
LLM04 — Model Denial of Service	Medium	Request rate limiting per user; maximum plan step count and wall-clock time limits; maximum token budget per request
LLM05 — Supply Chain Vulnerabilities	Medium	LLM provider contracts include data handling SLA; SDK versions pinned; SBOM per composition engine release
LLM06 — Sensitive Information Disclosure	Very High	API responses containing PII not included verbatim in subsequent LLM invocations — summarised or keyed references only; session logs access-controlled
LLM07 — Insecure Plugin Design	Very High	Tool catalogue filtered to user permissions; execution harness enforces permission boundary; LLM cannot request credentials; tool definitions include explicit scope restrictions
LLM08 — Excessive Agency	Very High	Step count limits; wall-clock time limits; high-stakes operations require explicit user confirmation before execution; LLM cannot initiate write operations without user instruction in session
LLM09 — Overreliance	High	Synthesis responses include confidence indicators; users informed when LLM plan involved incomplete data; partial completion is clearly surfaced
LLM10 — Model Theft	Low	LLM accessed via provider API; no model weights in enterprise custody; provider contract governs usage

9. Governance Considerations

Responsible AI

LLM orchestration must not produce discriminatory API parameter selections (e.g., filtering customer cohorts by protected attributes without explicit user instruction). API catalogue definitions include field-level sensitivity annotations to constrain LLM parameter choices.
Users must be able to review the execution plan before it runs for write-heavy operations — confirmation step is a governance control, not just a UX feature.
LLM synthesis responses should include provenance: "This result was compiled from data returned by the CRM API and the Finance API as of [timestamp]."

Model Risk Management

The LLM's plan generation is subject to model risk management if its output influences financial, employment, or credit decisions.
Model version tracked in every audit log record; enables retrospective analysis of plan quality by LLM version.
Plan success rate, re-plan rate, and user correction rate are ongoing performance metrics that feed the model risk monitoring programme.

Human Approval Gates

Write operations (create, update, delete across any API) require explicit user confirmation before the execution harness proceeds.
Bulk operations (> configurable threshold of records affected) always require human approval regardless of operation type.
Multi-system write operations (affecting > 1 backend API) require confirmation.
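The three approval gates above can be expressed as a single check over a plan. This is a sketch under stated assumptions: `PlanStep`, its fields, the default threshold of 100 records, and the treatment of every POST/PUT/PATCH/DELETE as a write are all illustrative choices, and `records_affected` is assumed to be an estimate supplied by the harness.

```python
from dataclasses import dataclass

WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}  # assumption: these are writes

@dataclass(frozen=True)
class PlanStep:
    tool: str
    method: str
    backend: str
    records_affected: int = 0  # estimate supplied by the harness

def confirmation_reasons(steps: list[PlanStep], bulk_threshold: int = 100) -> list[str]:
    """Return the gates a plan trips; an empty list means no approval is needed."""
    reasons: list[str] = []
    writes = [step for step in steps if step.method.upper() in WRITE_METHODS]
    if writes:
        reasons.append("write operation")
    if any(step.records_affected > bulk_threshold for step in steps):
        reasons.append("bulk operation")
    if len({step.backend for step in writes}) > 1:
        reasons.append("multi-system write")
    return reasons

plan = [
    PlanStep("listTimesheets", "GET", "hris"),
    PlanStep("updateDashboard", "PUT", "bi"),
    PlanStep("createLetter", "POST", "docs"),
]
print(confirmation_reasons(plan))  # ['write operation', 'multi-system write']
```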

Policy and Traceability

AI API composition policy defines: approved API catalogue scope, maximum plan complexity (step count), prohibited API combinations (e.g., cannot compose data export + communication APIs without explicit approval), and data retention for execution logs.
Full traceability: natural language request → LLM plan → each API call → each API response → synthesised result — all linked by session ID and plan ID.
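A check against the prohibited-combination register might be as small as the sketch below. The category labels, the register contents, and the idea of tagging each plan step with an API category are assumptions made for illustration.

```python
# Hypothetical register: each entry is a set of API categories that must not
# appear together in one plan without explicit approval.
PROHIBITED_COMBINATIONS = [frozenset({"data_export", "communication"})]

def violated_combinations(plan_categories: list[str],
                          approved: frozenset = frozenset()) -> list[frozenset]:
    """Return register entries fully present in the plan and not pre-approved."""
    present = set(plan_categories)
    return [combo for combo in PROHIBITED_COMBINATIONS
            if combo <= present and combo not in approved]

print(violated_combinations(["crm_read", "data_export", "communication"]))
print(violated_combinations(["crm_read", "data_export"]))  # []
```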

Governance Artefacts

Artefact	Owner	Update Frequency	Storage Location
API Catalogue (OpenAPI definitions + LLM tool descriptions)	API Platform Team	Per API change	API catalogue repository (Git-backed)
AI Composition Policy	CISO + Chief AI Officer	Quarterly	Policy management system
Execution Audit Logs	Platform Engineering	Continuous	Immutable audit log store (7-year retention for regulated use cases)
Model Risk Assessment for LLM Planner	Model Risk Team	Per LLM version upgrade	MRM register
Prohibited API Combination Register	Security Architecture	Per risk assessment	Security architecture repository
User Confirmation Threshold Configuration	Product + Risk	Per use case	Feature flag / configuration management

10. Operational Considerations

Monitoring and SLOs

SLO	Target	Measurement	Alert Threshold
Request completion rate	> 95%	Completed requests / total requests	< 90% over 1 hour
Plan generation latency (p99)	< 5s	Time from request to first API call	> 10s
End-to-end execution latency (p99)	< 30s	Time from request to synthesised result	> 60s
Replan rate (schema validation failures)	< 5%	Replan events / total plan generation events	> 15% sustained
API call error rate (from AI-initiated calls)	< 2%	API 4xx/5xx / total AI-initiated API calls	> 5% sustained
Privilege escalation attempt rate	0	Security audit log entries for blocked escalation	Any occurrence — immediate alert

Logging

Every session: session ID, user identity, session start/end time, total cost (LLM + API calls where available).
Every request: request text (subject to data classification masking), intent classification, filtered catalogue size.
Every plan: plan ID, LLM model version, tools selected, plan generation latency, token usage.
Every step: step number, tool name, parameters (sanitised), API response status, latency, retry count.
Every synthesis: synthesis latency, token usage, output length.

Incident Response

Privilege escalation attempt detected: immediate alert; session suspended; security review of audit log; block user pending investigation if repeated.
LLM provider outage: Composition Engine returns "AI service temporarily unavailable" — no automated fallback to less capable model (plan quality cliff is a safety concern); users directed to manual process.
API gateway outage: all API calls fail; plan execution halts with partial completion message; session state preserved for resume on recovery.

Disaster Recovery

Scenario	RTO	RPO	Recovery Procedure
Composition Engine failure	3 minutes	0 (stateless service; session state in Redis)	Kubernetes restart; users re-submit requests
Session Manager (Redis) failure	5 minutes	Up to 5 minutes of session state	Redis Sentinel failover; active sessions may need re-authentication
LLM provider outage	N/A (dependency)	N/A	Fail-fast; user communication; no automated fallback
API Catalogue service failure	10 minutes	0	Restart; cached catalogue snapshot serves recent requests during brief outage

Capacity Planning

LLM token consumption: (average prompt tokens per request) × (1 + replan rate) × (average steps per plan) × (requests per day) = daily token budget.
Composition Engine compute: scales with concurrent request count × average execution duration; typical 2–5 concurrent sessions per vCPU.
Session Manager: size memory for (peak concurrent sessions) × (average session state size); session duration determines how many sessions are resident at once. The memory footprint is typically low.
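The token budget formula above, worked through in code. The input values are illustrative only and do not come from measured workloads.

```python
def daily_token_budget(avg_prompt_tokens: int, replan_rate: float,
                       avg_steps_per_plan: float, requests_per_day: int) -> int:
    """prompt tokens x (1 + replan rate) x steps per plan x requests per day."""
    return round(avg_prompt_tokens * (1 + replan_rate)
                 * avg_steps_per_plan * requests_per_day)

# Illustrative inputs: 2,000 prompt tokens, 5% replan rate, 4 steps, 1,000 requests.
print(daily_token_budget(2_000, 0.05, 4, 1_000))  # 8400000
```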

11. Cost Considerations

Cost Drivers

Cost Driver	Description	Typical Proportion
LLM API (plan generation + synthesis)	Token-based; complex plans with large tool catalogues can consume 5,000–20,000 tokens per request	50–70%
Downstream API Costs	Charges from internal APIs or external SaaS APIs called as part of the plan	10–25%
Composition Engine Compute	Container runtime; relatively low — I/O-bound, not compute-bound	5–10%
Session Manager	Redis or equivalent; low cost	1–3%
Audit Log Storage	High write volume; 7-year retention for regulated use cases	5–10%
LLM Re-plan Calls	Additional token cost from validation failures triggering re-planning	5–15%

Scaling Risks

Token costs are highly sensitive to tool catalogue size: a 200-tool catalogue sent in every request generates roughly 10× more tool-definition input tokens than a 20-tool filtered view. Intent-based filtering is critical for cost control.
Replan rate amplifies LLM costs non-linearly; poor API definitions increase replan rate; investing in high-quality tool descriptions pays immediate ROI.
Agentic loops (multi-step re-invocations) can consume unbounded tokens if step count limits are not enforced.

Cost Optimisations

Intent-based catalogue filtering: reduce tool catalogue size per request from 200 to 20 tools — 10× LLM input token reduction.
Use a cheap classifier model for intent classification; reserve expensive model for plan generation only.
Response caching: identical or near-identical requests within a session window serve cached results.
Shadow mode for testing: prevent test traffic from incurring downstream API costs.

Indicative Cost Range

Scale	LLM API Monthly	Infrastructure Monthly	Total Monthly
Small (1,000 requests/day, 10 steps avg)	$2,000–$8,000	$500–$2,000	$2,500–$10,000
Medium (10,000 requests/day, 10 steps avg)	$15,000–$60,000	$3,000–$8,000	$18,000–$68,000
Large (100,000 requests/day, 10 steps avg)	$120,000–$500,000	$20,000–$50,000	$140,000–$550,000

12. Trade-Off Analysis

Architectural Options Comparison

Option	Flexibility	Predictability	Latency	Security Risk	Cost	Recommended For
Option A — LLM-based API Composition (this pattern)	Very High	Medium	5–30s	High (requires rigorous scoping)	High	Complex, varied user intents; large API catalogue
Option B — Static Workflow Orchestration	Low	Very High	< 1s	Low	Low	Known, fixed integration patterns
Option C — RPA / Macro Recording	Medium	High	Variable	Medium	Medium	UI-level automation of legacy systems
Option D — No-Code Integration Platform	Medium	High	Variable	Low-Medium	Medium	Business user self-service with defined action vocabulary

Architectural Tensions

Tension	Trade-Off	Resolution
Plan flexibility vs. Security boundary	More flexible LLM planning creates larger attack surface for privilege escalation	Execution harness as hard permission boundary; LLM plan generation is advisory, never authoritative
Catalogue size vs. Plan quality vs. Cost	Large catalogue = better coverage; also = more tokens, more noise, lower plan quality	Intent-based filtering: give the LLM only the tools relevant to the detected intent
Agentic depth vs. Execution predictability	Deeper agentic loops produce better results for complex requests; also harder to audit and bound	Maximum step count limit; maximum wall-clock limit; intermediate result checkpointing for user review

13. Failure Modes

Failure	Likelihood	Impact	Detection	Recovery
LLM generates plan exceeding user permissions	Medium	High — attempted unauthorised data access	Execution harness rejects call; security audit log entry	Plan rejected; user receives permission error; repeated attempts trigger security review
LLM generates incorrect parameter extraction	High	Medium — API call fails or returns wrong data	Parameter validation catches schema violations; LLM re-plans	Max 3 re-plan attempts; escalate to user if unresolved
Indirect prompt injection via API response content	Low	High — LLM behaviour manipulated via malicious data in API response	Structured response parsing; no free-text API response injected into LLM prompt	Detection via unexpected tool call patterns in audit log; session termination on detection
Step count limit exceeded on complex request	Medium	Low — partial completion	Execution harness step counter	Return partial results with clear incomplete status to user
LLM provider rate limit during multi-step plan	Medium	Medium — plan execution interrupted	HTTP 429 from LLM provider	Retry with backoff; if plan cannot complete, save checkpoint and resume on retry
Stale API catalogue definition	Medium	Medium — LLM generates calls with incorrect parameters	High replan rate as validation fails repeatedly	API catalogue freshness monitoring; alert when OpenAPI spec changes without catalogue update

Cascading Failure Scenarios

Prompt injection in API response + no structured parsing: Malicious data in CRM API response reprograms LLM to call data export API with elevated scope → execution harness blocks the call (permission boundary) but attacker learns API catalogue from error responses. Mitigation: treat all API response content as untrusted; parse structured fields by name, never inject raw response into LLM context.
High replan rate + no cost circuit breaker: Poorly-written API definition causes 100% replan rate on a class of requests → LLM costs spike 15× → monthly budget exceeded → service suspended. Mitigation: per-session and per-user daily token budget limits with hard stop.
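The per-user daily token budget with a hard stop could be sketched as follows. The class name, the in-memory counter, and the absence of a daily reset are simplifications for illustration; a real deployment would keep the counters in the session store and reset them on a schedule.

```python
class TokenBudgetExceeded(Exception):
    """Raised when a charge would take a user past their daily token limit."""

class TokenBudget:
    def __init__(self, daily_limit: int) -> None:
        self.daily_limit = daily_limit
        self.used: dict[str, int] = {}  # tokens consumed per user today

    def charge(self, user_id: str, tokens: int) -> int:
        """Record usage and return the remaining budget, or raise on overrun."""
        spent = self.used.get(user_id, 0) + tokens
        if spent > self.daily_limit:
            # Hard stop: the charge is refused and nothing is recorded.
            raise TokenBudgetExceeded(f"{user_id} would use {spent} of {self.daily_limit}")
        self.used[user_id] = spent
        return self.daily_limit - spent

budget = TokenBudget(daily_limit=50_000)
print(budget.charge("u-17", 20_000))  # 30000
print(budget.charge("u-17", 20_000))  # 10000
# A third 20,000-token request would raise TokenBudgetExceeded.
```

The harness would call `charge` before each LLM invocation, including re-plans, so a runaway replan loop stops at the limit rather than at the monthly invoice.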

14. Regulatory Considerations

APRA CPS 230 — Operational Risk

Clause 36: Agentic AI orchestration of enterprise APIs is a novel operational risk; the enterprise must document the control framework (permission boundaries, step limits, audit logging) in the operational risk framework.
Clause 49: LLM providers whose models orchestrate enterprise API calls are material service providers under CPS 230 third-party risk requirements.

APRA CPS 234 — Information Security

Clause 15: Permission boundary enforcement by the execution harness, tool catalogue access control, and audit logging are the primary information security controls for this pattern.
Clause 36 (Incident Notification): Privilege escalation attempts detected by the execution harness must be assessed as potential security incidents under CPS 234 notification obligations.

Australian Privacy Act 1988 (as amended)

APP 6: LLM orchestration accessing personal data APIs must be within the scope of the use purpose for which data was collected; general-purpose "tell me anything about this customer" compositions may not satisfy APP 6.
APP 11: Session logs containing personal data drawn from API responses are subject to security and retention obligations; default session log retention should be minimised.

EU AI Act (2024)

Article 6 (High-Risk AI): If this pattern is used to automate or support decisions in high-risk categories (credit, employment, access to essential services), it is a high-risk AI system requiring conformity assessment.
Article 14 (Human Oversight): Human approval gates for write operations and bulk operations directly implement the human oversight requirement.
Article 12 (Record-keeping): Execution audit logs satisfy the logging requirement; retention periods must align with the log-retention obligations for high-risk systems (Article 19 for providers, Article 26(6) for deployers).

ISO 42001 — AI Management System

Clause 8.4 (AI System Impact Assessment): Impact assessment required before deploying LLM orchestration in any regulated business process domain.

NIST AI RMF (2023)

GOVERN 2.1: Roles and responsibilities for LLM-initiated API calls must be clearly assigned, including who is accountable when an LLM-orchestrated action causes harm.
MAP 5.1: LLM API composition in regulated domains is a high-risk deployment; risk treatment must include the human oversight and permission boundary controls described in this pattern.

15. Reference Implementations

AWS

Composition Engine: AWS Lambda (function per request) or ECS Fargate
LLM: Amazon Bedrock (Claude 3.5 Sonnet with tool use) or OpenAI API via direct call
API Catalogue: AWS API Gateway with OpenAPI export; custom catalogue compiler Lambda
Parameter Validation: Pydantic in Lambda runtime
Execution Harness: Lambda with AWS STS AssumeRole for per-user credential scoping
Session Manager: Amazon ElastiCache (Redis OSS)
Audit Logger: CloudWatch Logs + S3 for long-term retention; Athena for query
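Per-user credential scoping with STS AssumeRole can be approximated by attaching a session policy that lists only the API Gateway resources the caller may invoke. The sketch below builds the request arguments only. The function name, the `compose-` session prefix and the example ARNs are assumptions; the result would be passed to `boto3.client("sts").assume_role(**request)`.

```python
import json
import re


def scoped_assume_role_request(role_arn, user_id, allowed_api_arns, duration_seconds=900):
    """Build AssumeRole arguments limited to the caller's permitted APIs."""
    # The session policy can only narrow the role's own permissions:
    # effective access is the intersection of the two.
    session_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "execute-api:Invoke",
                "Resource": sorted(allowed_api_arns),
            }
        ],
    }
    # STS session names accept only a limited ASCII character set, max 64 chars.
    session_name = re.sub(r"[^\w+=,.@-]", "-", f"compose-{user_id}", flags=re.ASCII)[:64]
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,  # appears in CloudTrail for attribution
        "Policy": json.dumps(session_policy),
        "DurationSeconds": duration_seconds,  # 900 is the STS minimum
    }
```

Putting the user identifier in the session name is a design choice that lets audit queries attribute each downstream call to the originating user.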

Azure

Composition Engine: Azure Functions (Flex Consumption) or Azure Container Apps
LLM: Azure OpenAI Service (GPT-4o with function calling)
API Catalogue: Azure API Management with OpenAPI export; custom compiler Function
Parameter Validation: Pydantic or Azure API Management built-in schema validation
Execution Harness: Function with Azure Managed Identity + per-user delegated access tokens
Session Manager: Azure Cache for Redis
Audit Logger: Application Insights + Azure Monitor + Log Analytics

GCP

Composition Engine: Cloud Run (request-scoped scaling)
LLM: Vertex AI (Gemini 1.5 Pro with function calling) or Anthropic Claude via Vertex AI
API Catalogue: Apigee API Hub with OpenAPI export; custom compiler Cloud Run service
Parameter Validation: Pydantic in Python runtime
Execution Harness: Cloud Run with Workload Identity and per-user token exchange
Session Manager: Memorystore (Redis)
Audit Logger: Cloud Logging → BigQuery for compliance queries

On-Premises / Private Cloud

Composition Engine: Python FastAPI on Kubernetes
LLM: vLLM or Ollama serving Llama 3.1 70B with function calling, or self-hosted Mistral 7B
API Catalogue: Kong API Gateway with OpenAPI export; custom Python catalogue compiler
Parameter Validation: Pydantic
Execution Harness: Custom Python service with per-user credential injection from HashiCorp Vault
Session Manager: Redis on Kubernetes via Bitnami Helm chart
Audit Logger: Fluentd → Elasticsearch → Kibana; long-term archive to MinIO
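Each implementation above pairs an OpenAPI export with a custom catalogue compiler. A minimal sketch of such a compiler follows, assuming a generic tool shape of `name`, `description` and JSON Schema `parameters`. Provider formats differ in key names, and `compile_catalogue` is a hypothetical helper, not part of any gateway product.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def compile_catalogue(openapi: dict) -> list:
    """Turn OpenAPI operations into function-calling tool definitions."""
    tools = []
    for path, operations in openapi.get("paths", {}).items():
        for method, op in operations.items():
            # Skip path-level keys such as "parameters" and any operation
            # without an operationId, since the LLM needs a stable name.
            if method.lower() not in HTTP_METHODS or "operationId" not in op:
                continue
            properties, required = {}, []
            for param in op.get("parameters", []):
                schema = dict(param.get("schema", {"type": "string"}))
                if "description" in param:
                    schema["description"] = param["description"]
                properties[param["name"]] = schema
                if param.get("required"):
                    required.append(param["name"])
            tools.append(
                {
                    "name": op["operationId"],
                    "description": op.get("summary", ""),
                    "parameters": {
                        "type": "object",
                        "properties": properties,
                        "required": required,
                    },
                    # Kept alongside the tool so the harness can build the call.
                    "http": {"method": method.upper(), "path": path},
                }
            )
    return tools
```

This sketch handles only path, query and header parameters; request bodies and `$ref` resolution are omitted and would need handling in a real compiler.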

16. Related Patterns

Pattern	Relationship	Notes
EAAPL-INT001 — Enterprise AI Service Bus	Complementary	Composition Engine publishes execution events to the AI Service Bus for enterprise-wide cost and audit visibility
EAAPL-INT002 — Legacy System AI Augmentation	Complementary	Legacy API adapters expose legacy systems as API catalogue entries for LLM composition
EAAPL-INT007 — AI Circuit Breaker	Enables	Circuit breaker wraps LLM provider calls and downstream API calls within the execution harness
EAAPL-INT008 — Bidirectional AI Sync	Related	Composition results may trigger sync events to update enterprise data stores

17. Maturity Assessment

Overall Maturity: Proven

Dimension	Score (1–5)	Justification
Architectural Completeness	5	All stages — intent, catalogue, plan, validation, execution, synthesis — fully specified
Operational Readiness	4	SLOs and monitoring defined; LLM provider dependency creates inherent availability limit
Security Coverage	5	Permission boundary, prompt injection, privilege escalation, OWASP LLM Top 10 all addressed
Governance Coverage	5	Human approval gates, audit trail, model risk, policy all included
Cost Predictability	3	Token costs variable; agentic depth variability is inherent; budget controls required
Implementation Complexity	2	High complexity — requires mature API catalogue, permission model, and LLM engineering capability
Industry Validation	4	Deployed in production at financial services and professional services firms; healthcare implementations emerging

18. Revision History

Version	Date	Author	Changes
1.0	2026-06-12	EAAPL Working Group	Initial publication — integration patterns series
