Enterprise AI Architecture Pattern Library (EAAPL)

EAAPL-INT001 — Enterprise AI Service Bus

Tags: event-driven, asynchronous, enterprise-only, high-complexity
Status: Proven | Version: 1.0 | Domain: Integration


1. Executive Summary

The Enterprise AI Service Bus pattern establishes an event-driven integration backbone that routes, mediates, and governs AI capability consumption across the enterprise. Rather than allowing each business unit to wire directly to model providers, the pattern inserts a durable, schema-governed event mesh between AI producers (models, pipelines, agents) and AI consumers (applications, dashboards, downstream processes).

The pattern extends the CloudEvents 1.0 specification with AI-specific fields (model identity, prompt version, token usage, confidence score, latency, and cost), so that every AI inference event is a first-class, auditable artefact. Topic design decouples consumers from model changes: topics are defined per AI use-case domain, not per model, so upgrading from GPT-4 to GPT-4o does not require re-wiring 30 downstream subscribers.

For CIOs and CTOs, the bus provides three strategic outcomes: (1) unified cost visibility across all AI workloads through event-level cost attribution; (2) replay capability to reprocess historical inputs when a better model becomes available; (3) a single enforcement point for data classification, rate limiting, and policy compliance before any AI event reaches a consumer.


2. Problem Statement

Business Problem

AI capabilities are being procured and integrated independently by individual teams. There is no central visibility into total AI spend, no consistent governance of what data enters AI models, and no mechanism to upgrade models without coordinated redeployment across all consuming systems.

Technical Problem

Point-to-point integrations between applications and AI APIs create a tangled dependency graph. Each integration handles retries, error logging, cost tracking, and schema evolution differently. When a model API changes its response format or is deprecated, every consuming application must be updated independently.

Symptoms

  • Multiple teams have separate API keys for the same AI provider with no consolidated billing.
  • A model deprecation notice causes a multi-team incident requiring weeks of parallel migration work.
  • There is no audit trail linking a business decision to the specific AI model version and prompt that produced it.
  • AI inference costs are allocated to cloud infrastructure budgets rather than business unit P&Ls.
  • Failed AI inference events are silently discarded, making root cause analysis impossible.

Cost of Inaction

  • Financial: Duplicate AI spend across business units; inability to negotiate volume discounts without consolidated usage data. Typical over-spend: 30–60% of actual AI API cost.
  • Operational: Every model upgrade requires coordinated change across all consuming teams — 4 to 12 weeks of migration effort per model generation.
  • Risk: No audit trail for AI-assisted decisions exposes the organisation to regulatory non-compliance under EU AI Act Article 13 (transparency) and APRA CPS 230 operational risk standards.
  • Strategic: Inability to replay historical workloads with improved models forfeits compounding model improvement value.

3. Context

When to Apply

  • The enterprise has ≥3 distinct teams consuming AI capabilities.
  • AI inference is embedded in business-critical workflows where auditability is required.
  • The organisation operates under financial services, healthcare, or government regulatory regimes.
  • Model upgrade cycles must not require coordinated consumer redeployment.
  • Cost attribution to business units is a finance or governance requirement.

When NOT to Apply

  • Single-team AI workload with no cross-system integration.
  • Proof-of-concept or exploratory AI workloads where operational overhead is not justified.
  • Ultra-low-latency requirements (< 50ms) where broker overhead is architecturally incompatible.
  • Simple request/response integrations where event-driven complexity adds no value.

Prerequisites

  • A mature enterprise messaging platform (Apache Kafka, Azure Service Bus, Amazon EventBridge, Google Cloud Pub/Sub).
  • A schema registry capable of enforcing Avro, Protobuf, or JSON Schema evolution compatibility.
  • Centralised secrets management for AI provider API keys.
  • Observability platform capable of ingesting event-level metrics.

Industry Applicability

| Industry | Applicability | Primary Driver |
| --- | --- | --- |
| Financial Services | High | Regulatory auditability, cost attribution, model risk governance |
| Government | High | Data classification enforcement, audit trail requirements |
| Healthcare | High | PHI data governance, model version traceability for clinical decisions |
| Retail / eCommerce | Medium | Multi-team AI consumption, cost management |
| Telecommunications | Medium | High-volume event streams, multi-domain AI use cases |
| Startups (< 50 engineers) | Low | Overhead exceeds benefit at this scale |

4. Architecture Overview

The Enterprise AI Service Bus is a layered event-driven architecture consisting of five logical planes: the ingestion plane, the governance plane, the routing plane, the processing plane, and the consumer plane.

Ingestion Plane. AI event producers (applications initiating AI inference requests) publish to the bus using an extended CloudEvents envelope. The CloudEvents 1.0 context attributes (the required id, source, specversion, and type, plus the optional time and datacontenttype) are preserved intact. The AI fields are added as CloudEvents extension attributes: ai_model_id, ai_model_version, ai_prompt_version, ai_token_usage_prompt, ai_token_usage_completion, ai_confidence_score, ai_latency_ms, ai_cost_usd, ai_use_case_domain, ai_data_classification. Because CloudEvents 1.0 restricts attribute names to lowercase ASCII letters and digits, the underscored names used throughout this pattern are logical names; a strictly conformant binding would need to drop the underscores (for example, aimodelid) or carry these fields inside the data payload. Producers never call AI provider APIs directly. The AI SDK client library handles envelope construction and checks that the extension fields are complete before the event is published.
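As a minimal sketch of the envelope construction described for the SDK client library, the following Python uses only the standard library. The function name build_envelope and the choice of which AI fields must be present at request time are assumptions of this sketch, not part of the pattern; the attribute names are the ones listed in the text.

```python
import uuid
from datetime import datetime, timezone

# Assumption: the fields a producer can know before inference runs. Token usage,
# confidence, latency, and cost are only known afterwards, so this sketch leaves
# them to the result event.
REQUEST_AI_FIELDS = frozenset({
    "ai_model_id", "ai_model_version", "ai_prompt_version",
    "ai_use_case_domain", "ai_data_classification",
})


def build_envelope(source: str, event_type: str, data: dict, ai_fields: dict) -> dict:
    """Return a CloudEvents-style dict, or raise if AI fields are incomplete."""
    missing = REQUEST_AI_FIELDS - set(ai_fields)
    if missing:
        raise ValueError(f"missing AI extension fields: {sorted(missing)}")
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        **ai_fields,   # extension attributes sit alongside the base attributes
        "data": data,
    }
```

Rejecting incomplete events in the client keeps obviously malformed envelopes off the raw inbound topic, although the governance processor would still need to validate everything it receives.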

Governance Plane. A policy enforcement processor subscribes to the raw inbound topic, validates the CloudEvents schema against the schema registry, applies data classification rules (blocking PII fields from reaching models not cleared for that classification), enforces per-producer rate limits, and re-publishes validated events to the routed topic. Failed validation events are routed to the governance dead letter queue with the specific violation reason attached. This plane is the single enforcement point for the enterprise AI usage policy.
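The three checks performed by the governance processor can be sketched as a single decision function. Everything named here (the MODEL_CLEARANCE table, the DLQ topic name, the required field set, and the use of the event type as the target topic) is a hypothetical illustration, and the rate limit is reduced to a remaining-quota counter supplied by the caller.

```python
from dataclasses import dataclass

# Hypothetical clearance table: classifications each model may receive.
MODEL_CLEARANCE = {
    "model-a": {"public", "internal"},
    "model-b": {"public", "internal", "pii"},
}
# Assumption: the minimum fields this sketch needs in order to decide.
REQUIRED_FIELDS = {"id", "type", "ai_model_id", "ai_data_classification"}
GOVERNANCE_DLQ = "ai.governance.dlq.v1"  # hypothetical topic name


@dataclass
class Decision:
    allowed: bool
    target_topic: str
    reason: str = ""


def enforce(event: dict, rate_remaining: int) -> Decision:
    """Apply schema, classification, and rate checks; first failure goes to the DLQ."""
    missing = REQUIRED_FIELDS - set(event)
    if missing:
        return Decision(False, GOVERNANCE_DLQ, f"schema: missing {sorted(missing)}")
    cleared = MODEL_CLEARANCE.get(event["ai_model_id"], set())
    if event["ai_data_classification"] not in cleared:
        return Decision(False, GOVERNANCE_DLQ, "classification: model not cleared")
    if rate_remaining <= 0:
        return Decision(False, GOVERNANCE_DLQ, "rate limit: producer quota exhausted")
    # Assumption: the event type doubles as the domain topic name.
    return Decision(True, event["type"])
```

Returning the violation reason with the decision mirrors the requirement that rejected events reach the governance DLQ with the specific violation attached.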

Routing Plane. Topic design follows the domain-per-topic principle, not model-per-topic. Topics are named by business domain and event type: ai.creditrisk.application-assessment.v1, ai.customerservice.intent-classification.v1, ai.fraud.transaction-scoring.v1. This topology means upgrading the underlying model from GPT-4 to GPT-4o requires no change to topic names or consumer configurations — the model is a deployment detail of the AI inference worker, not an integration concern.
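The naming convention can be enforced mechanically. The sketch below assumes the shape ai.<domain>.<event-type>.v<version> inferred from the three example topic names; the regular expression and helper names are illustrative, not a published EAAPL rule.

```python
import re

# Assumed shape: ai.<domain>.<event-type>.v<version>, lowercase, hyphenated event type.
TOPIC_RE = re.compile(
    r"^ai\.(?P<domain>[a-z0-9]+)\.(?P<event>[a-z0-9]+(?:-[a-z0-9]+)*)\.v(?P<version>[0-9]+)$"
)


def topic_name(domain: str, event_type: str, version: int) -> str:
    """Build a domain topic name and reject anything outside the convention."""
    name = f"ai.{domain}.{event_type}.v{version}"
    if TOPIC_RE.match(name) is None:
        raise ValueError(f"not a valid domain topic name: {name!r}")
    return name


def parse_topic(name: str) -> dict:
    """Split a topic name back into its domain, event type, and version."""
    match = TOPIC_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid domain topic name: {name!r}")
    return {
        "domain": match["domain"],
        "event": match["event"],
        "version": int(match["version"]),
    }
```

For example, topic_name("fraud", "transaction-scoring", 1) yields ai.fraud.transaction-scoring.v1. Nothing in the name identifies a model, which is the point of the domain-per-topic principle.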

Processing Plane. AI inference workers subscribe to domain topics, execute inference against the configured model provider, and publish results to result topics following the same CloudEvents envelope pattern. The result event adds ai_result, ai_result_schema_version, and ai_fallback_used extension fields. Workers are stateless and horizontally scalable. Consumer group design ensures each logical consumer role (e.g., fraud-scorer, risk-ranker) receives every event independently without competing for the same partition offset.

Consumer Plane. Downstream applications subscribe to result topics. Consumers are shielded from model provider changes, prompt changes, and inference worker implementation details. The event schema version field enables consumers to handle multiple result schema versions concurrently during rolling upgrades.
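One way a consumer might handle two result schema versions during a rolling upgrade is to branch on ai_result_schema_version. The payload shapes for versions "1" and "2" below are invented for illustration; the source does not define them.

```python
def extract_score(result_event: dict) -> float:
    """Read a score from a result event, whichever supported schema version it uses."""
    version = result_event["ai_result_schema_version"]
    payload = result_event["ai_result"]
    if version == "1":
        # Hypothetical v1 shape: {"score": 0.82}
        return float(payload["score"])
    if version == "2":
        # Hypothetical v2 shape: {"assessment": {"score": 0.82, ...}}
        return float(payload["assessment"]["score"])
    # Unknown versions fail loudly so they can be routed to the consumer DLQ.
    raise ValueError(f"unsupported result schema version: {version!r}")
```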

Replay Architecture. All events, both requests and results, are retained in compacted topics or object storage with a configurable retention period (recommended: 90 days for standard, 7 years for regulated use cases). If compacted topics are used, records should be keyed by event id, since log compaction keeps only the latest record per key. Replay is initiated by re-publishing retained events to a replay topic. Replay events include the original id and an ai_replay_of extension field, enabling downstream deduplication and differentiation of original vs. replayed processing.
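The replay metadata and downstream deduplication can be sketched as follows. The text leaves open whether a replayed event reuses the original id; this sketch assumes it receives a fresh id and carries the original id in ai_replay_of. The Deduplicator class is a hypothetical in-memory stand-in for whatever durable store a real consumer would use.

```python
import uuid


def make_replay_event(original: dict) -> dict:
    """Copy an archived event, give it a new id, and point back at the original."""
    replay = dict(original)              # shallow copy; the archive record is untouched
    replay["id"] = str(uuid.uuid4())
    replay["ai_replay_of"] = original["id"]
    return replay


def is_replay(event: dict) -> bool:
    """A replayed event is recognisable by the presence of ai_replay_of."""
    return "ai_replay_of" in event


class Deduplicator:
    """In-memory idempotency check keyed on event id (illustration only)."""

    def __init__(self) -> None:
        self._seen = set()

    def is_new(self, event: dict) -> bool:
        if event["id"] in self._seen:
            return False
        self._seen.add(event["id"])
        return True
```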

Back-Pressure Handling. AI inference is significantly slower than typical event processing (50ms–30s vs <1ms for simple transforms). Back-pressure is handled via consumer lag monitoring per consumer group: when lag exceeds the configured threshold, the auto-scaler adds inference worker instances. Hard rate limits per consumer group prevent a single workload from monopolising broker throughput.
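A lag-driven scaling decision might look like the sketch below. The parameter names and the idea of sizing the fleet to drain the backlog within a target window are assumptions of this sketch; it also ignores the rate of newly arriving events, which a production auto-scaler would need to account for.

```python
import math


def desired_workers(current: int, lag: int, lag_threshold: int,
                    per_worker_rate: float, drain_seconds: float,
                    max_workers: int) -> int:
    """Return the worker count needed to drain `lag` events within `drain_seconds`.

    per_worker_rate is in events per second. Below the threshold nothing changes;
    the result never drops below `current` and never exceeds `max_workers`.
    """
    if lag <= lag_threshold:
        return current
    needed = math.ceil(lag / (per_worker_rate * drain_seconds))
    return min(max(current, needed), max_workers)
```

With 6,000 events of lag, workers that each handle 2 events per second, and a 300-second drain target, the function asks for 10 workers; the max_workers cap plays the role of the hard limit that stops one workload from monopolising capacity.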

Dead Letter Queue Architecture. Every consumer group has a corresponding DLQ topic. Events are routed to the DLQ after the configured maximum retry count with full event context preserved: original event, error message, retry count, last failure timestamp, and the consumer group that failed. DLQ topics are monitored; alerts fire at configurable message count thresholds. A replay-from-DLQ operator enables manual investigation and reprocessing.
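The DLQ record described above can be sketched as a wrapper around a consumer's handler. The function name, the return convention, and the dictionary keys are illustrative; delays between attempts are omitted to keep the example short.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Tuple


def process_with_dlq(event: dict, handler: Callable[[dict], Any],
                     consumer_group: str, max_retries: int = 3) -> Tuple[str, Any]:
    """Try the handler up to max_retries times; on exhaustion build a DLQ record.

    Returns ("ok", handler_result) or ("dlq", record).
    """
    last_error = ""
    for _ in range(max_retries):
        try:
            return "ok", handler(event)
        except Exception as exc:  # broad on purpose: any failure counts as an attempt
            last_error = str(exc)
    record = {
        "original_event": event,
        "error_message": last_error,
        "retry_count": max_retries,
        "last_failure_timestamp": datetime.now(timezone.utc).isoformat(),
        "consumer_group": consumer_group,
    }
    return "dlq", record
```

Keeping the original event intact inside the record is what would let a replay-from-DLQ operator resubmit it unchanged after the root cause is fixed.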


5. Architecture Diagram

flowchart TD
    subgraph Ingestion["Ingestion and Governance"]
        A[AI Event Producers]
        B[CloudEvents SDK Client]
        C[Schema Validator + Policy Enforcer]
        D[Governance DLQ]
    end
    subgraph Routing["Domain Topic Routing"]
        E[Domain Topics per Use Case]
        F[AI Inference Workers]
        G[Model Provider]
    end
    subgraph Consumers["Consumer and Archive"]
        H[Result Topics]
        I[Downstream Consumers]
        J[(Event Archive + Replay)]
    end
    A --> B
    B --> C
    C -->|violation| D
    C -->|routed| E
    E --> F
    F --> G
    G -->|result| F
    F --> H
    H --> I
    H --> J
    J -->|replay| B
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#f3e8ff,stroke:#a855f7
    style D fill:#fee2e2,stroke:#ef4444
    style E fill:#fef9c3,stroke:#eab308
    style F fill:#f0fdf4,stroke:#22c55e
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#fef9c3,stroke:#eab308
    style I fill:#d1fae5,stroke:#10b981
    style J fill:#fef9c3,stroke:#eab308

6. Components

| Component | Type | Responsibility | Technology Options | Criticality |
| --- | --- | --- | --- | --- |
| AI SDK Client Library | Library | CloudEvents envelope construction, extension field population, publisher abstraction | Custom SDK (Python/Java/Node), Dapr SDK | Critical |
| Schema Registry | Infrastructure | Enforce event schema evolution compatibility; validate inbound events | Confluent Schema Registry, AWS Glue Schema Registry, Azure Schema Registry | Critical |
| Message Broker | Infrastructure | Durable topic management, consumer group offsets, replay retention | Apache Kafka, Azure Service Bus Premium, AWS MSK, Google Pub/Sub | Critical |
| Governance Processor | Service | Schema validation, data classification enforcement, rate limiting, governance DLQ routing | Kafka Streams app, Azure Stream Analytics, custom Flink job | Critical |
| AI Inference Worker | Service | Topic subscription, model provider API call, result event publication, retry logic | Containerised Python/Node service, AWS Lambda, Azure Functions | High |
| Dead Letter Queue Processor | Service | DLQ monitoring, alerting, manual replay tooling | Custom service + alerting integration | High |
| Event Archive | Storage | Long-term event retention for audit and replay | Kafka compacted topics + S3/ADLS/GCS, Apache Iceberg, Delta Lake | High |
| Replay Operator | Service | Re-publish archived events to inbound topic with replay metadata | Custom CLI/service | Medium |
| Observability Collector | Infrastructure | Consume all topics to extract cost, latency, quality metrics per domain | Kafka consumer + Prometheus metrics, Datadog, Splunk | High |
| Consumer Group Manager | Configuration | Define and enforce consumer group isolation across domains | Kafka AdminClient, Terraform-managed topic ACLs | Medium |

7. Data Flow

Primary Flow

| Step | Actor | Action | Output |
| --- | --- | --- | --- |
| 1 | Application | Calls AI SDK Client Library with domain payload and data classification label | CloudEvents envelope with AI extension fields populated |
| 2 | AI SDK | Publishes event to ai.raw.inbound.v1 topic | Event persisted in broker with offset |
| 3 | Governance Processor | Validates event schema against registry; checks data classification vs model clearance; checks rate limit | Validated event forwarded to domain topic OR rejected to governance DLQ |
| 4 | AI Inference Worker | Subscribes to domain topic, receives event, constructs model provider API request | Model provider API call with prompt and context |
| 5 | Model Provider | Executes inference | AI response with token counts, finish reason |
| 6 | AI Inference Worker | Constructs result CloudEvent with ai_result, ai_confidence_score, ai_latency_ms, ai_cost_usd, ai_fallback_used | Result event published to result topic |
| 7 | Consumer Application | Subscribes to result topic, processes AI result, updates business state | Business process continues with AI-enriched data |
| 8 | Event Archive | Subscribes to all topics; archives events to long-term storage | Immutable event log for audit and replay |
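Step 6 attaches ai_cost_usd to each result event, which is what makes per-domain cost attribution possible. The sketch below shows one way the figure could be derived from token counts and then rolled up by ai_use_case_domain. The price table is entirely hypothetical, and Decimal is used so that the totals do not pick up binary floating-point error.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical prices in USD per 1,000 tokens: (prompt, completion).
PRICE_PER_1K_TOKENS = {
    "model-a": (Decimal("0.005"), Decimal("0.015")),
}


def inference_cost_usd(model_id: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Compute the cost of one inference call from its token usage."""
    prompt_price, completion_price = PRICE_PER_1K_TOKENS[model_id]
    return (prompt_price * prompt_tokens + completion_price * completion_tokens) / 1000


def cost_by_domain(result_events: list) -> dict:
    """Sum ai_cost_usd per ai_use_case_domain across a batch of result events."""
    totals = defaultdict(Decimal)  # Decimal() is zero
    for event in result_events:
        totals[event["ai_use_case_domain"]] += Decimal(str(event["ai_cost_usd"]))
    return dict(totals)
```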

Error Flow

| Step | Error Condition | Detection | Recovery |
| --- | --- | --- | --- |
| 3 | Schema validation failure | Schema registry rejects event | Event routed to governance DLQ with violation detail |
| 3 | Data classification violation | Policy enforcer classification check fails | Event rejected to governance DLQ; producer alerted |
| 4 | Model provider API error (5xx) | HTTP error or timeout from provider | Retry with exponential backoff; after max retries, route to inference DLQ |
| 4 | Model provider rate limit (429) | HTTP 429 response | Back-off per Retry-After header; consumer group lag accumulates; auto-scaler adjusts |
| 6 | Result schema validation failure | Result event fails schema check | Worker logs error; original event moved to inference DLQ with error context |
| 7 | Consumer processing failure | Consumer throws exception after N retries | Consumer framework routes to consumer-group-specific DLQ |
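The two step-4 recovery behaviours (exponential backoff for 5xx errors, honouring Retry-After for 429 responses) can be combined in one delay function. This is a sketch under assumptions: the base, cap, and full-jitter strategy are illustrative choices, and only the delay-seconds form of Retry-After is parsed; the HTTP-date form falls through to the computed backoff.

```python
import random
from typing import Callable, Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0,
                rng: Callable[[], float] = random.random) -> float:
    """Return seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Provider told us how long to wait: trust it over our own schedule.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    ceiling = min(cap, base * 2 ** attempt)
    # Full jitter: pick uniformly in [0, ceiling) so workers do not retry in lockstep.
    return ceiling * rng()
```

The rng parameter exists only so the jitter can be made deterministic in tests.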

8. Security Considerations

Authentication and Authorisation

  • All producers authenticated to broker using mTLS client certificates or SASL/SCRAM.
  • Topic ACLs enforced: each producer has write access only to ai.raw.inbound.v1; each inference worker has read access only to its assigned domain topics.
  • AI provider API keys stored in centralised secrets manager (not in event payloads); injected into worker environment at runtime.
  • Consumer applications have read-only ACL to their subscribed result topics only.

Secrets Management

  • AI provider API keys rotated on a 90-day cycle; rotation does not require worker redeployment (secrets manager dynamic injection).
  • Broker TLS certificates managed by PKI infrastructure with automated renewal.
  • Schema registry credentials managed via service accounts with least-privilege access.

Data Classification

  • All events tagged with data classification at source; governance processor enforces model clearance against classification.
  • PII-tagged events are only routed to models with verified PII data processing agreements.
  • Event payloads in transit encrypted (TLS 1.3); at rest encrypted (AES-256) in broker storage and event archive.

Auditability

  • Every event carries a globally unique id (UUID v4); the full audit trail from request to result is reconstructable by correlating on id and ai_replay_of.
  • Governance DLQ events include the specific policy violation reason, enabling compliance reporting on rejected AI usage attempts.

OWASP LLM Top 10 Mitigations

| OWASP LLM Risk | Relevance | Mitigation in This Pattern |
| --- | --- | --- |
| LLM01: Prompt Injection | High | Governance processor validates event payload schema; free-text fields flagged for prompt injection scanning before routing to inference workers |
| LLM02: Insecure Output Handling | High | Result events validated against result schema before publication; consumers receive structured, schema-typed fields, not raw model output |
| LLM03: Training Data Poisoning | Medium | Read-only audit trail of all training-relevant events; replay events flagged separately to prevent replay data polluting training pipelines |
| LLM04: Model Denial of Service | High | Per-producer and per-consumer-group rate limits enforced by governance processor; a cost-spike circuit breaker opens when spend rises abnormally |
| LLM05: Supply Chain Vulnerabilities | Medium | Model provider API calls go through inference workers only; SDK versions pinned in worker container images; SBOM generated per release |
| LLM06: Sensitive Information Disclosure | High | Data classification enforcement prevents PII reaching uncertified models; no raw prompt or response stored in topics beyond configurable retention |
| LLM07: Insecure Plugin Design | Medium | Function-calling plugins not applicable to this pattern; inference workers expose no external plugin surface |
| LLM08: Excessive Agency | High | Inference workers are passive responders; no autonomous action capability; all results require consumer application to act |
| LLM09: Overreliance | Medium | ai_confidence_score field in every result event; consumers can implement confidence thresholds before acting on AI results |
| LLM10: Model Theft | Medium | API keys never in event payloads; model provider credentials not accessible to consumers; inference workers isolated in dedicated network segment |

9. Governance Considerations

Responsible AI

  • Every AI inference event carries ai_use_case_domain enabling post-hoc analysis of AI usage by domain against ethical use policies.
  • Confidence scores and model version in every result event support bias monitoring per domain over time.
  • Human override mechanism: consumers can publish to ai.[domain].human-override.v1 topic to record cases where AI result was rejected by a human decision-maker.

Model Risk Management

  • Schema registry enforces that breaking prompt changes result in a new ai_prompt_version value, enabling performance comparison between prompt versions using event analytics.
  • Model upgrade path: deploy new inference worker version subscribing to same domain topic; run shadow mode (dual-publish old and new results to separate result topics); compare result quality before cutover.
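The shadow-mode comparison in the upgrade path can be as simple as an agreement rate between the two result topics. The sketch assumes results have already been collected into dictionaries keyed by request event id, and that exact equality is a meaningful comparison for the domain; free-text outputs would need a different quality measure.

```python
def agreement_rate(current: dict, candidate: dict) -> float:
    """Fraction of requests answered by both workers whose results are identical."""
    shared = current.keys() & candidate.keys()
    if not shared:
        raise ValueError("no overlapping request ids to compare")
    matches = sum(1 for request_id in shared if current[request_id] == candidate[request_id])
    return matches / len(shared)
```

A cutover decision would normally look at where the two workers disagree, not only at the rate, so the disagreeing request ids are worth logging alongside this number.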

Human Approval Gates

  • High-stakes domains (credit decisions, medical recommendations) configure a requires_human_review flag in domain topic config; governance processor enriches events with this flag before routing to inference workers; result events include human_review_required: true to trigger downstream approval workflow.
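A consumer-side gate that combines the human_review_required flag with a confidence threshold might look like this. Sending low-confidence results to human review is an assumption of this sketch; the pattern only states that consumers can apply confidence thresholds.

```python
def next_step(result_event: dict, min_confidence: float) -> str:
    """Decide whether a result can be acted on automatically.

    Returns "human_review" or "auto".
    """
    # The flag set for high-stakes domains always wins, whatever the confidence.
    if result_event.get("human_review_required", False):
        return "human_review"
    # Assumption: results below the threshold are escalated instead of dropped.
    if result_event["ai_confidence_score"] < min_confidence:
        return "human_review"
    return "auto"
```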

Policy and Traceability

  • AI usage policy stored in policy-as-code repository; governance processor references versioned policy definitions; policy version embedded in governance validation result.
  • Full event lineage from source application through governance validation through inference to consumer available via event id correlation in the event archive.

Governance Artefacts

| Artefact | Owner | Update Frequency | Storage Location |
| --- | --- | --- | --- |
| AI Usage Policy (policy-as-code) | Chief AI Risk Officer | Per policy change | Policy repository (Git-backed) |
| Schema Registry Schemas | Platform Engineering | Per event schema change | Schema Registry + Git backup |
| Topic ACL Configuration | Platform Engineering | Per onboarding/offboarding | Terraform state + Git |
| DLQ Review Report | AI Governance Team | Weekly | Governance dashboard |
| Model Upgrade Decision Record | AI Platform Team | Per model version change | Architecture Decision Record repository |
| Cost Attribution Report | Finance / FinOps | Monthly | FinOps platform |

10. Operational Considerations

Monitoring and SLOs

| SLO | Target | Measurement | Alert Threshold |
| --- | --- | --- | --- |
| Event end-to-end latency (p99) | < 10s for async; < 500ms for near-real-time | Time from producer publish, through the result topic, to consumer receipt | > 15s sustained for 5 min |
| Consumer group lag (all groups) | < 1000 events | Broker consumer lag metric | > 5000 events accumulating |
| Governance rejection rate | < 0.5% | Governance DLQ event count / total inbound events | > 2% in any 15-min window |
| Inference worker availability | 99.9% | Worker health check success rate | < 99.5% over 5 min |
| DLQ growth rate | 0 net new per hour (steady state) | DLQ message count delta | Any sustained growth |
| Event archive completeness | 100% | Archive record count vs broker offset | Any gap |

Logging

  • Every governance processor decision logged with: event id, producer, domain, classification, policy version, decision (allow/reject), rejection reason.
  • Every inference worker call logged with: event id, model provider, model id, prompt version, token usage, latency, cost, success/failure.
  • Logs shipped to SIEM for security analysis; to observability platform for operational analysis.

Incident Response

  • Governance processor failure: producers continue publishing to raw topic; events accumulate until processor recovers; no data loss (broker durability). Alert fires within 60 seconds of processor unavailability.
  • Inference worker failure: domain topic consumer lag accumulates; auto-scaler adds new worker instances within 3 minutes; SLO breach alert if lag exceeds 5000 events.
  • Model provider outage: circuit breaker opens after configured error rate threshold; fallback response or human queue escalation activated; incident ticket auto-created with cost-so-far and impacted domains.
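The circuit breaker referred to in the model provider outage response can be sketched as a small state holder. This version trips on a count of consecutive failures, which is a simplification of the error-rate threshold described above; the class name, defaults, and injectable clock are all illustrative.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Consecutive-failure circuit breaker with a timed half-open probe."""

    def __init__(self, failure_threshold: int = 5, reset_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_seconds = reset_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """True if a provider call may be attempted now."""
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        # Open: after the reset window, let a probe call through (half-open).
        return self._clock() - self._opened_at >= self._reset_seconds

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None  # close the circuit

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()  # open, or re-open after a failed probe
```

While allow() returns False, a worker would serve the fallback response or escalate to the human queue instead of calling the provider.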

Disaster Recovery

| Scenario | RTO | RPO | Recovery Procedure |
| --- | --- | --- | --- |
| Single inference worker failure | 3 minutes | 0 (broker retains events) | Auto-scaling replaces worker; consumer group resumes from last committed offset |
| Governance processor failure | 5 minutes | 0 | Kubernetes deployment restart; events accumulate in raw topic during outage |
| Broker node failure | 10 minutes | 0 (replicated partitions) | Kafka partition leader election; consumers reconnect automatically |
| Full broker cluster failure | 4 hours | 0 (cross-region replica) | Failover to replica cluster; update producer/consumer connection strings |
| Event archive corruption | 24 hours | Up to retention boundary | Restore from backup; replay from broker if within retention period |

Capacity Planning

  • Broker storage: (average event size in KB) × (events per day) × (retention days) × 3 (replication factor).
  • Inference worker sizing: target throughput (events/min) / per-worker throughput (events/min) = minimum worker count; add 50% headroom for burst.
  • Schema registry: low resource requirements; size for HA (3-node ensemble) not throughput.
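The two sizing formulas above translate directly into code. The function names and the decimal-gigabyte conversion are choices made for this sketch; the input figures in the usage note are invented for illustration.

```python
import math


def broker_storage_gb(avg_event_kb: float, events_per_day: int,
                      retention_days: int, replication_factor: int = 3) -> float:
    """Broker storage = event size x events/day x retention days x replication."""
    total_kb = avg_event_kb * events_per_day * retention_days * replication_factor
    return total_kb / 1_000_000  # decimal GB (1 GB = 1,000,000 KB)


def worker_count(target_per_min: float, per_worker_per_min: float,
                 headroom: float = 0.5) -> int:
    """Minimum workers for the target throughput, plus burst headroom (default 50%)."""
    minimum = math.ceil(target_per_min / per_worker_per_min)
    return math.ceil(minimum * (1 + headroom))
```

For example, 2 KB events at 1,000,000 events per day with 90-day retention and replication factor 3 come to 540 GB, and a target of 1,200 events per minute with workers that each handle 100 per minute needs 12 workers, or 18 with 50% headroom.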

11. Cost Considerations

Cost Drivers

| Cost Driver | Description | Typical Proportion |
| --- | --- | --- |
| AI Model Provider API Costs | Token-based charges for every inference event; dominant cost driver | 55–70% |
| Managed Broker (MSK/Service Bus) | Per-partition-hour + data transfer + storage | 10–20% |
| Inference Worker Compute | Container/function runtime for worker fleet | 8–15% |
| Event Archive Storage | Long-term event retention in object storage | 3–8% |
| Schema Registry | Managed service or self-hosted compute | 1–3% |
| Observability (metrics/logs) | Event-level metric ingestion volume | 3–7% |

Scaling Risks

  • AI provider token costs scale linearly with event volume; cost spike protection requires cost-rate circuit breaker or monthly budget alerts.
  • Kafka storage costs can grow unexpectedly with long retention periods on high-volume topics; topic-level retention policies must be actively managed.
  • Inference worker auto-scaling lags behind sudden traffic spikes by 2–5 minutes; pre-warm workers for known batch jobs.

Cost Optimisations

  • Batch small events into micro-batches in the inference worker to reduce per-call API overhead and take advantage of batch inference pricing.
  • Use spot/preemptible instances for non-latency-sensitive inference workers (batch domains).
  • Implement caching layer in inference worker for identical or near-identical prompts (semantic deduplication) — typical cache hit rate 15–30% for structured workloads.
  • Compress event payloads (Snappy/LZ4 for Kafka) to reduce broker storage and network costs.
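The caching optimisation can be illustrated with an exact-match cache keyed on a hash of the model, prompt version, and whitespace-normalised prompt. This is deliberately simpler than the semantic deduplication mentioned above, which would need embedding similarity; the class and its in-memory store are hypothetical.

```python
import hashlib
from typing import Callable


class InferenceCache:
    """Exact-match response cache for inference workers (in-memory illustration)."""

    def __init__(self) -> None:
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model_id: str, prompt_version: str, prompt: str) -> str:
        # Collapse runs of whitespace so trivially different prompts share a key.
        normalised = " ".join(prompt.split())
        raw = "\x1f".join((model_id, prompt_version, normalised))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_compute(self, model_id: str, prompt_version: str, prompt: str,
                       infer: Callable[[str], str]) -> str:
        cache_key = self.key(model_id, prompt_version, prompt)
        if cache_key in self._store:
            self.hits += 1
            return self._store[cache_key]
        self.misses += 1
        value = infer(prompt)  # only pay for the provider call on a miss
        self._store[cache_key] = value
        return value
```

Including the model id and prompt version in the key means a model or prompt upgrade naturally invalidates earlier entries.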

Indicative Cost Range

| Scale | Monthly Infrastructure | AI Provider API | Total Monthly |
| --- | --- | --- | --- |
| Small (10M events/mo, 3 domains) | $1,500–$3,000 | $5,000–$15,000 | $6,500–$18,000 |
| Medium (100M events/mo, 10 domains) | $8,000–$15,000 | $40,000–$120,000 | $48,000–$135,000 |
| Large (1B+ events/mo, 30+ domains) | $40,000–$80,000 | $300,000–$800,000 | $340,000–$880,000 |

12. Trade-Off Analysis

Architectural Options Comparison

| Option | Description | Latency | Cost | Governance | Complexity | Recommended For |
| --- | --- | --- | --- | --- | --- | --- |
| Option A: Enterprise AI Service Bus (this pattern) | Asynchronous event bus with schema governance, domain topics, replay | 500ms–30s | Medium infrastructure + AI API | Centralised, strong | High | Large enterprise, regulated industries, multi-team AI consumption |
| Option B: Direct AI API Integration | Each application calls AI provider API directly | 100ms–10s | Low infrastructure, highest AI API | Decentralised, weak | Low | Single-team, exploratory, non-regulated |
| Option C: Synchronous AI Gateway | Synchronous API gateway proxying AI provider calls; no broker | 200ms–15s | Medium | Medium | Medium | Medium enterprise, request/response workloads, low replay requirement |

Architectural Tensions

| Tension | Trade-Off | Resolution |
| --- | --- | --- |
| Latency vs. Governance | Adding governance processor to event path adds 50–200ms latency | Accept latency for regulated domains; implement fast-path bypass for pre-approved, non-sensitive use cases |
| Topic granularity vs. Consumer flexibility | Coarse domain topics couple unrelated use cases; fine-grained topics increase management overhead | One topic per domain AND event type version; avoid sub-domain splits until consumer count justifies it |
| Replay completeness vs. Storage cost | Full event retention enables unlimited replay; drives storage costs | Tiered retention: 90 days hot (broker), 7 years cold (object storage with restore latency) |
| Schema evolution rigidity vs. Innovation speed | Strict schema compatibility slows prompt experimentation | Use schema registry for result events (consumer-facing); allow looser schema for internal inference events behind the governance plane |

13. Failure Modes

| Failure | Likelihood | Impact | Detection | Recovery |
| --- | --- | --- | --- | --- |
| Governance processor becomes unavailable | Low | High: all new events blocked from routing | Consumer lag on raw topic grows; health check fails | Kubernetes restart; events accumulate durably in broker |
| AI provider API key expires or is revoked | Medium | High: all inference workers fail | HTTP 401 errors from provider; inference DLQ growth | Rotate key in secrets manager; workers pick up automatically |
| Schema registry unavailable | Low | High: new events cannot be validated | Governance processor errors; alert fires | Read-through cache on governance processor provides short-term continuity; restore registry |
| Consumer group offset corruption | Very Low | Medium: some events may be reprocessed | Duplicate events in consumer application | Idempotent consumer processing (dedup on event id); replay from known-good offset |
| Back-pressure causing broker disk exhaustion | Medium | Critical: broker stops accepting new events | Broker disk usage alert | Increase broker storage; add topic retention policy enforcement; throttle producers |
| Model provider rate limit hit | High | Medium: inference latency increases | HTTP 429 responses; consumer lag growth | Exponential backoff; distribute load across multiple provider API keys; activate fallback model |

Cascading Failure Scenarios

  • Governance processor failure + high event volume: Raw topic fills beyond retention period → events lost. Mitigation: extend raw topic retention to 7 days; alert on raw topic consumer lag within 60 seconds.
  • Inference DLQ accumulation + no DLQ monitoring: Silent event loss for hours; downstream consumers starved of results, triggering application-level failures. Mitigation: DLQ monitoring and alerting is mandatory, not optional.
  • Model provider global outage + no circuit breaker + no fallback: All inference workers retry until their retry budget is exhausted → all events land in DLQ → consumers receive no results → downstream business processes halt. Mitigation: circuit breaker with fallback response is non-negotiable for production deployments.

14. Regulatory Considerations

APRA CPS 230 — Operational Risk

  • Clause 36 (Business Continuity): The event bus must have documented RTO/RPO for each failure scenario. Replay capability directly addresses recovery of AI processing after outages.
  • Clause 52 (Service Provider Management): AI model providers are third-party service providers; the governance processor enforces usage controls required under third-party risk management.

APRA CPS 234 — Information Security

  • Clause 15 (Information Security Controls): mTLS authentication, topic ACLs, and data classification enforcement address the requirement for controls proportional to data sensitivity.
  • Clause 36 (Incident Notification): Governance DLQ violations and model provider outages must be assessed as potential security incidents under CPS 234 notification obligations.

Australian Privacy Act 1988 (as amended 2024)

  • APP 6 (Use and Disclosure): Data classification enforcement in the governance processor operationalises the requirement to use personal information only for the primary purpose disclosed at collection.
  • APP 8 (Cross-border Disclosure): Events routed to offshore model providers must have the country of processing recorded in the AI extension fields; governance processor must block cross-border routing for events exceeding permitted data sharing boundaries.

EU AI Act (2024)

  • Article 13 (Transparency): ai_model_id, ai_model_version, and ai_prompt_version in every event support the requirement to document the AI system used in automated decisions affecting natural persons.
  • Article 17 (Quality Management): Schema registry enforcement, DLQ monitoring, and replay capability are evidence of a quality management system for AI outputs.
  • Article 12 (Record-keeping): Event archive with 7-year retention for high-risk AI use cases supports the logging obligation for high-risk AI systems.

ISO 42001 — AI Management System

  • Clause 6.1.2 (AI Risk Assessment): Per-domain circuit breakers and confidence score tracking operationalise the risk assessment and monitoring requirements.
  • Clause 8.5 (AI System Lifecycle): Prompt versioning, model version tracking, and replay capability support the AI lifecycle management requirements.

NIST AI RMF (2023)

  • GOVERN 1.1: AI usage policy encoded in governance processor addresses the organisational risk governance requirement.
  • MEASURE 2.5: Confidence score monitoring and quality degradation circuit breaker conditions implement the performance measurement requirement.
  • MANAGE 2.4: DLQ with full context capture and replay capability addresses the AI risk treatment and incident response requirements.

15. Reference Implementations

AWS

  • Broker: Amazon MSK (Kafka-compatible) with MSK Connect for governance processor
  • Schema Registry: AWS Glue Schema Registry
  • Inference Workers: AWS Lambda (event-driven) or ECS Fargate containers
  • DLQ: Amazon SQS DLQ connected to MSK via Kafka SQS Sink Connector
  • Event Archive: S3 via Kafka S3 Sink Connector; query via Athena
  • Observability: Amazon CloudWatch + AWS Cost Explorer for per-event cost tracking
  • Secrets: AWS Secrets Manager with Lambda execution role access

Azure

  • Broker: Azure Event Hubs (Kafka-compatible surface) or Azure Service Bus Premium
  • Schema Registry: Azure Schema Registry (built into Event Hubs namespace)
  • Inference Workers: Azure Functions (event-driven triggers) or AKS pods
  • DLQ: Azure Service Bus dead-letter queues
  • Event Archive: Azure Data Lake Storage Gen2 via Event Hubs Capture
  • Observability: Azure Monitor + Application Insights; Cost Management for attribution
  • Secrets: Azure Key Vault with managed identity binding to workers

GCP

  • Broker: Google Cloud Pub/Sub (native) or GKE-hosted Kafka
  • Schema Registry: Confluent Schema Registry on GKE or Apicurio Registry
  • Inference Workers: Cloud Run (event-driven) or GKE deployments
  • DLQ: Pub/Sub dead-letter topics with subscription-level configuration
  • Event Archive: Cloud Storage via Pub/Sub export; query via BigQuery external tables
  • Observability: Cloud Monitoring + Cloud Logging; BigQuery for cost analytics
  • Secrets: Secret Manager with Workload Identity binding

On-Premises / Private Cloud

  • Broker: Apache Kafka (self-managed) on Kubernetes via Strimzi Operator
  • Schema Registry: Confluent Schema Registry OSS or Apicurio Registry
  • Inference Workers: Kubernetes Deployments with KEDA event-driven autoscaling
  • DLQ: Dedicated Kafka topics with Kafka UI for manual review
  • Event Archive: MinIO (S3-compatible) + Apache Iceberg for query
  • Observability: Prometheus + Grafana + Loki stack
  • Secrets: HashiCorp Vault with Kubernetes auth method

| Pattern | Relationship | Notes |
| --- | --- | --- |
| EAAPL-INT007: AI Circuit Breaker | Enables | Circuit breaker per model provider is a required sub-component of each inference worker in this pattern |
| EAAPL-INT004: Real-Time AI Stream Processing | Specialises | Stream processing pattern is a specific consumer topology for this bus in low-latency domains |
| EAAPL-INT005: Batch AI Processing | Specialises | Batch processing is a consumer topology for this bus in high-throughput, non-latency-sensitive domains |
| EAAPL-INT002: Legacy System AI Augmentation | Complementary | Legacy systems publish to and consume from this bus through adapter components |
| EAAPL-INT008: Bidirectional AI Sync | Complementary | Sync pattern consumes result events from this bus to update enterprise data stores |

17. Maturity Assessment

Overall Maturity: Proven

| Dimension | Score (1–5) | Justification |
| --- | --- | --- |
| Architectural Completeness | 5 | All integration, governance, processing, and consumer concerns addressed |
| Operational Readiness | 4 | Runbook templates defined; some DR procedures require organisation-specific customisation |
| Security Coverage | 5 | mTLS, ACLs, classification enforcement, OWASP LLM Top 10 addressed |
| Governance Coverage | 5 | Policy-as-code, audit trail, model risk management, human override all included |
| Cost Predictability | 4 | Indicative ranges provided; AI API costs remain variable; budget alerting required |
| Implementation Complexity | 3 | High: requires mature messaging platform and operational tooling; not suitable for small teams |
| Industry Validation | 4 | Pattern applied in production at major financial institutions and government agencies |

18. Revision History

| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0 | 2026-06-12 | EAAPL Working Group | Initial publication (integration patterns series) |