EAAPL-INT001 — Enterprise AI Service Bus
Tags: event-driven asynchronous enterprise-only high-complexity
Status: Proven | Version: 1.0 | Domain: Integration
1. Executive Summary
The Enterprise AI Service Bus pattern establishes an event-driven integration backbone that routes, mediates, and governs AI capability consumption across the enterprise. Rather than allowing each business unit to wire directly to model providers, the pattern inserts a durable, schema-governed event mesh between AI producers (models, pipelines, agents) and AI consumers (applications, dashboards, downstream processes).
The pattern extends the CloudEvents 1.0 specification with AI-specific fields—model identity, prompt version, token usage, confidence score, latency, and cost—ensuring that every AI inference event is a first-class, auditable artefact. Topic design decouples consumers from model changes: one topic per AI use-case domain, not per model, so upgrading GPT-4 to GPT-4o does not require re-wiring 30 downstream subscribers.
For CIOs and CTOs, the bus provides three strategic outcomes: (1) unified cost visibility across all AI workloads through event-level cost attribution; (2) replay capability to reprocess historical inputs when a better model becomes available; (3) a single enforcement point for data classification, rate limiting, and policy compliance before any AI event reaches a consumer.
2. Problem Statement
Business Problem
AI capabilities are being procured and integrated independently by individual teams. There is no central visibility into total AI spend, no consistent governance of what data enters AI models, and no mechanism to upgrade models without coordinated redeployment across all consuming systems.
Technical Problem
Point-to-point integrations between applications and AI APIs create a tangled dependency graph. Each integration handles retries, error logging, cost tracking, and schema evolution differently. When a model API changes its response format or is deprecated, every consuming application must be updated independently.
Symptoms
- Multiple teams have separate API keys for the same AI provider with no consolidated billing.
- A model deprecation notice causes a multi-team incident requiring weeks of parallel migration work.
- There is no audit trail linking a business decision to the specific AI model version and prompt that produced it.
- AI inference costs are allocated to cloud infrastructure budgets rather than business unit P&Ls.
- Failed AI inference events are silently discarded, making root cause analysis impossible.
Cost of Inaction
- Financial: Duplicate AI spend across business units; inability to negotiate volume discounts without consolidated usage data. Typical over-spend: 30–60% of actual AI API cost.
- Operational: Every model upgrade requires coordinated change across all consuming teams — 4 to 12 weeks of migration effort per model generation.
- Risk: No audit trail for AI-assisted decisions exposes the organisation to regulatory non-compliance under EU AI Act Article 13 (transparency) and APRA CPS 230 operational risk standards.
- Strategic: Inability to replay historical workloads with improved models forfeits compounding model improvement value.
3. Context
When to Apply
- The enterprise has ≥3 distinct teams consuming AI capabilities.
- AI inference is embedded in business-critical workflows where auditability is required.
- The organisation operates under financial services, healthcare, or government regulatory regimes.
- Model upgrade cycles must not require coordinated consumer redeployment.
- Cost attribution to business units is a finance or governance requirement.
When NOT to Apply
- Single-team AI workload with no cross-system integration.
- Proof-of-concept or exploratory AI workloads where operational overhead is not justified.
- Ultra-low-latency requirements (< 50ms) where broker overhead is architecturally incompatible.
- Simple request/response integrations where event-driven complexity adds no value.
Prerequisites
- A mature enterprise messaging platform (Apache Kafka, Azure Service Bus, Amazon EventBridge, Google Cloud Pub/Sub).
- A schema registry capable of enforcing Avro, Protobuf, or JSON Schema evolution compatibility.
- Centralised secrets management for AI provider API keys.
- Observability platform capable of ingesting event-level metrics.
Industry Applicability
| Industry | Applicability | Primary Driver |
|---|---|---|
| Financial Services | High | Regulatory auditability, cost attribution, model risk governance |
| Government | High | Data classification enforcement, audit trail requirements |
| Healthcare | High | PHI data governance, model version traceability for clinical decisions |
| Retail / eCommerce | Medium | Multi-team AI consumption, cost management |
| Telecommunications | Medium | High-volume event streams, multi-domain AI use cases |
| Startups (< 50 engineers) | Low | Overhead exceeds benefit at this scale |
4. Architecture Overview
The Enterprise AI Service Bus is a layered event-driven architecture consisting of five logical planes: the ingestion plane, the governance plane, the routing plane, the processing plane, and the consumer plane.
Ingestion Plane. AI event producers (applications initiating AI inference requests) publish to the bus using an extended CloudEvents envelope. The CloudEvents 1.0 base fields (id, source, specversion, type, time, datacontenttype) are preserved intact. The AI extension fields are added as CloudEvents extension attributes: ai_model_id, ai_model_version, ai_prompt_version, ai_token_usage_prompt, ai_token_usage_completion, ai_confidence_score, ai_latency_ms, ai_cost_usd, ai_use_case_domain, ai_data_classification. CloudEvents 1.0 restricts attribute names to lowercase ASCII letters and digits, so a strictly conformant deployment needs to map these names to an underscore-free form on the wire; the underscored names are used throughout this document for readability. Producers never call AI provider APIs directly. The AI SDK client library handles envelope construction, ensuring extension field completeness before the event is published.
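As an illustration of the envelope construction described above, the following is a minimal sketch of what an SDK helper might look like. The function name `build_ai_event`, the choice of which extension fields are mandatory at publish time, and the example producer source are assumptions made for this sketch; the field names themselves come from the text.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumption: only these two extension fields are treated as mandatory at
# publish time. The pattern text does not state which fields are required.
REQUIRED_AI_FIELDS = ("ai_use_case_domain", "ai_data_classification")


def build_ai_event(source: str, event_type: str, data: dict, **ai_fields) -> dict:
    """Return a CloudEvents 1.0 structured-mode envelope with AI extension fields."""
    missing = [name for name in REQUIRED_AI_FIELDS if name not in ai_fields]
    if missing:
        raise ValueError(f"missing AI extension fields: {missing}")
    event = {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    }
    # Extension attributes sit at the top level of the envelope, next to the
    # base attributes.
    event.update(ai_fields)
    return event


request = build_ai_event(
    source="/apps/loan-origination",  # hypothetical producer identifier
    event_type="ai.creditrisk.application-assessment.v1",
    data={"application_id": "A-1001"},
    ai_prompt_version="3",
    ai_use_case_domain="creditrisk",
    ai_data_classification="confidential",
)
print(json.dumps(request, indent=2))
```

Rejecting incomplete envelopes in the client keeps obviously invalid events off the raw inbound topic, although the governance plane remains the authoritative check.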
Governance Plane. A policy enforcement processor subscribes to the raw inbound topic, validates the CloudEvents schema against the schema registry, applies data classification rules (blocking PII fields from reaching models not cleared for that classification), enforces per-producer rate limits, and re-publishes validated events to the routed topic. Failed validation events are routed to the governance dead letter queue with the specific violation reason attached. This plane is the single enforcement point for the enterprise AI usage policy.
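The classification and rate-limit checks can be sketched as a pure decision function. Everything below is illustrative: the classification ordering, the clearance table, the limit value, and the names `evaluate` and `Decision` are assumptions for this sketch, not part of the pattern. A real processor would load policy from the versioned policy repository and track rate windows in the stream processor's state store.

```python
from dataclasses import dataclass

# Hypothetical policy data for illustration only.
CLASSIFICATION_RANK = {"public": 0, "internal": 1, "confidential": 2, "pii": 3}
MODEL_CLEARANCE = {"gpt-4o": "confidential", "internal-llm": "pii"}
RATE_LIMIT_PER_WINDOW = 2  # events per producer per window (assumed value)


@dataclass
class Decision:
    allowed: bool
    reason: str = ""


def evaluate(event: dict, counts: dict) -> Decision:
    """Apply classification and per-producer rate-limit checks to one event."""
    classification = event.get("ai_data_classification")
    if classification not in CLASSIFICATION_RANK:
        return Decision(False, "unknown or missing ai_data_classification")
    clearance = MODEL_CLEARANCE.get(event.get("ai_model_id"))
    if clearance is None:
        return Decision(False, "model has no recorded clearance")
    if CLASSIFICATION_RANK[classification] > CLASSIFICATION_RANK[clearance]:
        return Decision(False, f"{classification} data exceeds model clearance {clearance}")
    # Count only events that passed the policy checks against the rate limit.
    producer = event.get("source", "unknown")
    counts[producer] = counts.get(producer, 0) + 1
    if counts[producer] > RATE_LIMIT_PER_WINDOW:
        return Decision(False, "producer rate limit exceeded")
    return Decision(True)


window_counts = {}
decision = evaluate(
    {"source": "/apps/hr", "ai_model_id": "gpt-4o", "ai_data_classification": "pii"},
    window_counts,
)
print(decision)  # rejected; the reason string would accompany the event to the governance DLQ
```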
Routing Plane. Topic design follows the domain-per-topic principle, not model-per-topic. Topics are named by business domain and event type: ai.creditrisk.application-assessment.v1, ai.customerservice.intent-classification.v1, ai.fraud.transaction-scoring.v1. This topology means upgrading the underlying model from GPT-4 to GPT-4o requires no change to topic names or consumer configurations — the model is a deployment detail of the AI inference worker, not an integration concern.
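The naming convention is simple enough to enforce in code. The sketch below assumes the shape `ai.<domain>.<event-type>.v<version>` inferred from the three example topics; the helper names and the exact character rules in the regular expression are assumptions.

```python
import re

# Shape inferred from the example topic names in the text.
TOPIC_PATTERN = re.compile(
    r"^ai\.(?P<domain>[a-z0-9]+)\.(?P<event_type>[a-z0-9-]+)\.v(?P<version>[0-9]+)$"
)


def topic_name(domain: str, event_type: str, version: int) -> str:
    """Build a domain topic name and reject anything off-convention."""
    name = f"ai.{domain}.{event_type}.v{version}"
    if not TOPIC_PATTERN.match(name):
        raise ValueError(f"topic name violates convention: {name}")
    return name


def parse_topic(name: str) -> dict:
    """Split a topic name into its domain, event type, and version."""
    match = TOPIC_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a domain topic: {name}")
    return {
        "domain": match["domain"],
        "event_type": match["event_type"],
        "version": int(match["version"]),
    }


print(topic_name("creditrisk", "application-assessment", 1))
print(parse_topic("ai.fraud.transaction-scoring.v1"))
```

Note that no model identifier appears anywhere in the name, which is what allows a model swap to stay inside the inference worker's configuration.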
Processing Plane. AI inference workers subscribe to domain topics, execute inference against the configured model provider, and publish results to result topics following the same CloudEvents envelope pattern. The result event adds ai_result, ai_result_schema_version, and ai_fallback_used extension fields. Workers are stateless and horizontally scalable. Each logical consumer role (e.g., fraud-scorer, risk-ranker) runs in its own consumer group, so every role receives every event and commits its own offsets without competing with other roles for partitions.
Consumer Plane. Downstream applications subscribe to result topics. Consumers are shielded from model provider changes, prompt changes, and inference worker implementation details. The event schema version field enables consumers to handle multiple result schema versions concurrently during rolling upgrades.
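One way a consumer might tolerate two result schema versions during a rolling upgrade is to dispatch on the version field. The payload shapes for versions "1" and "2" below are invented purely for illustration, as are the handler names; only ai_result and ai_result_schema_version come from the text.

```python
def read_score_v1(result: dict) -> float:
    # Hypothetical v1 shape: {"score": 0.82}
    return float(result["score"])


def read_score_v2(result: dict) -> float:
    # Hypothetical v2 shape: {"assessment": {"score": 0.82}}
    return float(result["assessment"]["score"])


HANDLERS = {"1": read_score_v1, "2": read_score_v2}


def handle_result(event: dict) -> float:
    """Route a result event to the reader for its declared schema version."""
    version = str(event.get("ai_result_schema_version"))
    handler = HANDLERS.get(version)
    if handler is None:
        raise ValueError(f"unsupported ai_result_schema_version: {version}")
    return handler(event["ai_result"])


old = {"ai_result_schema_version": "1", "ai_result": {"score": 0.82}}
new = {"ai_result_schema_version": "2", "ai_result": {"assessment": {"score": 0.82}}}
print(handle_result(old), handle_result(new))
```

Raising on an unknown version, rather than guessing, lets the consumer framework route the event to its DLQ for inspection.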
Replay Architecture. All events, both requests and results, are retained in compacted topics or object storage with a configurable retention period (recommended: 90 days for standard, 7 years for regulated use cases). Replay is initiated by re-publishing retained events to a replay topic. Replay events include the original id and an ai_replay_of extension field, enabling downstream deduplication and differentiation of original vs. replayed processing.
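A sketch of how a replay operator might stamp replay metadata follows. It assumes one particular reading of the text: the replayed event receives a fresh id and carries the original id in ai_replay_of. The helper names are hypothetical.

```python
import copy
import uuid


def make_replay_event(original: dict) -> dict:
    """Copy an archived event and mark it as a replay of the original."""
    replay = copy.deepcopy(original)
    # Assumption: a replay gets its own id, and the original id moves to
    # ai_replay_of so both can be correlated in the archive.
    replay["id"] = str(uuid.uuid4())
    replay["ai_replay_of"] = original["id"]
    return replay


def is_replay(event: dict) -> bool:
    """Consumers can use this to separate original from replayed processing."""
    return "ai_replay_of" in event


archived = {"id": "evt-001", "type": "ai.fraud.transaction-scoring.v1", "data": {"txn": 42}}
replayed = make_replay_event(archived)
print(is_replay(archived), is_replay(replayed), replayed["ai_replay_of"])
```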
Back-Pressure Handling. AI inference is significantly slower than typical event processing (50ms–30s vs <1ms for simple transforms). Back-pressure is handled via consumer lag monitoring per consumer group: when lag exceeds the configured threshold, the auto-scaler adds inference worker instances. Hard rate limits per consumer group prevent a single workload from monopolising broker throughput.
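The lag-driven scaling decision can be approximated with a small calculation. This sketch sizes the fleet to drain the current backlog within a target window and ignores the ongoing arrival rate, which a production auto-scaler would also account for. The function name and all parameters are assumptions.

```python
import math


def desired_workers(current: int, lag: int, lag_threshold: int,
                    events_per_worker_per_min: int, drain_minutes: int,
                    max_workers: int) -> int:
    """Return the worker count needed to drain the backlog within drain_minutes."""
    if lag <= lag_threshold:
        return current  # lag is within tolerance, so no scale-out
    needed = math.ceil(lag / (events_per_worker_per_min * drain_minutes))
    # Never scale in from here, and never exceed the configured ceiling.
    return min(max(current, needed), max_workers)


# 3,000 events of lag, 60 events/min per worker, drain within 5 minutes
print(desired_workers(current=4, lag=3000, lag_threshold=1000,
                      events_per_worker_per_min=60, drain_minutes=5, max_workers=20))
```

The `max_workers` ceiling plays the role of the hard limit mentioned above: it stops one backlogged domain from consuming the whole cluster's capacity.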
Dead Letter Queue Architecture. Every consumer group has a corresponding DLQ topic. Events are routed to the DLQ after the configured maximum retry count with full event context preserved: original event, error message, retry count, last failure timestamp, and the consumer group that failed. DLQ topics are monitored; alerts fire at configurable message count thresholds. A replay-from-DLQ operator enables manual investigation and reprocessing.
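The DLQ record described above can be sketched as follows. The in-memory list stands in for a DLQ topic, and `MAX_ATTEMPTS` and the record key names are assumptions chosen to mirror the listed context fields.

```python
from datetime import datetime, timezone

MAX_ATTEMPTS = 3  # assumed value for the configured maximum retry count


def process_with_dlq(event: dict, handler, consumer_group: str, dlq: list):
    """Try the handler up to MAX_ATTEMPTS times, then dead-letter the event."""
    last_error = None
    for _ in range(MAX_ATTEMPTS):
        try:
            return handler(event)
        except Exception as exc:  # sketch only; real code would narrow this
            last_error = exc
    # Preserve the full context so the event can be investigated and replayed.
    dlq.append({
        "original_event": event,
        "error_message": str(last_error),
        "retry_count": MAX_ATTEMPTS,
        "last_failure_timestamp": datetime.now(timezone.utc).isoformat(),
        "consumer_group": consumer_group,
    })
    return None


def always_fails(event: dict):
    raise RuntimeError("provider timeout")


dead_letters = []
process_with_dlq({"id": "evt-7"}, always_fails, "fraud-scorer", dead_letters)
print(dead_letters[0]["error_message"], dead_letters[0]["retry_count"])
```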
5. Architecture Diagram
6. Components
| Component | Type | Responsibility | Technology Options | Criticality |
|---|---|---|---|---|
| AI SDK Client Library | Library | CloudEvents envelope construction, extension field population, publisher abstraction | Custom SDK (Python/Java/Node), Dapr SDK | Critical |
| Schema Registry | Infrastructure | Enforce event schema evolution compatibility; validate inbound events | Confluent Schema Registry, AWS Glue Schema Registry, Azure Schema Registry | Critical |
| Message Broker | Infrastructure | Durable topic management, consumer group offsets, replay retention | Apache Kafka, Azure Service Bus Premium, AWS MSK, Google Pub/Sub | Critical |
| Governance Processor | Service | Schema validation, data classification enforcement, rate limiting, governance DLQ routing | Kafka Streams app, Azure Stream Analytics, custom Flink job | Critical |
| AI Inference Worker | Service | Topic subscription, model provider API call, result event publication, retry logic | Containerised Python/Node service, AWS Lambda, Azure Functions | High |
| Dead Letter Queue Processor | Service | DLQ monitoring, alerting, manual replay tooling | Custom service + alerting integration | High |
| Event Archive | Storage | Long-term event retention for audit and replay | Kafka compacted topics + S3/ADLS/GCS, Apache Iceberg, Delta Lake | High |
| Replay Operator | Service | Re-publish archived events to inbound topic with replay metadata | Custom CLI/service | Medium |
| Observability Collector | Infrastructure | Consume all topics to extract cost, latency, quality metrics per domain | Kafka consumer + Prometheus metrics, Datadog, Splunk | High |
| Consumer Group Manager | Configuration | Define and enforce consumer group isolation across domains | Kafka AdminClient, Terraform-managed topic ACLs | Medium |
7. Data Flow
Primary Flow
| Step | Actor | Action | Output |
|---|---|---|---|
| 1 | Application | Calls AI SDK Client Library with domain payload and data classification label | CloudEvents envelope with AI extension fields populated |
| 2 | AI SDK | Publishes event to ai.raw.inbound.v1 topic | Event persisted in broker with offset |
| 3 | Governance Processor | Validates event schema against registry; checks data classification vs model clearance; checks rate limit | Validated event forwarded to domain topic OR rejected to governance DLQ |
| 4 | AI Inference Worker | Subscribes to domain topic, receives event, constructs model provider API request | Model provider API call with prompt and context |
| 5 | Model Provider | Executes inference | AI response with token counts, finish reason |
| 6 | AI Inference Worker | Constructs result CloudEvent with ai_result, ai_confidence_score, ai_latency_ms, ai_cost_usd, ai_fallback_used | Result event published to result topic |
| 7 | Consumer Application | Subscribes to result topic, processes AI result, updates business state | Business process continues with AI-enriched data |
| 8 | Event Archive | Subscribes to all topics; archives events to long-term storage | Immutable event log for audit and replay |
Error Flow
| Step | Error Condition | Detection | Recovery |
|---|---|---|---|
| 3 | Schema validation failure | Governance processor validation against the schema registry fails | Event routed to governance DLQ with violation detail |
| 3 | Data classification violation | Policy enforcer classification check fails | Event rejected to governance DLQ; producer alerted |
| 4 | Model provider API error (5xx) | HTTP error or timeout from provider | Retry with exponential backoff; after max retries, route to inference DLQ |
| 4 | Model provider rate limit (429) | HTTP 429 response | Back-off per Retry-After header; consumer group lag accumulates; auto-scaler adjusts |
| 6 | Result schema validation failure | Result event fails schema check | Worker logs error; original event moved to inference DLQ with error context |
| 7 | Consumer processing failure | Consumer throws exception after N retries | Consumer framework routes to consumer-group-specific DLQ |
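The retry behaviour for 5xx and 429 responses can be sketched as a delay calculation. This version uses capped exponential backoff with full jitter and honours a Retry-After value when one is supplied. It assumes the header has already been parsed into seconds (the header may also carry an HTTP date, which is not handled here); the function name and default values are assumptions.

```python
import random


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  retry_after=None, rng=random.random) -> float:
    """Return the number of seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # 429 path: the provider told us how long to wait.
        return float(retry_after)
    # 5xx/timeout path: exponential growth, capped, with full jitter so that
    # many workers do not retry in lockstep.
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()


for attempt in range(5):
    print(attempt, round(backoff_delay(attempt), 3))
```

Once the retry count is exhausted, the event goes to the inference DLQ as described in the table.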
8. Security Considerations
Authentication and Authorisation
- All producers authenticated to broker using mTLS client certificates or SASL/SCRAM.
- Topic ACLs enforced: each producer has write access only to ai.raw.inbound.v1; each inference worker has read access only to its assigned domain topics.
- AI provider API keys stored in centralised secrets manager (not in event payloads); injected into worker environment at runtime.
- Consumer applications have read-only ACL to their subscribed result topics only.
Secrets Management
- AI provider API keys rotated on a 90-day cycle; rotation does not require worker redeployment (secrets manager dynamic injection).
- Broker TLS certificates managed by PKI infrastructure with automated renewal.
- Schema registry credentials managed via service accounts with least-privilege access.
Data Classification
- All events tagged with data classification at source; governance processor enforces model clearance against classification.
- PII-tagged events are only routed to models with verified PII data processing agreements.
- Event payloads in transit encrypted (TLS 1.3); at rest encrypted (AES-256) in broker storage and event archive.
Auditability
- Every event carries a globally unique id (UUID v4); the full audit trail from request to result is reconstructable by correlating on id and ai_replay_of.
- Governance DLQ events include the specific policy violation reason, enabling compliance reporting on rejected AI usage attempts.
OWASP LLM Top 10 Mitigations
| OWASP LLM Risk | Relevance | Mitigation in This Pattern |
|---|---|---|
| LLM01 — Prompt Injection | High | Governance processor validates event payload schema; free-text fields flagged for prompt injection scanning before routing to inference workers |
| LLM02 — Insecure Output Handling | High | Result events validated against result schema before publication; consumers receive structured, schema-typed fields not raw model output |
| LLM03 — Training Data Poisoning | Medium | Read-only audit trail of all training-relevant events; replay events flagged separately to prevent replay data polluting training pipelines |
| LLM04 — Model Denial of Service | High | Per-producer and per-consumer-group rate limits enforced by governance processor; cost-spike circuit breaker opens when spend exceeds the configured threshold |
| LLM05 — Supply Chain Vulnerabilities | Medium | Model provider API calls go through inference workers only; SDK pinned versions in worker container images; SBOM generated per release |
| LLM06 — Sensitive Information Disclosure | High | Data classification enforcement prevents PII reaching uncertified models; no raw prompt or response stored in topics beyond configurable retention |
| LLM07 — Insecure Plugin Design | Medium | Function-calling plugins not applicable to this pattern; inference workers expose no external plugin surface |
| LLM08 — Excessive Agency | High | Inference workers are passive responders; no autonomous action capability; all results require consumer application to act |
| LLM09 — Overreliance | Medium | ai_confidence_score field in every result event; consumers can implement confidence thresholds before acting on AI results |
| LLM10 — Model Theft | Medium | API keys never in event payloads; model provider credentials not accessible to consumers; inference workers isolated in dedicated network segment |
9. Governance Considerations
Responsible AI
- Every AI inference event carries ai_use_case_domain, enabling post-hoc analysis of AI usage by domain against ethical use policies.
- Confidence scores and model version in every result event support bias monitoring per domain over time.
- Human override mechanism: consumers can publish to the ai.[domain].human-override.v1 topic to record cases where an AI result was rejected by a human decision-maker.
Model Risk Management
- Schema registry enforces that breaking prompt changes result in a new ai_prompt_version value, enabling performance comparison between prompt versions using event analytics.
- Model upgrade path: deploy new inference worker version subscribing to same domain topic; run shadow mode (dual-publish old and new results to separate result topics); compare result quality before cutover.
Human Approval Gates
- High-stakes domains (credit decisions, medical recommendations) configure a requires_human_review flag in domain topic config; governance processor enriches events with this flag before routing to inference workers; result events include human_review_required: true to trigger downstream approval workflow.
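On the consumer side, the approval gate might reduce to a routing decision like the one sketched below. The threshold value, the return labels, and the rule that a missing confidence score also forces review are assumptions; human_review_required and ai_confidence_score are the fields named in this document.

```python
def route_result(result_event: dict, confidence_threshold: float = 0.8) -> str:
    """Decide whether a result can be acted on automatically."""
    # The domain-level flag always wins, regardless of model confidence.
    if result_event.get("human_review_required") is True:
        return "human-review"
    score = result_event.get("ai_confidence_score")
    # Assumption: a missing or low confidence score is treated as unsafe.
    if score is None or score < confidence_threshold:
        return "human-review"
    return "auto-process"


print(route_result({"ai_confidence_score": 0.95}))
print(route_result({"ai_confidence_score": 0.95, "human_review_required": True}))
```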
Policy and Traceability
- AI usage policy stored in policy-as-code repository; governance processor references versioned policy definitions; policy version embedded in governance validation result.
- Full event lineage from source application through governance validation through inference to consumer available via event id correlation in the event archive.
Governance Artefacts
| Artefact | Owner | Update Frequency | Storage Location |
|---|---|---|---|
| AI Usage Policy (policy-as-code) | Chief AI Risk Officer | Per policy change | Policy repository (Git-backed) |
| Schema Registry Schemas | Platform Engineering | Per event schema change | Schema Registry + Git backup |
| Topic ACL Configuration | Platform Engineering | Per onboarding/offboarding | Terraform state + Git |
| DLQ Review Report | AI Governance Team | Weekly | Governance dashboard |
| Model Upgrade Decision Record | AI Platform Team | Per model version change | Architecture Decision Record repository |
| Cost Attribution Report | Finance / FinOps | Monthly | FinOps platform |
10. Operational Considerations
Monitoring and SLOs
| SLO | Target | Measurement | Alert Threshold |
|---|---|---|---|
| Event end-to-end latency (p99) | < 10s for async; < 500ms for near-real-time | Time from producer publish on the inbound topic to consumer receipt from the result topic | > 15s sustained for 5 min |
| Consumer group lag (all groups) | < 1000 events | Broker consumer lag metric | > 5000 events accumulating |
| Governance rejection rate | < 0.5% | Governance DLQ event count / total inbound events | > 2% in any 15-min window |
| Inference worker availability | 99.9% | Worker health check success rate | < 99.5% over 5 min |
| DLQ growth rate | 0 net new per hour (steady state) | DLQ message count delta | Any sustained growth |
| Event archive completeness | 100% | Archive record count vs broker offset | Any gap |
Logging
- Every governance processor decision logged with: event id, producer, domain, classification, policy version, decision (allow/reject), rejection reason.
- Every inference worker call logged with: event id, model provider, model id, prompt version, token usage, latency, cost, success/failure.
- Logs shipped to SIEM for security analysis; to observability platform for operational analysis.
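Because each inference record carries its domain and cost, per-domain cost attribution reduces to a simple aggregation. The sketch below works on in-memory result events; the function name and the sample values are assumptions, and a real collector would read from the result topics or the event archive.

```python
from collections import defaultdict


def cost_by_domain(result_events) -> dict:
    """Sum ai_cost_usd per ai_use_case_domain."""
    totals = defaultdict(float)
    for event in result_events:
        totals[event["ai_use_case_domain"]] += float(event.get("ai_cost_usd", 0.0))
    # Round to micro-dollars to hide floating-point noise in the report.
    return {domain: round(total, 6) for domain, total in totals.items()}


sample = [
    {"ai_use_case_domain": "fraud", "ai_cost_usd": 0.002},
    {"ai_use_case_domain": "fraud", "ai_cost_usd": 0.003},
    {"ai_use_case_domain": "creditrisk", "ai_cost_usd": 0.01},
]
print(cost_by_domain(sample))
```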
Incident Response
- Governance processor failure: producers continue publishing to raw topic; events accumulate until processor recovers; no data loss (broker durability). Alert fires within 60 seconds of processor unavailability.
- Inference worker failure: domain topic consumer lag accumulates; auto-scaler adds new worker instances within 3 minutes; SLO breach alert if lag exceeds 5000 events.
- Model provider outage: circuit breaker opens after configured error rate threshold; fallback response or human queue escalation activated; incident ticket auto-created with cost-so-far and impacted domains.
Disaster Recovery
| Scenario | RTO | RPO | Recovery Procedure |
|---|---|---|---|
| Single inference worker failure | 3 minutes | 0 (broker retains events) | Auto-scaling replaces worker; consumer group resumes from last committed offset |
| Governance processor failure | 5 minutes | 0 | Kubernetes deployment restart; events accumulate in raw topic during outage |
| Broker node failure | 10 minutes | 0 (replicated partitions) | Kafka partition leader election; consumers reconnect automatically |
| Full broker cluster failure | 4 hours | Near 0 (bounded by cross-region replication lag) | Failover to replica cluster; update producer/consumer connection strings |
| Event archive corruption | 24 hours | Up to retention boundary | Restore from backup; replay from broker if within retention period |
Capacity Planning
- Broker storage: (average event size in KB) × (events per day) × (retention days) × 3 (replication factor).
- Inference worker sizing: target throughput (events/min) / per-worker throughput (events/min) = minimum worker count; add 50% headroom for burst.
- Schema registry: low resource requirements; size for HA (3-node ensemble) not throughput.
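The two sizing formulas above can be worked through with example numbers. The inputs (4 KB events, 1 million events per day, 90-day retention, 600 events/min target, 40 events/min per worker) are illustrative assumptions, not measurements, and the function names are hypothetical.

```python
import math


def broker_storage_gb(avg_event_kb: float, events_per_day: int,
                      retention_days: int, replication_factor: int = 3) -> float:
    """Broker storage in decimal GB: size x volume x retention x replication."""
    total_kb = avg_event_kb * events_per_day * retention_days * replication_factor
    return total_kb / 1_000_000


def worker_count(target_per_min: float, per_worker_per_min: float,
                 headroom: float = 0.5) -> int:
    """Minimum worker count plus burst headroom, rounded up."""
    minimum = target_per_min / per_worker_per_min
    return math.ceil(minimum * (1 + headroom))


print(broker_storage_gb(4, 1_000_000, 90))  # 1080.0 GB
print(worker_count(600, 40))                # 15 minimum, 23 with 50% headroom
```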
11. Cost Considerations
Cost Drivers
| Cost Driver | Description | Typical Proportion |
|---|---|---|
| AI Model Provider API Costs | Token-based charges for every inference event; dominant cost driver | 55–70% |
| Managed Broker (MSK/Service Bus) | Per-partition-hour + data transfer + storage | 10–20% |
| Inference Worker Compute | Container/function runtime for worker fleet | 8–15% |
| Event Archive Storage | Long-term event retention in object storage | 3–8% |
| Schema Registry | Managed service or self-hosted compute | 1–3% |
| Observability (metrics/logs) | Event-level metric ingestion volume | 3–7% |
Scaling Risks
- AI provider token costs scale linearly with event volume; cost spike protection requires cost-rate circuit breaker or monthly budget alerts.
- Kafka storage costs can grow unexpectedly with long retention periods on high-volume topics; topic-level retention policies must be actively managed.
- Inference worker auto-scaling lags behind sudden traffic spikes by 2–5 minutes; pre-warm workers for known batch jobs.
Cost Optimisations
- Batch small events into micro-batches in the inference worker to reduce per-call API overhead and take advantage of batch inference pricing.
- Use spot/preemptible instances for non-latency-sensitive inference workers (batch domains).
- Implement caching layer in inference worker for identical or near-identical prompts (semantic deduplication) — typical cache hit rate 15–30% for structured workloads.
- Compress event payloads (Snappy/LZ4 for Kafka) to reduce broker storage and network costs.
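The caching optimisation can be sketched for the identical-prompt case only. The class below keys on a hash of model id, prompt version, and a canonical JSON form of the payload; near-identical (semantic) matching would need an embedding-based lookup and is not attempted here. The class and method names are assumptions, and the in-memory dict stands in for a shared cache.

```python
import hashlib
import json


class InferenceCache:
    """Exact-match result cache for an inference worker."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model_id: str, prompt_version: str, payload: dict) -> str:
        # sort_keys makes the key independent of dict insertion order.
        canonical = json.dumps([model_id, prompt_version, payload],
                               sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_compute(self, model_id: str, prompt_version: str, payload: dict, infer):
        cache_key = self.key(model_id, prompt_version, payload)
        if cache_key in self._store:
            self.hits += 1
            return self._store[cache_key]
        self.misses += 1
        result = infer(payload)  # only call the provider on a miss
        self._store[cache_key] = result
        return result


cache = InferenceCache()
fake_infer = lambda payload: {"label": "refund-request"}  # stand-in for a provider call
cache.get_or_compute("gpt-4o", "3", {"text": "I want my money back"}, fake_infer)
cache.get_or_compute("gpt-4o", "3", {"text": "I want my money back"}, fake_infer)
print(cache.hits, cache.misses)
```

Including the prompt version in the key means a prompt change invalidates old entries automatically.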
Indicative Cost Range
| Scale | Monthly Infrastructure | AI Provider API | Total Monthly |
|---|---|---|---|
| Small (10M events/mo, 3 domains) | $1,500–$3,000 | $5,000–$15,000 | $6,500–$18,000 |
| Medium (100M events/mo, 10 domains) | $8,000–$15,000 | $40,000–$120,000 | $48,000–$135,000 |
| Large (1B+ events/mo, 30+ domains) | $40,000–$80,000 | $300,000–$800,000 | $340,000–$880,000 |
12. Trade-Off Analysis
Architectural Options Comparison
| Option | Description | Latency | Cost | Governance | Complexity | Recommended For |
|---|---|---|---|---|---|---|
| Option A — Enterprise AI Service Bus (this pattern) | Asynchronous event bus with schema governance, domain topics, replay | 500ms–30s | Medium infrastructure + AI API | Centralised, strong | High | Large enterprise, regulated industries, multi-team AI consumption |
| Option B — Direct AI API Integration | Each application calls AI provider API directly | 100ms–10s | Low infrastructure, highest AI API | Decentralised, weak | Low | Single-team, exploratory, non-regulated |
| Option C — Synchronous AI Gateway | Synchronous API gateway proxying AI provider calls; no broker | 200ms–15s | Medium | Medium | Medium | Medium enterprise, request/response workloads, low replay requirement |
Architectural Tensions
| Tension | Trade-Off | Resolution |
|---|---|---|
| Latency vs. Governance | Adding governance processor to event path adds 50–200ms latency | Accept latency for regulated domains; implement fast-path bypass for pre-approved, non-sensitive use cases |
| Topic granularity vs. Consumer flexibility | Coarse domain topics couple unrelated use cases; fine-grained topics increase management overhead | One topic per domain AND event type version; avoid sub-domain splits until consumer count justifies it |
| Replay completeness vs. Storage cost | Full event retention enables unlimited replay; drives storage costs | Tiered retention: 90 days hot (broker), 7 years cold (object storage with restore latency) |
| Schema evolution rigidity vs. Innovation speed | Strict schema compatibility slows prompt experimentation | Use schema registry for result events (consumer-facing); allow looser schema for internal inference events behind the governance plane |
13. Failure Modes
| Failure | Likelihood | Impact | Detection | Recovery |
|---|---|---|---|---|
| Governance processor becomes unavailable | Low | High — all new events blocked from routing | Consumer lag on raw topic grows; health check fails | Kubernetes restart; events accumulate durably in broker |
| AI provider API key expires or is revoked | Medium | High — all inference workers fail | HTTP 401 errors from provider; inference DLQ growth | Rotate key in secrets manager; workers pick up automatically |
| Schema registry unavailable | Low | High — new events cannot be validated | Governance processor errors; alert fires | Read-through cache on governance processor provides short-term continuity; restore registry |
| Consumer group offset corruption | Very Low | Medium — some events may be reprocessed | Duplicate events in consumer application | Idempotent consumer processing (dedup on event id); replay from known-good offset |
| Back-pressure causing broker disk exhaustion | Medium | Critical — broker stops accepting new events | Broker disk usage alert | Increase broker storage; add topic retention policy enforcement; throttle producers |
| Model provider rate limit hit | High | Medium — inference latency increases | HTTP 429 responses; consumer lag growth | Exponential backoff; distribute load across multiple provider API keys; activate fallback model |
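The idempotent-consumer recovery listed for offset corruption can be sketched as a wrapper that deduplicates on event id. The in-memory set is an assumption made to keep the example self-contained; a production consumer would need a durable store so that deduplication survives restarts.

```python
class IdempotentConsumer:
    """Wrap a handler so each event id is processed at most once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # assumption: in-memory only, for illustration

    def consume(self, event: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        if event["id"] in self._seen:
            return False
        self._handler(event)
        # Record the id only after the handler succeeds, so a failed event
        # can still be retried.
        self._seen.add(event["id"])
        return True


processed = []
consumer = IdempotentConsumer(processed.append)
consumer.consume({"id": "evt-1"})
consumer.consume({"id": "evt-1"})  # redelivery after an offset reset
consumer.consume({"id": "evt-2"})
print(len(processed))
```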
Cascading Failure Scenarios
- Governance processor failure + high event volume: Raw topic fills beyond retention period → events lost. Mitigation: extend raw topic retention to 7 days; alert on raw topic consumer lag within 60 seconds.
- Inference DLQ accumulation + no DLQ monitoring: Silent event loss for hours; downstream consumers starved of results, triggering application-level failures. Mitigation: DLQ monitoring and alerting is mandatory, not optional.
- Model provider global outage + no circuit breaker + no fallback: All inference workers retry until the retry budget is exhausted → all events land in DLQ → consumers receive no results → downstream business processes halt. Mitigation: circuit breaker with fallback response is non-negotiable for production deployments.
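A circuit breaker with fallback, as called for in the last scenario, can be sketched in a few lines. This version opens after a number of consecutive failures rather than on an error-rate threshold, which is a simplification; the class name, parameters, and default values are assumptions.

```python
import time


class CircuitBreaker:
    """Consecutive-failure circuit breaker with a fallback path."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()  # open: do not touch the provider
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:  # sketch only; real code would narrow this
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # a success closes the circuit fully
        return result


def provider_down():
    raise ConnectionError("provider unavailable")


breaker = CircuitBreaker(failure_threshold=2)
for _ in range(3):
    print(breaker.call(provider_down, lambda: {"ai_fallback_used": True}))
```

While the circuit is open, workers return the fallback immediately instead of burning their retry budget, so events do not flood the DLQ during a provider outage.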
14. Regulatory Considerations
APRA CPS 230 — Operational Risk
- Clause 36 (Business Continuity): The event bus must have documented RTO/RPO for each failure scenario. Replay capability directly addresses recovery of AI processing after outages.
- Clause 52 (Service Provider Management): AI model providers are third-party service providers; the governance processor enforces usage controls required under third-party risk management.
APRA CPS 234 — Information Security
- Clause 15 (Information Security Controls): mTLS authentication, topic ACLs, and data classification enforcement address the requirement for controls proportional to data sensitivity.
- Clause 36 (Incident Notification): Governance DLQ violations and model provider outages must be assessed as potential security incidents under CPS 234 notification obligations.
Australian Privacy Act 1988 (as amended 2024)
- APP 6 (Use and Disclosure): Data classification enforcement in the governance processor operationalises the requirement to use personal information only for the primary purpose disclosed at collection.
- APP 8 (Cross-border Disclosure): Events routed to offshore model providers must have the country of processing recorded in the AI extension fields; governance processor must block cross-border routing for events exceeding permitted data sharing boundaries.
EU AI Act (2024)
- Article 13 (Transparency): ai_model_id, ai_model_version, and ai_prompt_version in every event satisfy the requirement to document the AI system used in automated decisions affecting natural persons.
- Article 17 (Quality Management): Schema registry enforcement, DLQ monitoring, and replay capability are evidence of a quality management system for AI outputs.
- Article 12 (Record-keeping): Event archive with 7-year retention for high-risk AI use cases directly satisfies the logging obligation for high-risk AI systems.
ISO 42001 — AI Management System
- Clause 6.1.2 (AI Risk Assessment): Per-domain circuit breakers and confidence score tracking operationalise the risk assessment and monitoring requirements.
- Clause 8.5 (AI System Lifecycle): Prompt versioning, model version tracking, and replay capability support the AI lifecycle management requirements.
NIST AI RMF (2023)
- GOVERN 1.1: AI usage policy encoded in governance processor addresses the organisational risk governance requirement.
- MEASURE 2.5: Confidence score monitoring and quality degradation circuit breaker conditions implement the performance measurement requirement.
- MANAGE 2.4: DLQ with full context capture and replay capability addresses the AI risk treatment and incident response requirements.
15. Reference Implementations
AWS
- Broker: Amazon MSK (Kafka-compatible) with MSK Connect for governance processor
- Schema Registry: AWS Glue Schema Registry
- Inference Workers: AWS Lambda (event-driven) or ECS Fargate containers
- DLQ: Amazon SQS DLQ connected to MSK via Kafka SQS Sink Connector
- Event Archive: S3 via Kafka S3 Sink Connector; query via Athena
- Observability: Amazon CloudWatch + AWS Cost Explorer for per-event cost tracking
- Secrets: AWS Secrets Manager with Lambda execution role access
Azure
- Broker: Azure Event Hubs (Kafka-compatible surface) or Azure Service Bus Premium
- Schema Registry: Azure Schema Registry (built into Event Hubs namespace)
- Inference Workers: Azure Functions (event-driven triggers) or AKS pods
- DLQ: Azure Service Bus dead-letter queues
- Event Archive: Azure Data Lake Storage Gen2 via Event Hubs Capture
- Observability: Azure Monitor + Application Insights; Cost Management for attribution
- Secrets: Azure Key Vault with managed identity binding to workers
GCP
- Broker: Google Cloud Pub/Sub (native) or GKE-hosted Kafka
- Schema Registry: Confluent Schema Registry on GKE or Apicurio Registry
- Inference Workers: Cloud Run (event-driven) or GKE deployments
- DLQ: Pub/Sub dead-letter topics with subscription-level configuration
- Event Archive: Cloud Storage via Pub/Sub export; query via BigQuery external tables
- Observability: Cloud Monitoring + Cloud Logging; BigQuery for cost analytics
- Secrets: Secret Manager with Workload Identity binding
On-Premises / Private Cloud
- Broker: Apache Kafka (self-managed) on Kubernetes via Strimzi Operator
- Schema Registry: Confluent Schema Registry OSS or Apicurio Registry
- Inference Workers: Kubernetes Deployments with KEDA event-driven autoscaling
- DLQ: Dedicated Kafka topics with Kafka UI for manual review
- Event Archive: MinIO (S3-compatible) + Apache Iceberg for query
- Observability: Prometheus + Grafana + Loki stack
- Secrets: HashiCorp Vault with Kubernetes auth method
16. Related Patterns
| Pattern | Relationship | Notes |
|---|---|---|
| EAAPL-INT007 — AI Circuit Breaker | Enables | Circuit breaker per model provider is a required sub-component of each inference worker in this pattern |
| EAAPL-INT004 — Real-Time AI Stream Processing | Specialises | Stream processing pattern is a specific consumer topology for this bus in low-latency domains |
| EAAPL-INT005 — Batch AI Processing | Specialises | Batch processing is a consumer topology for this bus in high-throughput, non-latency-sensitive domains |
| EAAPL-INT002 — Legacy System AI Augmentation | Complementary | Legacy systems publish to and consume from this bus through adapter components |
| EAAPL-INT008 — Bidirectional AI Sync | Complementary | Sync pattern consumes result events from this bus to update enterprise data stores |
17. Maturity Assessment
Overall Maturity: Proven
| Dimension | Score (1–5) | Justification |
|---|---|---|
| Architectural Completeness | 5 | All integration, governance, processing, and consumer concerns addressed |
| Operational Readiness | 4 | Runbook templates defined; some DR procedures require organisation-specific customisation |
| Security Coverage | 5 | mTLS, ACLs, classification enforcement, OWASP LLM Top 10 addressed |
| Governance Coverage | 5 | Policy-as-code, audit trail, model risk management, human override all included |
| Cost Predictability | 4 | Indicative ranges provided; AI API costs remain variable; budget alerting required |
| Implementation Complexity | 3 | High — requires mature messaging platform and operational tooling; not suitable for small teams |
| Industry Validation | 4 | Pattern applied in production at major financial institutions and government agencies |
18. Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-06-12 | EAAPL Working Group | Initial publication — integration patterns series |