EAAPL: Enterprise AI Architecture Pattern Library

[EAAPL-PLT007] Multi-Tenant AI Platform

Category: Platform Engineering
Sub-category: Multi-Tenancy / SaaS Platform
Version: 1.1
Maturity: Proven
Tags: multi-tenancy, tenant-isolation, data-isolation, cost-isolation, chargeback, tenant-onboarding, vector-store-isolation, policy-enforcement
Regulatory Relevance: APRA CPS 234, Privacy Act / GDPR (data separation), EU AI Act Article 9, ISO 27001 A.9


1. Executive Summary

When an AI platform serves multiple tenants—whether internal business units within an enterprise or external customers of an AI-powered SaaS product—tenant isolation becomes a foundational engineering and compliance requirement. The failure mode is severe: a cross-tenant data leak from the AI system exposes confidential information to a competitor, breaches privacy legislation, and destroys trust simultaneously.

The Multi-Tenant AI Platform pattern establishes a rigorous isolation architecture spanning compute, data (including vector stores for RAG), cost attribution, and policy enforcement. It defines three isolation tiers (shared pool, dedicated namespace, and dedicated compute), each with a different isolation strength, cost profile, and operational complexity, so platform architects can select the right tier for each tenant category. The pattern also provides an automated tenant onboarding framework that provisions all isolation controls consistently without manual intervention from the platform team, reducing the risk of human error in security configuration. For enterprises and SaaS providers operating AI at scale, this pattern is the foundation on which regulatory compliance and commercial trust are built.


2. Problem Statement

Business Problem

Enterprise business units and external customers expect their AI interactions, documents, and derived insights to be completely isolated from other tenants. A financial services platform where Client A's AI-processed documents are accessible to Client B's queries is a regulatory breach and a liability event. An internal enterprise platform where one business unit's AI queries reveal another's strategic data is a governance failure.

Technical Problem

Many AI platform components are shared across tenants by default: vector stores index content from multiple tenants, caches may serve responses across tenant boundaries, and model endpoints run on shared compute. Without explicit isolation controls at each layer, cross-tenant data can leak through RAG retrieval (returning documents from the wrong tenant) or cross-tenant cache hits (serving a response cached for one tenant to another), and cost attribution errors can charge one tenant's usage to another.

Symptoms

  • Vector store queries returning documents without tenant scoping (all-tenant search)
  • No per-tenant rate limits or cost attribution; tenants experience noisy neighbour effects
  • Tenant onboarding is a manual process requiring platform team involvement; inconsistent security configuration
  • No per-tenant policy enforcement; one data classification policy applied globally
  • Cost management treats all tenants as a single entity; chargeback impossible

Cost of Inaction

  • Cross-tenant data leakage incident triggering regulatory investigation, breach notification, and reputational damage
  • Noisy neighbour compute contention causing SLA breach for premium tenants
  • Inability to operate AI as a SaaS product without demonstrable tenant isolation
  • Manual onboarding taking 2–4 weeks per tenant, limiting growth velocity

3. Context

When to Apply

  • Platform serves ≥2 tenants with confidentiality requirements between them
  • External-facing SaaS product built on AI capabilities with customer data isolation requirements
  • Internal enterprise platform with compliance obligation to separate BU data (e.g., different regulatory jurisdictions)
  • Chargeback or billing by tenant is required
  • Tenants have different policy requirements (e.g., some tenants have more restrictive data classification rules)

When NOT to Apply

  • Single tenant with no isolation requirement: overhead not warranted
  • Fully federated model where each tenant operates their own platform instance: tenant isolation is achieved at infrastructure level, not within a shared platform
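  • Proof-of-concept: a single-tenant build is acceptable, but if the PoC is expected to grow into a shared platform, design tenant context in from the start; retrofitting multi-tenancy is typically harder than building it in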

Prerequisites

  • Identity provider with tenant-scoped claims in JWTs (tenant ID in token)
  • AI API Gateway (PLT002) as enforcement point for tenant context
  • Vector store that supports namespace or metadata filtering (required for RAG isolation)
  • Cost management service capable of per-tenant attribution
  • Infrastructure-as-code for automated tenant provisioning

Industry Applicability

Industry | Applicability | Isolation Driver
SaaS (B2B) | Critical | Customer data separation; contractual obligation
Financial Services | Very High | Regulatory data separation; competing client isolation
Healthcare | Very High | Patient data; HIPAA/Privacy Act separation requirements
Legal Services | High | Client privilege; competitive intelligence
Enterprise Internal | High | BU confidentiality; regulatory jurisdiction separation
Government | High | Security classification; agency separation

4. Architecture Overview

The Multi-Tenant AI Platform addresses isolation across five dimensions: identity, compute, data, cost, and policy. Each dimension has multiple isolation tier options, and the appropriate combination of tiers per tenant type drives the overall architecture.

Tenant Identity and Context Propagation establishes the tenant as a first-class citizen in all platform components. Every authenticated request carries a tenant ID claim in its JWT (or API key metadata). This tenant ID is extracted by the AI API Gateway at authentication time and propagated as a context header (X-Tenant-ID) to all downstream platform services. No downstream service makes a data access decision without this tenant context; it is the universal key for isolation enforcement. The tenant registry (a simple lookup table mapping tenant IDs to isolation tier, policy profile, and resource configuration) is consulted by the gateway on every request.
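The gateway step can be sketched as follows. This is a minimal, stdlib-only illustration assuming an HS256 token signed with a shared secret; real deployments normally verify the IdP's RS256/ES256 signature with a JWT library and also check `exp`, `iss` and `aud`, which this sketch omits. `SIGNING_KEY`, `TENANT_REGISTRY` and all function names are hypothetical. The point it shows is that the tenant ID comes only from the verified claim, and any `X-Tenant-ID` header the client sent is discarded rather than trusted.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical shared secret; HS256 keeps the sketch stdlib-only. A production IdP
# would usually sign with RS256/ES256 and the gateway would verify with public keys.
SIGNING_KEY = b"demo-secret"

# Hypothetical tenant registry: tenant ID -> isolation profile.
TENANT_REGISTRY = {
    "tenant-b": {"tier": "dedicated-namespace", "vector_ns": "tenant-b",
                 "policy_profile": "tenant-b-eu"},
}


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def make_demo_token(claims: dict, key: bytes = SIGNING_KEY) -> str:
    """Test helper that signs a token the way the IdP would."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verified_tenant_id(token: str, key: bytes = SIGNING_KEY) -> str:
    """Return the tenant_id claim only if the token's HS256 signature verifies."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise PermissionError("unexpected signing algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("invalid token signature")
    return json.loads(_b64url_decode(payload_b64))["tenant_id"]


def downstream_headers(token: str, client_headers: dict) -> dict:
    """Discard any client-supplied X-Tenant-ID, then set the verified one."""
    tenant_id = verified_tenant_id(token)
    if tenant_id not in TENANT_REGISTRY:
        raise LookupError("unknown-tenant")  # maps to 403 in the error flow
    headers = {k: v for k, v in client_headers.items() if k.lower() != "x-tenant-id"}
    headers["X-Tenant-ID"] = tenant_id
    return headers
```

Called with a valid tenant-b token and a spoofed `X-Tenant-ID: tenant-a` header, `downstream_headers` returns headers carrying `X-Tenant-ID: tenant-b`; a token signed with any other key raises `PermissionError`.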

Compute Isolation Tiers define the strength of isolation at the inference layer. The Shared Pool tier routes all tenants to the same model endpoint pool; isolation is enforced through request-level context and response handling only. This is appropriate for internal business units with a trust relationship. The Dedicated Namespace tier gives each tenant a logically separate API endpoint with dedicated rate limits and circuit breakers, but the underlying model compute is still shared. This is appropriate for external customers with contractual SLA requirements. The Dedicated Compute tier provisions separate model serving infrastructure per tenant; this is the strongest isolation, appropriate for high-value customers or regulatory requirements (e.g., financial data processing where the model server must be in the customer's own environment).
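Tier-based routing can be sketched as a single function. This assumes a hypothetical `TenantProfile` record read from the tenant registry and illustrative endpoint URLs; the consent-gated fallback for a failed dedicated-compute tenant follows the recovery action listed in the failure-modes table ("fail over to shared pool with tenant consent").

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical base URL for the shared model pool.
SHARED_POOL_URL = "https://inference.internal/shared"


@dataclass
class TenantProfile:
    tenant_id: str
    tier: str  # "shared-pool" | "dedicated-namespace" | "dedicated-compute"
    dedicated_url: Optional[str] = None
    shared_failover_consent: bool = False


def resolve_endpoint(profile: TenantProfile, dedicated_healthy: bool = True) -> str:
    """Choose the inference endpoint for a request according to the tenant's tier."""
    namespaced = f"{SHARED_POOL_URL}/ns/{profile.tenant_id}"
    if profile.tier == "shared-pool":
        return SHARED_POOL_URL
    if profile.tier == "dedicated-namespace":
        # Own endpoint, rate limits and circuit breaker; shared model compute behind it.
        return namespaced
    if profile.tier == "dedicated-compute":
        if profile.dedicated_url is None:
            raise ValueError("dedicated-compute tenant has no dedicated_url")
        if dedicated_healthy:
            return profile.dedicated_url
        if profile.shared_failover_consent:
            return namespaced  # degraded mode, only with the tenant's prior consent
        raise RuntimeError("dedicated compute down; tenant has not consented to shared failover")
    raise ValueError(f"unknown tier {profile.tier!r}")
```

Routing a dedicated-compute tenant to its own namespace on the shared pool during failover keeps its per-tenant rate limits and circuit breaker in force while the replacement instance is provisioned.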

Data Isolation — Vector Store is the most technically complex isolation requirement. When tenants use RAG (retrieval-augmented generation), documents from different tenants must never appear in each other's retrieval results. This is achieved through namespace isolation (each tenant has a dedicated namespace or collection in the vector store; queries are scoped to the tenant's namespace by the platform, not by the application code) or metadata filtering (all documents are stored in a shared collection with a tenant_id metadata field; every query includes a mandatory filter on this field that the platform injects and that cannot be removed by the application). The namespace approach is stronger (physical isolation; no risk of filter bypass); metadata filtering is simpler but requires the filter injection to be implemented correctly and consistently at every query site.
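A minimal sketch of the namespace approach follows, assuming an in-memory store in place of Qdrant, Weaviate or pgvector, and assuming the namespace name equals the tenant ID (in practice the registry supplies it, e.g. `vector-ns=tenant-b`). `scoped_search` plays the role of the platform-side namespace injector: the namespace always comes from tenant context, and an application-supplied namespace that differs is rejected and recorded as a security alert. All class and function names are hypothetical.

```python
import math


class NamespaceOverrideError(PermissionError):
    """An application tried to query a namespace other than its own tenant's."""


class InMemoryVectorStore:
    """Stand-in for a namespace-aware vector store (one document list per namespace)."""

    def __init__(self):
        self.namespaces = {}  # namespace -> list of (doc_id, vector)

    def upsert(self, namespace, doc_id, vector):
        self.namespaces.setdefault(namespace, []).append((doc_id, vector))

    def search(self, namespace, vector, top_k):
        if namespace not in self.namespaces:
            raise LookupError("tenant-setup-incomplete")  # 503 in the error flow

        def cosine(a, b):  # assumes non-zero vectors
            return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))

        scored = sorted(((cosine(vector, v), d) for d, v in self.namespaces[namespace]),
                        reverse=True)
        return [doc_id for _, doc_id in scored[:top_k]]


def scoped_search(store, tenant_id, query, security_alerts):
    """Platform-side injector: the namespace is always the caller's tenant."""
    requested = query.get("namespace", tenant_id)
    if requested != tenant_id:
        security_alerts.append({"tenant_id": tenant_id, "requested_namespace": requested})
        raise NamespaceOverrideError("cross-tenant namespace override attempt")
    return store.search(tenant_id, query["vector"], query.get("top_k", 3))
```

Because the store's `search` takes the namespace as a required argument and only the interceptor supplies it, there is no code path where application code can issue an unscoped, all-tenant query.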

Cost Isolation and Chargeback provides financial accountability. The cost management service (PLT004) maintains per-tenant token consumption counters. Each tenant has a configured budget with independent warning and blocking thresholds. Billing for external SaaS tenants is generated from the cost event stream, filtered by tenant ID, and integrated with the billing system (Stripe, Zuora) via a reconciliation job. Internal chargeback reports aggregate monthly spend by tenant/BU for finance system integration.
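The per-tenant budget check and the chargeback aggregation can be sketched together. This assumes a hypothetical flat price per 1K tokens (real pricing usually distinguishes input and output tokens) and hypothetical default thresholds of 80% (warn) and 100% (block); the in-memory ledger stands in for the Redis counters and the cost event stream.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TenantBudget:
    monthly_limit_usd: float
    warn_ratio: float = 0.8   # hypothetical warning threshold
    block_ratio: float = 1.0  # hypothetical blocking threshold


class CostLedger:
    """In-memory stand-in for per-tenant cost counters and the cost event stream."""

    def __init__(self, budgets):
        self.budgets = budgets           # tenant_id -> TenantBudget
        self.spend = defaultdict(float)  # tenant_id -> USD spent this month
        self.events = []                 # cost event stream

    def check(self, tenant_id):
        """Return "allow", "warn" or "block" before admitting a request."""
        budget = self.budgets[tenant_id]
        used = self.spend[tenant_id] / budget.monthly_limit_usd
        if used >= budget.block_ratio:
            return "block"  # 429 tenant-budget-exhausted
        if used >= budget.warn_ratio:
            return "warn"
        return "allow"

    def record(self, tenant_id, tokens, usd_per_1k_tokens):
        cost = tokens / 1000 * usd_per_1k_tokens
        self.spend[tenant_id] += cost
        self.events.append({"tenant_id": tenant_id, "tokens": tokens, "cost_usd": cost})

    def chargeback(self):
        """Aggregate the event stream per tenant, as the monthly report would."""
        report = defaultdict(lambda: {"tokens": 0, "cost_usd": 0.0})
        for event in self.events:
            row = report[event["tenant_id"]]
            row["tokens"] += event["tokens"]
            row["cost_usd"] += event["cost_usd"]
        return dict(report)
```

Keeping the event stream separate from the running counters means the chargeback report can be regenerated or reconciled against the billing system without trusting the live counters.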

Per-Tenant Policy Enforcement allows tenants to have different data classification rules, model access policies, and output filtering requirements. The policy engine (OPA) evaluates policies against a composite context including both the platform's global policies and the tenant's specific overrides. This means a tenant operating in the EU can have stricter data residency rules than a tenant in another jurisdiction, without requiring separate platform infrastructure.
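The pattern evaluates composite policies in OPA; the Python sketch below models only the merge semantics, assuming a hypothetical policy shape (allowed models per data classification, allowed regions). Intersecting the global sets with the tenant override means an override can narrow the platform policy but never widen it, which matches the "tenant overrides within guardrails" resolution in the trade-off analysis. Real policies would be written in Rego.

```python
# Hypothetical policy data; in the pattern these live in OPA bundles written in Rego.
GLOBAL_POLICY = {
    "allowed_models": {
        "PUBLIC": {"small", "mid", "large"},
        "INTERNAL": {"small", "mid", "large"},
        "CONFIDENTIAL": {"small", "mid"},
    },
    "allowed_regions": {"eu-west", "us-east", "ap-southeast"},
}

TENANT_OVERRIDES = {
    "tenant-b-eu": {
        "allowed_models": {"INTERNAL": {"small", "mid"}, "CONFIDENTIAL": {"small"}},
        "allowed_regions": {"eu-west"},
    },
}


def evaluate(profile: str, classification: str, model: str, region: str) -> bool:
    """Allow only if the global policy AND the tenant override both permit the request.

    Set intersection means an override can narrow the global policy but never widen it."""
    override = TENANT_OVERRIDES.get(profile, {})
    global_models = GLOBAL_POLICY["allowed_models"].get(classification, set())
    models = global_models & override.get("allowed_models", {}).get(classification, global_models)
    global_regions = GLOBAL_POLICY["allowed_regions"]
    regions = global_regions & override.get("allowed_regions", global_regions)
    return model in models and region in regions
```

With these sample values, an INTERNAL request from `tenant-b-eu` to the mid-tier model in `eu-west` is allowed (as in step 4 of the data flow), while the same request routed to `us-east` is denied by the tenant's residency override.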

Tenant Onboarding Automation is the operational mechanism that makes multi-tenancy scalable. A tenant onboarding workflow (implemented as an Infrastructure as Code template or an internal workflow service) provisions all isolation resources consistently in a single automated run: creates the tenant namespace in the vector store, creates the rate limit quota in Redis, creates the tenant record in the tenant registry, creates the API key in the secrets manager, creates the policy profile in OPA, creates the cost budget in the cost management service, and sends the onboarding confirmation to the tenant. This process must be idempotent (safe to re-run) and audited (every provisioning action logged).
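A minimal sketch of an idempotent, audited onboarding run, assuming in-memory dicts stand in for the vector store, Redis, tenant registry, secrets manager, OPA and cost service. Step names mirror the paragraph above; the class, config fields and secret path are hypothetical. Each step checks current state before creating, so re-running after a partial failure only fills the gaps. A production provisioner would also reconcile drift (a resource that exists with the wrong configuration), which this sketch does not attempt.

```python
from datetime import datetime, timezone


class TenantProvisioner:
    """Idempotent onboarding sketch: dicts stand in for each real backend."""

    def __init__(self):
        self.stores = {name: {} for name in (
            "vector_namespace", "rate_limit_quota", "registry_record",
            "api_key", "policy_profile", "cost_budget", "confirmation")}
        self.audit_log = []

    def onboard(self, tenant_id, tier, rate_limit_per_min, budget_usd):
        desired = {
            "vector_namespace": {"namespace": tenant_id},
            "rate_limit_quota": {"limit_per_min": rate_limit_per_min},
            "registry_record": {"tier": tier, "vector_ns": tenant_id},
            "api_key": {"secret_ref": f"tenants/{tenant_id}/api-key"},  # reference only
            "policy_profile": {"profile": f"{tenant_id}-default"},
            "cost_budget": {"monthly_usd": budget_usd},
            "confirmation": {"sent": True},
        }
        outcomes = {}
        for step, config in desired.items():
            store = self.stores[step]
            if tenant_id in store:
                outcomes[step] = "skipped"  # done by an earlier, possibly partial, run
            else:
                store[tenant_id] = config
                outcomes[step] = "created"
            # Every step, including skips, is audited.
            self.audit_log.append({"ts": datetime.now(timezone.utc).isoformat(),
                                   "tenant_id": tenant_id, "step": step,
                                   "outcome": outcomes[step]})
        return outcomes
```

Because the confirmation step is also state-checked, a re-run does not send the tenant a second onboarding notice.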


5. Architecture Diagram

flowchart TD
  subgraph Ingress["Tenant Ingress"]
    A[Tenant Request]
    B[AI API Gateway]
  end
  subgraph Control["Control Plane"]
    C[Tenant Registry]
    D{Policy Engine}
    E[Onboarding Workflow]
  end
  subgraph Isolation["Isolation Layer"]
    F[Compute Tier]
    G[Vector Namespace]
    H[Cost Counter]
  end
  A -->|JWT with tenant_id| B
  B -->|lookup tier + policy| C
  C --> D
  D -->|allowed| F
  D -->|scoped query| G
  D -->|emit event| H
  E -->|provisions| G
  E -->|provisions| H
  H -->|chargeback| B
  style A fill:#dbeafe,stroke:#3b82f6
  style B fill:#f0fdf4,stroke:#22c55e
  style C fill:#fef9c3,stroke:#eab308
  style D fill:#f3e8ff,stroke:#a855f7
  style E fill:#f0fdf4,stroke:#22c55e
  style F fill:#fef9c3,stroke:#eab308
  style G fill:#fef9c3,stroke:#eab308
  style H fill:#fef9c3,stroke:#eab308

6. Components

Component | Type | Responsibility | Technology Options | Criticality
Tenant Registry | Service | Map tenant ID to isolation tier, policy profile, resource config | PostgreSQL, DynamoDB, Redis hash | Critical
Tenant Context Injector | Middleware | Extract tenant ID from JWT; inject as context header | Custom gateway middleware | Critical
Namespace Injector (Vector Store) | Service | Force tenant namespace scope on all vector store queries | Custom query interceptor layer | Critical
Per-Tenant Rate Limiter | Service | Maintain separate rate limit quota per tenant | Redis with tenant-namespaced keys | Critical
Per-Tenant Cost Counter | Service | Maintain token consumption and budget per tenant | Redis sorted sets, custom | Critical
Per-Tenant Policy Profile | Configuration | Tenant-specific OPA policy overrides | OPA policy bundles with tenant namespace | High
Tenant Onboarding Workflow | Service | Automated provisioning of all tenant isolation resources | Terraform + custom provisioner, Temporal workflow | High
Vector Store (Namespace-Aware) | Service | Partition documents by tenant namespace | Qdrant collections, Weaviate multi-tenancy, pgvector + RLS | Critical
Billing Integration | Service | Reconcile per-tenant cost events with billing system | Custom + Stripe/Zuora integration | High (SaaS)
Tenant Admin Portal | Service | Self-service configuration for tenant administrators | Custom, Backstage tenant plugin | Medium

7. Data Flow

Primary Flow — Tenant-Scoped RAG Request

Step | Actor | Action | Output
1 | Tenant B Application | POST /v1/chat/completions with JWT containing tenant_id: tenant-b | Request at gateway
2 | Gateway Authentication | Validate JWT; extract tenant_id: tenant-b | Tenant context established
3 | Tenant Registry Lookup | Retrieve Tenant B profile: tier=dedicated-namespace, vector-ns=tenant-b, rate-limit=50K/min, policy-profile=tenant-b-eu | Tenant configuration loaded
4 | OPA Policy Evaluation | Evaluate composite policy: global + tenant-b-eu overrides; request classification INTERNAL → approved for mid-tier model | Policy allow
5 | Rate Limit Check | Check Tenant B's rate limit counter (namespace: ratelimit:tenant-b); 40K of 50K used → allow | Request proceeds
6 | Vector Store Query | Application requests RAG retrieval; namespace injector forces namespace=tenant-b on query | Only Tenant B documents retrieved; Tenant A and C documents inaccessible
7 | Model Inference | Request forwarded to Tenant B's dedicated namespace endpoint; response generated | Model response
8 | Cost Attribution | Emit cost event with tenant_id=tenant-b; update counter:tenant-b in Redis | Tenant B cost updated
9 | Response | Return response with X-Tenant-ID: tenant-b response header | Tenant B application receives response
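Step 5 can be sketched as a fixed-window limiter with tenant-namespaced keys. This is a minimal sketch: the dict stands in for Redis, the unit of the limit is left generic (the flow above does not specify whether 50K means requests or tokens), and the class name is hypothetical. A real deployment would use an atomic Redis `INCRBY` with `EXPIRE` (or a Lua script) so concurrent gateway replicas cannot race.

```python
import time


class PerTenantRateLimiter:
    """Fixed-window limiter with tenant-namespaced keys (dict stands in for Redis).

    Expired windows are never evicted here; Redis EXPIRE would handle that."""

    def __init__(self, limits_per_min, clock=time.time):
        self.limits = limits_per_min  # tenant_id -> units per minute
        self.counters = {}            # "ratelimit:{tenant}:{window}" -> units used
        self.clock = clock

    def try_consume(self, tenant_id, units=1):
        window = int(self.clock() // 60)
        key = f"ratelimit:{tenant_id}:{window}"
        used = self.counters.get(key, 0)
        if used + units > self.limits[tenant_id]:
            return False  # reject; other tenants' keys are unaffected
        self.counters[key] = used + units
        return True
```

Because each tenant's counter lives under its own key, one tenant exhausting its quota cannot consume another tenant's headroom, which is the noisy-neighbour protection the pattern relies on.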

Error Flow

Error | Detection | Response
Tenant ID not in registry | Registry lookup miss | 403 with unknown-tenant code; onboarding required
Tenant namespace not initialised in vector store | Namespace injector error | 503 with tenant-setup-incomplete code; alert platform team
Tenant budget exhausted | Budget counter check | 429 with tenant-budget-exhausted code; tenant admin notified
Cross-tenant namespace injection attempt (application tries to override namespace) | Namespace injector override detection | 403; security alert raised
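The error table can be encoded as a single lookup so every gateway path returns the same status, code and follow-up action. The internal error names and action labels below are hypothetical; the statuses and codes are taken from the table, and no code is attached to the override case because the table specifies none.

```python
# Error-flow table as a lookup; the internal error names and action labels are hypothetical.
ERROR_RESPONSES = {
    "unknown_tenant": (403, "unknown-tenant", None),
    "namespace_not_initialised": (503, "tenant-setup-incomplete", "alert-platform-team"),
    "budget_exhausted": (429, "tenant-budget-exhausted", "notify-tenant-admin"),
    "namespace_override": (403, None, "raise-security-alert"),  # table specifies no code
}


def error_response(error: str, tenant_id: str, side_effects: list) -> dict:
    """Build the HTTP error body and queue the table's follow-up action, if any."""
    status, code, action = ERROR_RESPONSES[error]
    if action is not None:
        side_effects.append((action, tenant_id))
    body = {"status": status}
    if code is not None:
        body["code"] = code
    return body
```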

8. Security Considerations

Tenant Isolation Enforcement

  • Tenant ID in JWT must be a non-forgeable claim signed by the trusted IdP; tenant ID must never be accepted from request body or URL parameters
  • Namespace injection into vector store queries must occur at the platform layer, not application layer; applications must not be trusted to correctly scope their own queries
  • Zero-trust principle: every request is re-evaluated for tenant context regardless of previous requests in the same session

Data Separation

  • Vector store isolation uses hard partitioning (separate collections in Qdrant, separate schemas or tables in pgvector) or database-enforced row-level security rather than application-level metadata filtering wherever possible
  • Cache entries are always prefixed with tenant namespace ({tenant_id}:{cache_key}); Redis ACLs enforce per-namespace access
  • Audit logs include tenant ID on every record; security monitoring tracks cross-tenant anomalies
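The cache-scoping rule above can be sketched as follows, assuming a dict in place of Redis and hypothetical function names. The key hashes the model and prompt and prefixes the tenant ID, so the same prompt from two tenants maps to two different entries. Tenant IDs are rejected if they contain the `:` delimiter, so one tenant's prefix can never match another's. The ACL helper only builds a Redis `ACL SETUSER` command string that limits a per-tenant Redis user to keys matching its own `{tenant_id}:*` pattern.

```python
import hashlib


def cache_key(tenant_id: str, model: str, prompt: str) -> str:
    """Prefix every key with the tenant ID so identical prompts never share an entry."""
    if ":" in tenant_id:
        raise ValueError("tenant IDs must not contain the ':' key delimiter")
    digest = hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()
    return f"{tenant_id}:{digest}"


class TenantScopedCache:
    """Dict stand-in for Redis; every access goes through cache_key."""

    def __init__(self):
        self._data = {}

    def get(self, tenant_id, model, prompt):
        return self._data.get(cache_key(tenant_id, model, prompt))

    def set(self, tenant_id, model, prompt, response):
        self._data[cache_key(tenant_id, model, prompt)] = response


def redis_acl_command(tenant_id: str, password: str) -> str:
    """Redis ACL SETUSER command limiting a per-tenant user to its own key prefix."""
    return f"ACL SETUSER {tenant_id} on >{password} ~{tenant_id}:* +get +set +del"
```

Prefixing at key construction, rather than relying on callers to add it, means a caller cannot produce a cross-tenant cache hit by accident.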

OWASP LLM Top 10 (2023 edition numbering)

OWASP LLM Risk | Multi-Tenant Control
LLM06 Sensitive Information Disclosure | Vector store namespace isolation prevents cross-tenant RAG retrieval; cache scoping prevents cross-tenant cache hits
LLM04 Model Denial of Service | Per-tenant rate limits prevent one tenant from exhausting platform capacity
LLM08 Excessive Agency | Per-tenant policy profiles control what actions the AI can take on behalf of each tenant

9. Governance Considerations

Tenant Data Governance

  • Each tenant's data in the vector store is subject to the tenant's own data governance policies; the platform provides isolation infrastructure but does not govern tenant data content
  • Platform retains logs of all access to tenant namespaces; these logs are available to tenants as part of their audit rights

Onboarding Governance

  • All tenant onboarding must be performed via the automated workflow; manual provisioning is prohibited as it bypasses audit and consistency controls
  • Tenant offboarding (data deletion) must be automated and audited; all tenant namespaces and cost records purged within the agreed retention period
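A minimal sketch of automated, audited offboarding follows. It assumes each store is represented as a dict whose keys are either the tenant ID or start with `{tenant_id}:`, which is a hypothetical layout consistent with the cache-key convention above. It deletes matching records, writes one audit entry per store, and ends with a verification pass whose result can be attached to the audit record. Backups and DR replicas are outside this sketch and need their own purge or expiry handling within the agreed retention period.

```python
from datetime import datetime, timezone


def offboard_tenant(tenant_id: str, stores: dict, audit_log: list) -> bool:
    """Purge every tenant-scoped record, log each store's deletions, then verify.

    `stores` maps a store name to a dict whose keys are the tenant ID itself or
    start with "{tenant_id}:" (a hypothetical layout)."""

    def belongs(key: str) -> bool:
        return key == tenant_id or key.startswith(f"{tenant_id}:")

    for name, store in stores.items():
        doomed = [key for key in store if belongs(key)]
        for key in doomed:
            del store[key]
        audit_log.append({"ts": datetime.now(timezone.utc).isoformat(),
                          "tenant_id": tenant_id, "store": name, "deleted": len(doomed)})
    # Verification pass: True only if no tenant-scoped key remains in any store.
    return not any(belongs(key) for store in stores.values() for key in store)
```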

Governance Artefacts

Artefact | Owner | Cadence | Location
Tenant registry | Platform Team | Continuous | Database + IaC
Tenant isolation tier policy | Platform Governance Board | Annual | Platform policy document
Tenant onboarding audit log | Platform Team | Per event | Audit log
Cross-tenant isolation test results | Platform Team + Security | Quarterly | Security test repository
Tenant data residency configuration | Data Governance Team | Per tenant | Tenant registry

10. Operational Considerations

Monitoring

Signal | Source | Alert Threshold | Owner
Cross-tenant access attempt | Namespace injector security alert | Any attempt | CISO + Platform On-Call
Tenant namespace not found in vector store | RAG query error | Any 503 with tenant-setup-incomplete | Platform On-Call
Per-tenant error rate spike | Tenant-scoped gateway metrics | >5% error rate for any single tenant | Platform On-Call
Tenant onboarding workflow failure | Workflow status | Any failure | Platform Team
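The per-tenant error-rate signal matters because a platform-wide average can hide a single failing tenant. The sketch below assumes a window of `(tenant_id, http_status)` pairs, counts 5xx responses as errors, and adds a hypothetical `min_requests` guard so very low-traffic tenants do not trigger on a handful of failures; the 5% threshold comes from the table. For example, if tenant-b has 10 errors in 100 requests while tenant-a has 10 in 1,000, tenant-b is at 10%, yet the combined rate is under 2%.

```python
from collections import Counter


def tenants_breaching(requests, threshold=0.05, min_requests=100):
    """Return tenants whose error rate in the window exceeds the threshold.

    `requests` is an iterable of (tenant_id, http_status); 5xx counts as an error
    (an assumption; some teams also count other statuses)."""
    totals, errors = Counter(), Counter()
    for tenant_id, status in requests:
        totals[tenant_id] += 1
        if status >= 500:
            errors[tenant_id] += 1
    return sorted(t for t, n in totals.items()
                  if n >= min_requests and errors[t] / n > threshold)
```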

SLOs

SLO | Target | Window
Tenant onboarding automated completion | <15 minutes from trigger | Per event
Tenant isolation verification (no cross-tenant data) | Zero incidents | Continuous
Per-tenant gateway availability | 99.9% (shared tier), 99.95% (dedicated) | Rolling 30 days
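Assuming availability is measured as time (rather than as a ratio of successful requests), the availability targets translate into the following per-tenant downtime budgets over the 30-day rolling window:

```latex
\begin{aligned}
T &= 30 \times 24 \times 60 = 43\,200 \ \text{min} \\
B_{\text{shared}} &= (1 - 0.999) \times 43\,200 = 43.2 \ \text{min} \\
B_{\text{dedicated}} &= (1 - 0.9995) \times 43\,200 = 21.6 \ \text{min}
\end{aligned}
```

Halving the downtime budget is one reason the dedicated tiers carry their own rate limits and circuit breakers: an incident caused by another tenant would otherwise consume a budget that is already half the size.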

Disaster Recovery

Component | RPO | RTO | Strategy
Tenant registry | 5 min | 15 min | Database replication
Vector store (per tenant namespace) | 1 hour | 30 min | Namespace backup + restore; re-indexing if needed
Per-tenant rate limit state | 5 min | 5 min | Redis Sentinel; brief over-limit window acceptable

11. Cost Considerations

Cost Drivers

Driver | Description | Relative Weight
Dedicated compute per tenant (Tier 3) | Separate GPU instances per tenant | Very High (only warranted for high-value/regulated tenants)
Vector store storage per tenant | Proportional to tenant document volume | Medium
Per-tenant Redis namespaces | Small memory overhead per tenant | Low
Onboarding automation infrastructure | Fixed cost; amortised across tenants | Very Low

Indicative Cost Range

Isolation Tier | Monthly Infra Cost Per Tenant | Notes
Shared Pool | $20–$100 overhead | Amortised over all tenants
Dedicated Namespace | $200–$800 | Dedicated queue + vector namespace + rate limit
Dedicated Compute | $3,000–$15,000+ | Separate GPU inference infrastructure

12. Trade-Off Analysis

Isolation Tier Selection

Tier | Isolation Strength | Cost | Compliance Suitability | Best For
Shared Pool | Low (logical isolation only) | Lowest | Internal BUs with trust relationship | Internal enterprise teams; low-risk use cases
Dedicated Namespace | Medium (software isolation + dedicated resources) | Medium | Most external customers; standard regulatory requirements | B2B SaaS; typical financial services customers
Dedicated Compute | High (infrastructure isolation) | High | High-risk regulated; customers requiring data processing agreement with compute isolation | Healthcare providers; high-security government; regulated financial institutions

Vector Store Isolation Options

Option | Description | Isolation Strength | Pros | Cons
Separate Collections (Qdrant) | Each tenant has own collection | Strongest | Physical separation; no filter bypass | Higher per-tenant overhead
Metadata Filtering (single collection) | Shared collection with tenant_id filter injected | Medium | Lower overhead; easier management | Filter injection must be airtight; harder to audit
Row-Level Security (pgvector) | Database RLS on shared table | Strong | Database-enforced; audit trail | PostgreSQL expertise required; query performance at scale

Architectural Tensions

Tension | Option A | Option B | Resolution
Isolation strength vs. cost efficiency | Maximum isolation for all tenants | Minimum isolation | Tiered model; customers choose/pay for isolation strength
Onboarding speed vs. isolation configuration review | Fully automated self-service | Manual review of each tenant | Automated for standard tier; manual review only for dedicated compute
Per-tenant customisation vs. platform consistency | Full per-tenant configuration | Standardised for all tenants | Parameterised standard templates; tenant overrides within guardrails

13. Failure Modes

Failure | Likelihood | Impact | Detection | Recovery
Namespace injection bug (cross-tenant data access) | Very Low | Critical | Security testing; anomaly detection | Immediate incident; audit all affected queries; mandatory pen-test
Tenant registry unavailable | Low | High: all tenant context lookups fail | Health check | Serve cached tenant context; alert platform
Dedicated compute instance failure | Medium | High: that tenant's AI features unavailable | Health check; circuit breaker | Fail over to shared pool with tenant consent; spin up replacement
Onboarding workflow partial failure | Medium | Medium: tenant incompletely provisioned | Workflow status monitoring | Re-run idempotent workflow; alert platform team
Vector namespace quota exhaustion | Low | Medium: tenant cannot index new documents | Storage metrics | Alert tenant admin; increase quota
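The "serve cached tenant context" recovery can be sketched as a registry client with a short TTL and a bounded stale fallback. The TTL, the `max_stale_seconds` cap and all names are hypothetical. The important design choice is that it fails closed: if the registry is down and there is no sufficiently fresh cached profile for a tenant, the request is rejected rather than served with a guessed or default tenant context.

```python
import time


class CachedTenantRegistry:
    """Registry client that serves last-known tenant context while the registry is down."""

    def __init__(self, fetch, ttl_seconds=60.0, max_stale_seconds=900.0, clock=time.monotonic):
        self.fetch = fetch                   # callable(tenant_id) -> profile; may raise ConnectionError
        self.ttl = ttl_seconds               # normal refresh interval
        self.max_stale = max_stale_seconds   # hypothetical cap on fallback staleness
        self.clock = clock
        self._cache = {}                     # tenant_id -> (profile, fetched_at)
        self.alerts = []

    def get(self, tenant_id):
        now = self.clock()
        cached = self._cache.get(tenant_id)
        if cached and now - cached[1] < self.ttl:
            return cached[0]
        try:
            profile = self.fetch(tenant_id)
        except ConnectionError:
            self.alerts.append(("registry-unavailable", tenant_id))
            if cached and now - cached[1] < self.max_stale:
                return cached[0]  # degrade gracefully with last-known context
            raise                 # no safe context available: fail closed
        self._cache[tenant_id] = (profile, now)
        return profile
```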

14. Regulatory Considerations

Privacy Act / GDPR

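  • Tenant namespaces in vector stores act as separate data processing environments. Where tenants are separate controllers (typically external SaaS customers), the platform operator generally acts as their data processor and must keep each controller's data isolated; for business units within a single legal entity, the controller/processor distinction usually does not apply
  • Tenant offboarding must include complete data deletion from all namespaces; automated deletion with an audit log provides evidence that erasure obligations have been met, provided it also covers derived copies such as caches and backups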

APRA CPS 234

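  • CPS 234 requires an APRA-regulated entity to maintain an information security capability commensurate with the threats to its information assets; multi-tenant isolation controls form part of that capability and must be maintained accordingly
  • CPS 234 also requires the effectiveness of information security controls to be tested through a systematic testing program; quarterly cross-tenant penetration tests (the cadence set in the Section 9 governance artefacts) provide that evidence for isolation controls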

EU AI Act Article 9

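  • Article 9 requires providers of high-risk AI systems to establish, implement, document and maintain a risk management system. Per-tenant risk management configurations let the platform apply different risk controls to different customer contexts, which supports Article 9 compliance where a tenant's use case is high-risk but does not by itself satisfy the requirement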

15. Reference Implementations

AWS

Component | AWS Service
Tenant registry | Amazon DynamoDB
Vector store (namespace-aware) | Amazon OpenSearch (separate indices per tenant) or pgvector with RLS on RDS
Per-tenant rate limits | ElastiCache Redis with tenant-namespaced keys
Tenant onboarding automation | AWS Step Functions + CDK
Compute isolation (dedicated) | Separate SageMaker endpoints per tenant

Azure

Component | Azure Service
Tenant registry | Azure Cosmos DB
Vector store | Azure AI Search (separate indexes per tenant)
Compute isolation | Separate Azure OpenAI deployments per tenant
Onboarding automation | Azure Logic Apps + ARM/Bicep

On-Premises

Component | Technology
Tenant registry | PostgreSQL
Vector store | Qdrant multi-tenant collections
Onboarding automation | Terraform + custom Python provisioner
Rate limits | Redis Enterprise with ACL per tenant

16. Related Patterns

Pattern ID | Name | Relationship
EAAPL-PLT001 | Enterprise AI Platform | Parent: multi-tenancy is a specialisation of the platform
EAAPL-PLT002 | AI API Gateway | Host: tenant context injected and enforced at gateway
EAAPL-PLT004 | LLM Cost Control | Extension: cost control extended per-tenant
EAAPL-RAG001 | RAG Architecture | Dependency: vector store isolation critical for RAG multi-tenancy

17. Maturity Assessment

Overall Maturity: Proven

Multi-tenant AI platforms are in production at SaaS companies and enterprises. Qdrant native multi-tenancy, pgvector RLS, and Redis namespace isolation are all production-ready. Automated onboarding is the variable factor.

Scoring Matrix

Dimension | Score (1–5) | Rationale
Pattern Completeness | 5 | All sections documented
Implementation Evidence | 4 | Widely deployed; vector store isolation approach varies
Security Rigor | 5 | Isolation controls comprehensive; penetration test guidance included
Tooling Maturity | 4 | Qdrant/pgvector multi-tenancy mature; onboarding automation custom per deployment
Regulatory Alignment | 5 | Privacy Act, GDPR, APRA CPS 234, EU AI Act all addressed

18. Revision History

Version | Date | Author | Changes
1.0 | 2024-07-01 | EAAPL Working Group | Initial publication
1.1 | 2025-06-12 | EAAPL Working Group | Dedicated compute tier added; pgvector RLS option documented; onboarding automation expanded