Multi-Tenant AI Platform

[EAAPL-PLT007] Multi-Tenant AI Platform

Category: Platform Engineering
Sub-category: Multi-Tenancy / SaaS Platform
Version: 1.1
Maturity: Proven
Tags: multi-tenancy, tenant-isolation, data-isolation, cost-isolation, chargeback, tenant-onboarding, vector-store-isolation, policy-enforcement
Regulatory Relevance: APRA CPS 234, Privacy Act / GDPR (data separation), EU AI Act Article 9, ISO 27001 A.9

1. Executive Summary

When an AI platform serves multiple tenants—whether internal business units within an enterprise or external customers of an AI-powered SaaS product—tenant isolation becomes a foundational engineering and compliance requirement. The failure mode is severe: a cross-tenant data leak from the AI system exposes confidential information to a competitor, breaches privacy legislation, and destroys trust simultaneously.

The Multi-Tenant AI Platform pattern establishes a rigorous isolation architecture spanning compute, data (including vector stores for RAG), cost attribution, and policy enforcement. It defines three isolation tiers (shared pool, dedicated namespace, and dedicated compute), each with a different isolation strength, cost profile, and operational complexity, so platform architects can select the right tier for each tenant category. The pattern also provides an automated tenant onboarding framework that provisions all isolation controls consistently without manual intervention by the platform team, reducing the risk of human error in security configuration. For enterprises and SaaS providers operating AI at scale, this pattern is the foundation upon which regulatory compliance and commercial trust are built.

2. Problem Statement

Business Problem

Enterprise business units and external customers expect their AI interactions, documents, and derived insights to be completely isolated from other tenants. A financial services platform where Client A's AI-processed documents are accessible to Client B's queries is a regulatory breach and a liability event. An internal enterprise platform where one business unit's AI queries reveal another's strategic data is a governance failure.

Technical Problem

AI platform components are shared by default: vector stores index content from multiple tenants, caches may serve responses across tenant boundaries, and model endpoints run on shared compute. Without explicit isolation controls at each layer, tenant boundaries can fail through RAG retrieval (returning documents from the wrong tenant), cross-tenant cache hits (serving one tenant's cached response to another), or cost attribution errors (one tenant's usage charged to another).

Symptoms

Vector store queries returning documents without tenant scoping (all-tenant search)
No per-tenant rate limits or cost attribution; tenants experience noisy neighbour effects
Tenant onboarding is a manual process requiring platform team involvement; inconsistent security configuration
No per-tenant policy enforcement; one data classification policy applied globally
Cost management treats all tenants as a single entity; chargeback impossible

Cost of Inaction

Cross-tenant data leakage incident triggering regulatory investigation, breach notification, and reputational damage
Noisy neighbour compute contention causing SLA breach for premium tenants
Inability to operate AI as a SaaS product without demonstrable tenant isolation
Manual onboarding taking 2–4 weeks per tenant, limiting growth velocity

3. Context

When to Apply

Platform serves ≥2 tenants with confidentiality requirements between them
External-facing SaaS product built on AI capabilities with customer data isolation requirements
Internal enterprise platform with compliance obligation to separate BU data (e.g., different regulatory jurisdictions)
Chargeback or billing by tenant is required
Tenants have different policy requirements (e.g., some tenants have more restrictive data classification rules)

When NOT to Apply

Single tenant with no isolation requirement: overhead not warranted
Fully federated model where each tenant operates their own platform instance: tenant isolation is achieved at infrastructure level, not within a shared platform
Proof-of-concept: implement single-tenant first, accepting that retrofitting multi-tenancy later is harder than building it in from the start

Prerequisites

Identity provider with tenant-scoped claims in JWTs (tenant ID in token)
AI API Gateway (PLT002) as enforcement point for tenant context
Vector store that supports namespace or metadata filtering (required for RAG isolation)
Cost management service capable of per-tenant attribution
Infrastructure-as-code for automated tenant provisioning

Industry Applicability

Industry	Applicability	Isolation Driver
SaaS (B2B)	Critical	Customer data separation; contractual obligation
Financial Services	Very High	Regulatory data separation; competing client isolation
Healthcare	Very High	Patient data; HIPAA/Privacy Act separation requirements
Legal Services	High	Client privilege; competitive intelligence
Enterprise Internal	High	BU confidentiality; regulatory jurisdiction separation
Government	High	Security classification; agency separation

4. Architecture Overview

The Multi-Tenant AI Platform addresses isolation across five dimensions: identity, compute, data, cost, and policy. Each dimension has multiple isolation tier options, and the appropriate combination of tiers per tenant type drives the overall architecture.

Tenant Identity and Context Propagation establishes the tenant as a first-class citizen in all platform components. Every authenticated request carries a tenant ID claim in its JWT (or API key metadata). This tenant ID is extracted by the AI API Gateway at authentication time and propagated as a context header (X-Tenant-ID) to all downstream platform services. No downstream service makes a data access decision without this tenant context; it is the universal key for isolation enforcement. The tenant registry (a simple lookup table mapping tenant IDs to isolation tier, policy profile, and resource configuration) is consulted by the gateway on every request.
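The context-propagation step can be sketched as follows. This is a minimal illustration, assuming JWT signature validation has already happened and the verified claims arrive as a dictionary. `TenantProfile`, `REGISTRY`, `resolve_tenant`, and `build_downstream_headers` are hypothetical names, and the in-memory dictionary stands in for a real registry store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantProfile:
    """One tenant registry entry (field names are illustrative)."""
    tenant_id: str
    isolation_tier: str
    policy_profile: str
    vector_namespace: str


class UnknownTenantError(Exception):
    """Raised when the tenant ID is missing from the token or the registry."""


# Hypothetical in-memory registry standing in for PostgreSQL, DynamoDB, or Redis.
REGISTRY = {
    "tenant-b": TenantProfile("tenant-b", "dedicated-namespace", "tenant-b-eu", "tenant-b"),
}


def resolve_tenant(verified_claims: dict, registry: dict) -> TenantProfile:
    """Look up the tenant using only the signed claim, never request input."""
    tenant_id = verified_claims.get("tenant_id")
    if not tenant_id or tenant_id not in registry:
        raise UnknownTenantError(tenant_id)
    return registry[tenant_id]


def build_downstream_headers(incoming_headers: dict, profile: TenantProfile) -> dict:
    """Drop any client-supplied X-Tenant-ID and set it from the verified profile."""
    headers = {k: v for k, v in incoming_headers.items() if k.lower() != "x-tenant-id"}
    headers["X-Tenant-ID"] = profile.tenant_id
    return headers


profile = resolve_tenant({"sub": "user-1", "tenant_id": "tenant-b"}, REGISTRY)
headers = build_downstream_headers(
    {"x-tenant-id": "tenant-a", "Accept": "application/json"}, profile
)
print(headers)
```

The client-supplied `x-tenant-id` value is discarded rather than trusted, so downstream services only ever see the tenant ID derived from the signed token.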

Compute Isolation Tiers define the strength of isolation at the inference layer. The Shared Pool tier routes all tenants to the same model endpoint pool; isolation is enforced through request-level context and response handling only. This is appropriate for internal business units with a trust relationship. The Dedicated Namespace tier gives each tenant a logically separate API endpoint with dedicated rate limits and circuit breakers, while still sharing the underlying model compute. This is appropriate for external customers with contractual SLA requirements. The Dedicated Compute tier provisions separate model serving infrastructure per tenant; this is the strongest isolation, appropriate for high-value customers or regulatory requirements (e.g., financial data processing where the model server must be in the customer's own environment).
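One way to picture the three tiers is as a routing decision made from the tenant's registry entry. The sketch below is illustrative only: the hostnames, the tier strings, and the `resolve_endpoint` function are assumptions, not part of the pattern.

```python
# Hypothetical internal hostnames; substitute the platform's real endpoints.
SHARED_POOL_URL = "https://inference.example.internal/shared"


def resolve_endpoint(tenant_id: str, tier: str) -> dict:
    """Return routing settings for a tenant based on its isolation tier."""
    if tier == "shared-pool":
        # All tenants share one endpoint pool and the same compute.
        return {"url": SHARED_POOL_URL,
                "dedicated_endpoint": False, "dedicated_compute": False}
    if tier == "dedicated-namespace":
        # Logically separate endpoint per tenant, shared model compute behind it.
        return {"url": f"https://inference.example.internal/tenants/{tenant_id}",
                "dedicated_endpoint": True, "dedicated_compute": False}
    if tier == "dedicated-compute":
        # Separate model serving infrastructure per tenant.
        return {"url": f"https://{tenant_id}.inference.example.internal",
                "dedicated_endpoint": True, "dedicated_compute": True}
    raise ValueError(f"unknown isolation tier: {tier}")


print(resolve_endpoint("tenant-b", "dedicated-namespace"))
```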

Data Isolation — Vector Store is the most technically complex isolation requirement. When tenants use RAG (retrieval-augmented generation), documents from different tenants must never appear in each other's retrieval results. This is achieved through namespace isolation (each tenant has a dedicated namespace or collection in the vector store; queries are scoped to the tenant's namespace by the platform, not by the application code) or metadata filtering (all documents are stored in a shared collection with a tenant_id metadata field; every query includes a mandatory filter on this field that the platform injects and that cannot be removed by the application). The namespace approach is stronger (physical isolation; no risk of filter bypass); metadata filtering is simpler but requires the filter injection to be implemented correctly and consistently at every query site.
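The platform-side scoping described above can be sketched as a small query interceptor. This is a simplified illustration, not a real vector store client: the query shape (`namespace`, `filter`), `scope_query`, and the toy `search` function are all assumptions made for the example.

```python
class CrossTenantAccessError(Exception):
    """Raised when a query names a tenant other than the caller's."""


def scope_query(app_query: dict, tenant_id: str) -> dict:
    """Return a copy of the query forced into the caller's tenant scope."""
    filters = dict(app_query.get("filter", {}))
    # Reject explicit attempts to target another tenant instead of silently
    # correcting them, so the attempt can be surfaced as a security alert.
    for requested in (app_query.get("namespace"), filters.get("tenant_id")):
        if requested is not None and requested != tenant_id:
            raise CrossTenantAccessError(f"{tenant_id} requested scope {requested}")
    filters["tenant_id"] = tenant_id
    return {**app_query, "namespace": tenant_id, "filter": filters}


def search(documents: list, query: dict) -> list:
    """Toy retrieval: return texts whose metadata matches every filter field."""
    return [
        doc["text"]
        for doc in documents
        if all(doc["metadata"].get(k) == v for k, v in query["filter"].items())
    ]


DOCS = [
    {"text": "A: pricing memo", "metadata": {"tenant_id": "tenant-a"}},
    {"text": "B: audit report", "metadata": {"tenant_id": "tenant-b"}},
]

scoped = scope_query({"text": "report", "top_k": 5}, tenant_id="tenant-b")
print(search(DOCS, scoped))
```

Because the application never constructs the tenant filter itself, a forgotten or malformed filter in application code cannot widen the search; the only way to reach another tenant's documents is a bug in this one interceptor, which keeps the audit surface small.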

Cost Isolation and Chargeback provides financial accountability. The cost management service (PLT004) maintains per-tenant token consumption counters. Each tenant has a configured budget with independent warning and blocking thresholds. Billing for external SaaS tenants is generated from the cost event stream, filtered by tenant ID, and integrated with the billing system (Stripe, Zuora) via a reconciliation job. Internal chargeback reports aggregate monthly spend by tenant/BU for finance system integration.
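A per-tenant budget with independent warning and blocking thresholds might look like the sketch below. It keeps counters in memory for clarity; the `TenantBudget` class, the 80% default warning level, and the token limits are illustrative assumptions rather than values from the pattern.

```python
from dataclasses import dataclass


@dataclass
class TenantBudget:
    limit_tokens: int        # blocking threshold
    warn_percent: int = 80   # warning threshold as a percentage of the limit
    used_tokens: int = 0

    def status(self) -> str:
        """Return 'ok', 'warn', or 'blocked' for the current consumption."""
        if self.used_tokens >= self.limit_tokens:
            return "blocked"
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if self.used_tokens * 100 >= self.limit_tokens * self.warn_percent:
            return "warn"
        return "ok"

    def record(self, tokens: int) -> str:
        """Add consumed tokens and return the resulting status."""
        self.used_tokens += tokens
        return self.status()


# One independent budget per tenant; usage by one never touches another.
budgets = {
    "tenant-a": TenantBudget(limit_tokens=1_000_000),
    "tenant-b": TenantBudget(limit_tokens=1_000),
}


def record_usage(tenant_id: str, tokens: int) -> str:
    return budgets[tenant_id].record(tokens)
```

A gateway would typically call `status()` before forwarding a request (returning the budget-exhausted error when it reports `blocked`) and `record_usage` once the model response reports actual token consumption.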

Per-Tenant Policy Enforcement allows tenants to have different data classification rules, model access policies, and output filtering requirements. The policy engine (OPA) evaluates policies against a composite context including both the platform's global policies and the tenant's specific overrides. This means a tenant operating in the EU can have stricter data residency rules than a tenant in another jurisdiction, without requiring separate platform infrastructure.
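The composite evaluation can be illustrated in plain Python (a production deployment would express this in the policy engine's own language). The sketch assumes a design in which tenant overrides may only tighten the global policy; that rule, the classification levels other than INTERNAL, the region codes, and all names below are assumptions made for the example.

```python
# Hypothetical classification ladder, least to most sensitive.
CLASSIFICATION_RANK = {"PUBLIC": 0, "INTERNAL": 1, "CONFIDENTIAL": 2}

GLOBAL_POLICY = {
    "max_classification": "CONFIDENTIAL",
    "allowed_regions": {"eu", "us", "au"},
}

# Tenant-specific overrides keyed by policy profile name.
TENANT_OVERRIDES = {
    "tenant-b-eu": {"allowed_regions": {"eu"}},
}


def effective_policy(profile_name: str) -> dict:
    """Combine global policy with tenant overrides; overrides can only tighten."""
    override = TENANT_OVERRIDES.get(profile_name, {})
    global_regions = GLOBAL_POLICY["allowed_regions"]
    # Intersection means a tenant cannot add a region the platform forbids.
    regions = global_regions & override.get("allowed_regions", global_regions)
    # Taking the lower-ranked level means a tenant cannot raise the ceiling.
    max_cls = min(
        GLOBAL_POLICY["max_classification"],
        override.get("max_classification", GLOBAL_POLICY["max_classification"]),
        key=CLASSIFICATION_RANK.__getitem__,
    )
    return {"max_classification": max_cls, "allowed_regions": regions}


def is_allowed(profile_name: str, classification: str, region: str) -> bool:
    policy = effective_policy(profile_name)
    within_ceiling = (
        CLASSIFICATION_RANK[classification]
        <= CLASSIFICATION_RANK[policy["max_classification"]]
    )
    return within_ceiling and region in policy["allowed_regions"]


print(is_allowed("tenant-b-eu", "INTERNAL", "eu"))
print(is_allowed("tenant-b-eu", "INTERNAL", "us"))
```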

Tenant Onboarding Automation is the operational mechanism that makes multi-tenancy scalable. A tenant onboarding workflow (implemented as an Infrastructure as Code template or an internal workflow service) provisions all isolation resources consistently in a single automated run: creates the tenant namespace in the vector store, creates the rate limit quota in Redis, creates the tenant record in the tenant registry, creates the API key in the secrets manager, creates the policy profile in OPA, creates the cost budget in the cost management service, and sends the onboarding confirmation to the tenant. This process must be idempotent (safe to re-run) and audited (every provisioning action logged).
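The idempotency and audit requirements can be sketched with an in-memory workflow. The step names mirror the provisioning actions listed above; the `OnboardingWorkflow` class and the `fail_at` parameter (used only to simulate a partial failure) are hypothetical.

```python
STEPS = (
    "vector_namespace",
    "rate_limit_quota",
    "registry_record",
    "api_key",
    "policy_profile",
    "cost_budget",
    "confirmation",
)


class OnboardingWorkflow:
    def __init__(self):
        self.provisioned = set()   # (tenant_id, step) pairs already completed
        self.audit_log = []        # (tenant_id, step, outcome) for every action

    def run(self, tenant_id: str, fail_at: str = None) -> None:
        """Provision every step once; completed steps are skipped on re-run."""
        for step in STEPS:
            if (tenant_id, step) in self.provisioned:
                self.audit_log.append((tenant_id, step, "skipped"))
                continue
            if step == fail_at:
                # Simulated failure so the re-run behaviour can be shown.
                self.audit_log.append((tenant_id, step, "failed"))
                raise RuntimeError(f"{step} failed for {tenant_id}")
            self.provisioned.add((tenant_id, step))
            self.audit_log.append((tenant_id, step, "created"))


workflow = OnboardingWorkflow()
try:
    workflow.run("tenant-c", fail_at="api_key")   # partial failure
except RuntimeError:
    pass
workflow.run("tenant-c")                           # safe re-run completes the rest
print(len(workflow.provisioned), len(workflow.audit_log))
```

Tracking completion per step, rather than per tenant, is what lets a re-run resume after a partial failure without creating duplicates or sending the confirmation twice.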

5. Architecture Diagram

flowchart TD
    subgraph Ingress["Tenant Ingress"]
        A[Tenant Request]
        B[AI API Gateway]
    end
    subgraph Control["Control Plane"]
        C[Tenant Registry]
        D{Policy Engine}
        E[Onboarding Workflow]
    end
    subgraph Isolation["Isolation Layer"]
        F[Compute Tier]
        G[Vector Namespace]
        H[Cost Counter]
    end
    A -->|JWT with tenant_id| B
    B -->|lookup tier + policy| C
    C --> D
    D -->|allowed| F
    D -->|scoped query| G
    D -->|emit event| H
    E -->|provisions| G
    E -->|provisions| H
    H -->|chargeback| B
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#fef9c3,stroke:#eab308
    style D fill:#f3e8ff,stroke:#a855f7
    style E fill:#f0fdf4,stroke:#22c55e
    style F fill:#fef9c3,stroke:#eab308
    style G fill:#fef9c3,stroke:#eab308
    style H fill:#fef9c3,stroke:#eab308

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Tenant Registry	Service	Map tenant ID to isolation tier, policy profile, resource config	PostgreSQL, DynamoDB, Redis hash	Critical
Tenant Context Injector	Middleware	Extract tenant ID from JWT; inject as context header	Custom gateway middleware	Critical
Namespace Injector (Vector Store)	Service	Force tenant namespace scope on all vector store queries	Custom query interceptor layer	Critical
Per-Tenant Rate Limiter	Service	Maintain separate rate limit quota per tenant	Redis with tenant-namespaced keys	Critical
Per-Tenant Cost Counter	Service	Maintain token consumption and budget per tenant	Redis sorted sets, custom	Critical
Per-Tenant Policy Profile	Configuration	Tenant-specific OPA policy overrides	OPA policy bundles with tenant namespace	High
Tenant Onboarding Workflow	Service	Automated provisioning of all tenant isolation resources	Terraform + custom provisioner, Temporal workflow	High
Vector Store (Namespace-Aware)	Service	Partition documents by tenant namespace	Qdrant collections, Weaviate multi-tenancy, pgvector + RLS	Critical
Billing Integration	Service	Reconcile per-tenant cost events with billing system	Custom + Stripe/Zuora integration	High (SaaS)
Tenant Admin Portal	Service	Self-service configuration for tenant administrators	Custom, Backstage tenant plugin	Medium

7. Data Flow

Primary Flow — Tenant-Scoped RAG Request

Step	Actor	Action	Output
1	Tenant B Application	POST /v1/chat/completions with JWT containing `tenant_id: tenant-b`	Request at gateway
2	Gateway Authentication	Validate JWT; extract `tenant_id: tenant-b`	Tenant context established
3	Tenant Registry Lookup	Retrieve Tenant B profile: tier=dedicated-namespace, vector-ns=tenant-b, rate-limit=50K/min, policy-profile=tenant-b-eu	Tenant configuration loaded
4	OPA Policy Evaluation	Evaluate composite policy: global + tenant-b-eu overrides; request classification INTERNAL → approved for mid-tier model	Policy allow
5	Rate Limit Check	Check Tenant B's rate limit counter (namespace: `ratelimit:tenant-b`); 40K of 50K used → allow	Request proceeds
6	Vector Store Query	Application requests RAG retrieval; namespace injector forces `namespace=tenant-b` on query	Only Tenant B documents retrieved; Tenant A and C documents inaccessible
7	Model Inference	Request forwarded to Tenant B's dedicated namespace endpoint; response generated	Model response
8	Cost Attribution	Emit cost event with `tenant_id=tenant-b`; update `counter:tenant-b` in Redis	Tenant B cost updated
9	Response	Return response with `X-Tenant-ID: tenant-b` response header	Tenant B application receives response
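Step 5 of the flow above can be sketched as a fixed-window counter with tenant-namespaced keys. This in-memory version is only illustrative: a Redis-backed limiter would need atomic increments, and the `TenantRateLimiter` class, the 60-second window, and the explicit `now` argument (passed in to keep the example deterministic) are assumptions.

```python
class TenantRateLimiter:
    def __init__(self, limits: dict, window_seconds: int = 60):
        self.limits = limits                  # tenant_id -> units per window
        self.window_seconds = window_seconds
        self.counters = {}                    # key -> (window index, units used)

    def allow(self, tenant_id: str, cost: int, now: float) -> bool:
        """Consume `cost` units from the tenant's current window if they fit."""
        key = f"ratelimit:{tenant_id}"        # tenant-namespaced key
        window = int(now // self.window_seconds)
        start, used = self.counters.get(key, (window, 0))
        if start != window:                   # a new window resets the counter
            start, used = window, 0
        if used + cost > self.limits[tenant_id]:
            self.counters[key] = (start, used)
            return False
        self.counters[key] = (start, used + cost)
        return True


limiter = TenantRateLimiter({"tenant-a": 50_000, "tenant-b": 50_000})
print(limiter.allow("tenant-b", 40_000, now=0))
print(limiter.allow("tenant-b", 20_000, now=5))
print(limiter.allow("tenant-a", 20_000, now=5))
```

Tenant B's second call is refused because it would exceed its own quota, while Tenant A's identical call succeeds: each tenant's counter is independent, which is what prevents noisy-neighbour effects.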

Error Flow

Error	Detection	Response
Tenant ID not in registry	Registry lookup miss	403 with unknown-tenant code; onboarding required
Tenant namespace not initialised in vector store	Namespace injector error	503 with tenant-setup-incomplete code; alert platform team
Tenant budget exhausted	Budget counter check	429 with tenant-budget-exhausted code; tenant admin notified
Cross-tenant namespace injection attempt (application tries to override namespace)	Namespace injector override detection	403; security alert raised
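The error table above can be expressed as a single mapping from failure type to gateway response, which keeps status codes consistent across services. The exception class names, the `to_response` helper, and the `cross-tenant-access` code string are hypothetical; the other status codes and error codes are taken from the table.

```python
class UnknownTenant(Exception):
    pass


class TenantSetupIncomplete(Exception):
    pass


class TenantBudgetExhausted(Exception):
    pass


class CrossTenantAccess(Exception):
    pass


ERROR_RESPONSES = {
    UnknownTenant: (403, "unknown-tenant"),
    TenantSetupIncomplete: (503, "tenant-setup-incomplete"),
    TenantBudgetExhausted: (429, "tenant-budget-exhausted"),
    CrossTenantAccess: (403, "cross-tenant-access"),  # code name is an assumption
}


def to_response(error: Exception) -> dict:
    """Translate an isolation failure into a status, a code, and an alert flag."""
    status, code = ERROR_RESPONSES.get(type(error), (500, "internal-error"))
    return {
        "status": status,
        "code": code,
        # Only override attempts are treated as security events.
        "raise_security_alert": isinstance(error, CrossTenantAccess),
    }


print(to_response(TenantBudgetExhausted()))
```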

8. Security Considerations

Tenant Isolation Enforcement

Tenant ID in JWT must be a non-forgeable claim signed by the trusted IdP; tenant ID must never be accepted from request body or URL parameters
Namespace injection into vector store queries must occur at the platform layer, not application layer; applications must not be trusted to correctly scope their own queries
Zero-trust principle: every request is re-evaluated for tenant context regardless of previous requests in the same session

Data Separation

Where possible, vector store namespaces use hard partitioning (separate collections in Qdrant) or database-enforced row-level security (pgvector with RLS) rather than application-level metadata filtering
Cache entries are always prefixed with tenant namespace ({tenant_id}:{cache_key}); Redis ACLs enforce per-namespace access
Audit logs include tenant ID on every record; security monitoring tracks cross-tenant anomalies
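The `{tenant_id}:{cache_key}` convention can be enforced by a thin wrapper so that callers never build keys themselves. This is a minimal in-memory sketch; `TenantScopedCache` is a hypothetical name, and the rule that tenant IDs may not contain `:` is an added assumption that keeps keys unambiguous.

```python
class TenantScopedCache:
    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(tenant_id: str, cache_key: str) -> str:
        # Without this check, tenant "a" with key "b:c" and tenant "a:b"
        # with key "c" would collide on the same stored key.
        if not tenant_id or ":" in tenant_id:
            raise ValueError("tenant_id must be non-empty and must not contain ':'")
        return f"{tenant_id}:{cache_key}"

    def set(self, tenant_id: str, cache_key: str, value) -> None:
        self._store[self._key(tenant_id, cache_key)] = value

    def get(self, tenant_id: str, cache_key: str):
        return self._store.get(self._key(tenant_id, cache_key))


cache = TenantScopedCache()
cache.set("tenant-a", "prompt-hash-123", "cached answer for A")
print(cache.get("tenant-a", "prompt-hash-123"))
print(cache.get("tenant-b", "prompt-hash-123"))
```

An identical prompt from Tenant B misses the cache instead of receiving Tenant A's response, at the cost of a lower overall hit rate.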

OWASP LLM Top 10

OWASP LLM Risk	Multi-Tenant Control
LLM06 Sensitive Information Disclosure	Vector store namespace isolation prevents cross-tenant RAG retrieval; cache scoping prevents cross-tenant cache hits
LLM04 Model DoS	Per-tenant rate limits prevent one tenant exhausting platform capacity
LLM08 Excessive Agency	Per-tenant policy profiles control what actions AI can take on behalf of each tenant

9. Governance Considerations

Tenant Data Governance

Each tenant's data in the vector store is subject to the tenant's own data governance policies; the platform provides isolation infrastructure but does not govern tenant data content
Platform retains logs of all access to tenant namespaces; these logs are available to tenants as part of their audit rights

Onboarding Governance

All tenant onboarding must be performed via the automated workflow; manual provisioning is prohibited as it bypasses audit and consistency controls
Tenant offboarding (data deletion) must be automated and audited; all tenant namespaces and cost records purged within the agreed retention period

Governance Artefacts

Artefact	Owner	Cadence	Location
Tenant registry	Platform Team	Continuous	Database + IaC
Tenant isolation tier policy	Platform Governance Board	Annual	Platform policy document
Tenant onboarding audit log	Platform Team	Per event	Audit log
Cross-tenant isolation test results	Platform Team + Security	Quarterly	Security test repository
Tenant data residency configuration	Data Governance Team	Per tenant	Tenant registry

10. Operational Considerations

Monitoring

Signal	Source	Alert Threshold	Owner
Cross-tenant access attempt	Namespace injector security alert	Any attempt	CISO + Platform On-Call
Tenant namespace not found in vector store	RAG query error	Any 503 with tenant-setup-incomplete	Platform On-Call
Per-tenant error rate spike	Tenant-scoped gateway metrics	>5% error rate for any single tenant	Platform On-Call
Tenant onboarding workflow failure	Workflow status	Any failure	Platform Team

SLOs

SLO	Target	Window
Tenant onboarding automated completion	<15 minutes from trigger	Per event
Tenant isolation verification (no cross-tenant data)	Zero incidents	Continuous
Per-tenant gateway availability	99.9% (shared tier), 99.95% (dedicated)	Rolling 30 days

Disaster Recovery

Component	RPO	RTO	Strategy
Tenant registry	5 min	15 min	Database replication
Vector store (per tenant namespace)	1 hour	30 min	Namespace backup + restore; re-indexing if needed
Per-tenant rate limit state	5 min	5 min	Redis Sentinel; brief over-limit window acceptable

11. Cost Considerations

Cost Drivers

Driver	Description	Relative Weight
Dedicated compute per tenant (Tier 3)	Separate GPU instances per tenant	Very High — only warranted for high-value/regulated tenants
Vector store storage per tenant	Proportional to tenant document volume	Medium
Per-tenant Redis namespaces	Small memory overhead per tenant	Low
Onboarding automation infrastructure	Fixed cost; amortised across tenants	Very Low

Indicative Cost Range

Isolation Tier	Monthly Infra Cost Per Tenant	Notes
Shared Pool	$20–$100 overhead	Amortised over all tenants
Dedicated Namespace	$200–$800	Dedicated queue + vector namespace + rate limit
Dedicated Compute	$3,000–$15,000+	Separate GPU inference infrastructure

12. Trade-Off Analysis

Isolation Tier Selection

Tier	Isolation Strength	Cost	Compliance Suitability	Best For
Shared Pool	Low (logical isolation only)	Lowest	Internal BUs with trust relationship	Internal enterprise teams; low-risk use cases
Dedicated Namespace	Medium (software isolation + dedicated resources)	Medium	Most external customers; standard regulatory requirements	B2B SaaS; typical financial services customers
Dedicated Compute	High (infrastructure isolation)	High	High-risk regulated; customers requiring data processing agreement with compute isolation	Healthcare providers; high-security government; regulated financial institutions

Vector Store Isolation Options

Option	Description	Isolation Strength	Pros	Cons
Separate Collections (Qdrant)	Each tenant has own collection	Strongest	Physical separation; no filter bypass	Higher per-tenant overhead
Metadata Filtering (single collection)	Shared collection with tenant_id filter injected	Medium	Lower overhead; easier management	Filter injection must be airtight; harder to audit
Row-Level Security (pgvector)	Database RLS on shared table	Strong	Database-enforced; audit trail	PostgreSQL expertise required; query performance at scale

Architectural Tensions

Tension	Option A	Option B	Resolution
Isolation strength vs. cost efficiency	Maximum isolation for all tenants	Minimum isolation	Tiered model; customers choose/pay for isolation strength
Onboarding speed vs. isolation configuration review	Fully automated self-service	Manual review each tenant	Automated for standard tier; manual review only for dedicated compute
Per-tenant customisation vs. platform consistency	Full per-tenant configuration	Standardised for all	Parameterised standard templates; tenant overrides within guardrails

13. Failure Modes

Failure	Likelihood	Impact	Detection	Recovery
Namespace injection bug (cross-tenant data access)	Very Low	Critical	Security testing; anomaly detection	Immediate incident; audit all affected queries; mandatory pen-test
Tenant registry unavailable	Low	High — all tenant context lookups fail	Health check	Serve cached tenant context; alert platform
Dedicated compute instance failure	Medium	High — that tenant's AI features unavailable	Health check; circuit breaker	Fail over to shared pool with tenant consent; spin up replacement
Onboarding workflow partial failure	Medium	Medium — tenant incompletely provisioned	Workflow status monitoring	Re-run idempotent workflow; alert platform team
Vector namespace quota exhaustion	Low	Medium — tenant cannot index new documents	Storage metrics	Alert tenant admin; increase quota

14. Regulatory Considerations

Privacy Act / GDPR

Tenant namespaces in vector stores constitute separate data processing environments; the platform operator acts as a data processor and must ensure isolation of controllers' data
Tenant offboarding must include complete data deletion from all namespaces; automated deletion with audit log satisfies data erasure obligations

APRA CPS 234

Multi-tenant isolation controls are information security capabilities that must be maintained per CPS 234 paragraph 36
The quarterly cross-tenant penetration tests defined in this pattern provide evidence that isolation controls are tested systematically, supporting CPS 234's control-testing expectations

EU AI Act Article 9

Per-tenant risk management configurations allow different risk profiles per customer, which supports Article 9's requirement for a risk management system appropriate to each system's context of use

15. Reference Implementations

AWS

Component	AWS Service
Tenant registry	Amazon DynamoDB
Vector store (namespace-aware)	Amazon OpenSearch (separate indices per tenant) or pgvector with RLS on RDS
Per-tenant rate limits	ElastiCache Redis with tenant-namespaced keys
Tenant onboarding automation	AWS Step Functions + CDK
Compute isolation (dedicated)	Separate SageMaker endpoints per tenant

Azure

Component	Azure Service
Tenant registry	Azure Cosmos DB
Vector store	Azure AI Search (separate indexes per tenant)
Compute isolation	Separate Azure OpenAI deployments per tenant
Onboarding automation	Azure Logic Apps + ARM/Bicep

On-Premises

Component	Technology
Tenant registry	PostgreSQL
Vector store	Qdrant multi-tenant collections
Onboarding automation	Terraform + custom Python provisioner
Rate limits	Redis Enterprise with ACL per tenant

16. Related Patterns

Pattern ID	Name	Relationship
EAAPL-PLT001	Enterprise AI Platform	Parent — multi-tenancy is a specialisation of the platform
EAAPL-PLT002	AI API Gateway	Host — tenant context injected and enforced at gateway
EAAPL-PLT004	LLM Cost Control	Extension — cost control extended per-tenant
EAAPL-RAG001	RAG Architecture	Dependency — vector store isolation critical for RAG multi-tenancy

17. Maturity Assessment

Overall Maturity: Proven

Multi-tenant AI platforms are in production at SaaS companies and enterprises. Qdrant native multi-tenancy, pgvector RLS, and Redis namespace isolation are all production-ready. Automated onboarding is the variable factor.

Scoring Matrix

Dimension	Score (1–5)	Rationale
Pattern Completeness	5	All sections documented
Implementation Evidence	4	Widely deployed; vector store isolation approach varies
Security Rigor	5	Isolation controls comprehensive; penetration test guidance included
Tooling Maturity	4	Qdrant/pgvector multi-tenancy mature; onboarding automation custom per deployment
Regulatory Alignment	5	Privacy Act, GDPR, APRA CPS 234, EU AI Act all addressed

18. Revision History

Version	Date	Author	Changes
1.0	2024-07-01	EAAPL Working Group	Initial publication
1.1	2025-06-12	EAAPL Working Group	Dedicated compute tier added; pgvector RLS option documented; onboarding automation expanded
