[EAAPL-DAT001] AI Data Mesh Integration

Category: Data Architecture
Sub-category: Data Mesh / AI Integration
Version: 1.2
Maturity: Proven
Tags: data-mesh, data-product, federated-governance, AI-consumption, domain-ownership, data-contract
Regulatory Relevance: APRA CPS 234, EU AI Act Articles 10 & 17, ISO 42001 §6.1, NIST AI RMF GOVERN-1.2

1. Executive Summary

Enterprises adopting Data Mesh architectures face a structural challenge when integrating AI: the mesh's domain-oriented decentralisation conflicts with AI's need for curated, governed, cross-domain data. This pattern resolves that tension by defining AI as a first-class data product consumer and, where appropriate, as a data product publisher within the mesh.

Domain teams retain ownership of the training and inference data they produce. A Federated AI Governance layer aligns mesh-level data contracts with AI-specific quality, lineage, and bias requirements. AI models that produce scored or enriched outputs register those outputs as new data products, closing the feedback loop.

Organisations adopting this pattern report a 40–60% reduction in data preparation lead time for new AI use cases, improved cross-domain data reuse, and audit trails that trace each model back to its source data products. The last point matters most for APRA-regulated entities and for EU AI Act Article 10 compliance on training data provenance.

Target audience: Chief Data Officers, Enterprise Architects, AI Platform leads.
Decision trigger: When an organisation runs ≥3 AI systems consuming data from ≥2 distinct business domains.

2. Problem Statement

Business Problem

AI programmes in large enterprises frequently stall at data sourcing. Central data teams become bottlenecks, domain knowledge erodes in translation, and model quality degrades because the teams closest to the data have no formal accountability for its AI-readiness.

Technical Problem

Data Mesh decentralises data ownership to domain teams who publish data products. However:

AI training pipelines typically require cross-domain joins that no single domain owns.
AI-specific quality dimensions (label quality, representativeness, temporal consistency) are absent from standard data product contracts.
There is no established pattern for AI model outputs to be re-published as mesh data products, creating shadow data stores.
Federated governance policies designed for BI/analytics do not address AI-specific risks (bias, drift, model leakage).

Symptoms

ML engineers spending >50% of sprint time on data acquisition and cleaning.
Multiple AI teams independently copying the same source data into isolated feature stores.
Absence of data lineage from raw source to model prediction.
Inability to answer "which training data produced this prediction?" during regulatory review.
Domain teams unaware that their data is used for AI training, leading to schema changes that silently break models.

Cost of Inaction

Dimension	Impact
Time-to-value	New AI use case takes 6–18 months instead of 6–12 weeks due to data friction
Regulatory	APRA CPS 234 / EU AI Act Art. 10 audit failures; potential enforcement action
Data quality	Silent schema drift degrades model accuracy; no detection mechanism
Duplication	3–7 shadow copies of core datasets across AI teams; storage + governance overhead
Trust	Business stakeholders lose confidence when AI predictions cannot be explained from data

3. Context

When to Apply

The organisation has adopted or is adopting a Data Mesh architecture (≥2 domains publishing data products).
≥3 AI use cases require data from ≥2 domains (the decision trigger stated in the Executive Summary).
Regulatory requirements mandate training data provenance (EU AI Act Art. 10, APRA CPS 234).
The organisation wants to avoid a central AI data team becoming a bottleneck.
There is an existing data product catalogue with defined contracts.

When NOT to Apply

Organisation has a single, centralised data platform (use standard data lake/lakehouse pattern instead).
AI workloads are purely experimental/PoC with no regulatory obligations.
Domain teams lack data engineering maturity to maintain data contracts.
Data volumes are low enough that a single team can own all AI data.

Prerequisites

Prerequisite	Minimum Viable	Preferred
Data product catalogue	Informal product list	Governed catalogue with SLA/SLO metadata
Data contracts	Informal schema agreements	OpenAPI/Avro schema + quality SLO contract
Domain data teams	1 data engineer per domain	Dedicated data product owner + engineer
Compute mesh	Cloud object store per domain	Isolated storage accounts + query federation
Governance tooling	Spreadsheet-based policy	Automated policy enforcement (OPA/Dataplex)

Industry Applicability

Industry	Applicability	Primary Driver
Financial Services	High	APRA CPS 234 + EU AI Act compliance; multi-domain risk data
Healthcare	High	Privacy Act + clinical data domain separation; model audit requirements
Retail / CPG	High	Customer, supply-chain, and product domains; personalisation AI
Telecommunications	Medium	Network, customer, and billing domains; churn/fraud AI
Government	Medium	Departmental data sovereignty + Privacy Act APP compliance
Manufacturing	Medium	OT/IT domain separation; predictive maintenance AI

4. Architecture Overview

Design Philosophy

The AI Data Mesh Integration pattern rests on four architectural principles that extend the core Data Mesh principles with AI-specific concerns.

Principle 1 — AI as a Domain Consumer, Not a Privileged Consumer. Traditional AI platforms often receive "super-consumer" access, bypassing domain governance. This pattern rejects that approach. Every AI training pipeline, feature store, and inference service consumes data products through formally declared contracts. The AI platform team is a consumer domain, not a platform exception.

Principle 2 — Domain Ownership Extends to AI Readiness. A domain publishing a data product for AI consumption must attest to AI-specific quality dimensions: completeness (no systematic missingness in features used by models), representativeness (data distribution reflects real-world population), freshness (SLA on data currency for real-time inference), and label quality (where the domain produces ground-truth labels). This is codified in an extended data contract schema.

Principle 3 — AI Outputs Are First-Class Data Products. Model predictions, risk scores, embeddings, and recommendations produced by AI systems must be registered in the mesh catalogue as data products with their own contracts, lineage, and SLOs. This prevents shadow data: business users consuming AI outputs through undeclared channels with no governance.

Principle 4 — Federated AI Governance Aligns to Mesh Domains. The mesh's federated computational governance is extended with AI-specific policy: bias thresholds by domain, data minimisation requirements, consent scope enforcement, and model training data approval workflows. These policies are implemented as automated checks in the data product pipeline, not as manual review gates.

Structural Components

The architecture adds four structural layers to a standard Data Mesh:

AI Data Product Contract Extension. Each domain's data product contract gains an ai_metadata block specifying: approved AI use cases, known bias risks, freshness SLA for inference, label quality score (if applicable), and consent scope. This is enforced by the mesh's governance plane during contract registration.
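
As an illustration of the contract extension described above, the following is a minimal sketch of an ai_metadata block and the kind of check a governance plane might run at contract registration. The field names, types, and validation rules are assumptions made for this example; the pattern itself only names the five concepts the block must carry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AIMetadata:
    """Hypothetical shape of the ai_metadata block; field names are assumptions."""
    approved_use_cases: tuple[str, ...]
    known_bias_risks: tuple[str, ...]
    inference_freshness_sla_seconds: int
    consent_scope: tuple[str, ...]
    label_quality_score: Optional[float] = None  # only when the domain produces labels


def validate_ai_metadata(meta: AIMetadata) -> list[str]:
    """Return human-readable violations; an empty list means the block is acceptable."""
    errors = []
    if not meta.approved_use_cases:
        errors.append("approved_use_cases must list at least one use case")
    if not meta.consent_scope:
        errors.append("consent_scope must not be empty")
    if meta.inference_freshness_sla_seconds <= 0:
        errors.append("inference_freshness_sla_seconds must be positive")
    score = meta.label_quality_score
    if score is not None and not 0.0 <= score <= 1.0:
        errors.append("label_quality_score must be between 0 and 1")
    return errors


customer_meta = AIMetadata(
    approved_use_cases=("churn-prediction",),
    known_bias_risks=("age skew in legacy accounts",),
    inference_freshness_sla_seconds=900,
    consent_scope=("service-improvement",),
)
```

Returning a list of violations rather than raising on the first one lets a domain team fix every problem in a single registration attempt.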

Cross-Domain Feature Composition Layer. Training and inference often require joins across domains (e.g., customer profile + transaction history + product catalogue). Rather than allowing AI pipelines to perform ad hoc cross-domain joins, the pattern introduces a dedicated Feature Composition Service. This service is itself governed: it declares which domain products it joins, the join logic, and registers the composed feature set as a new (composite) data product.
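
The sketch below shows one way a Feature Composition Service could declare its inputs and join logic alongside the composed rows. It is an in-memory illustration only: the function name, the dictionary layout, and the hash-derived lineage ID are assumptions, and a real service would run on an engine such as Spark or dbt.

```python
import hashlib
import json


def compose_features(sources: dict[str, dict], join_key: str) -> dict:
    """Inner-join rows from several data products and return a declared composite.

    sources maps a product ID to {"version": str, "rows": list[dict]}.
    """
    if not sources:
        raise ValueError("at least one source product is required")
    product_ids = sorted(sources)
    indexed = {
        pid: {row[join_key]: row for row in sources[pid]["rows"]}
        for pid in product_ids
    }
    # Inner join: keep only keys that appear in every source product.
    common_keys = set.intersection(*(set(idx) for idx in indexed.values()))
    rows = []
    for key in sorted(common_keys):
        merged = {join_key: key}
        for pid in product_ids:
            for column, value in indexed[pid][key].items():
                if column != join_key:
                    merged[f"{pid}.{column}"] = value  # prefix shows the owning domain
        rows.append(merged)
    # The declaration is what would be registered as the composite product.
    declaration = {
        "inputs": [{"product": p, "version": sources[p]["version"]} for p in product_ids],
        "join": {"type": "inner", "key": join_key},
    }
    digest = hashlib.sha256(json.dumps(declaration, sort_keys=True).encode()).hexdigest()
    return {"declaration": declaration, "lineage_id": digest[:16], "rows": rows}
```

Deriving the lineage ID from the declaration means the same inputs and join logic always yield the same ID, which makes repeated runs easy to correlate.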

AI Output Publication Pipeline. Model outputs are streamed through a Publication Pipeline that enriches them with lineage metadata (which model version, which training data version, which inference timestamp), validates output schema against a registered contract, and writes to the domain-owned output store. The output product is then available for downstream consumption under the same governance as any other data product.
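
The enrichment and validation steps of the Publication Pipeline can be sketched as follows. The contract contents, the `_lineage` key, and the function name are hypothetical; the three lineage fields are the ones named in the paragraph above.

```python
from datetime import datetime, timezone

# Hypothetical output contract: field name -> required Python type.
OUTPUT_CONTRACT = {"customer_id": int, "churn_score": float}


def publish_prediction(prediction, model_version, training_data_version, inferred_at=None):
    """Validate a raw prediction against the contract, then attach lineage metadata."""
    missing = sorted(set(OUTPUT_CONTRACT) - set(prediction))
    unexpected = sorted(set(prediction) - set(OUTPUT_CONTRACT))
    if missing or unexpected:
        raise ValueError(f"contract mismatch: missing={missing}, unexpected={unexpected}")
    for name, expected_type in OUTPUT_CONTRACT.items():
        if not isinstance(prediction[name], expected_type):
            raise ValueError(f"{name} must be {expected_type.__name__}")
    timestamp = inferred_at or datetime.now(timezone.utc)
    return {
        **prediction,
        "_lineage": {
            "model_version": model_version,
            "training_data_version": training_data_version,
            "inference_timestamp": timestamp.isoformat(),
        },
    }
```

Validation runs before enrichment so that a record which violates the contract never acquires lineage metadata and cannot be mistaken for a governed output.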

Federated AI Governance Plane. An extension of the mesh's existing governance plane, this adds: AI use case registry (which models may consume which products), bias monitoring integration per domain, and training data approval workflow. Policy is expressed as code (OPA Rego policies) evaluated at data product registration and pipeline execution time.
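
The pattern specifies OPA Rego for policy. Purely to illustrate the decision logic of the use case gate, here is a Python stand-in; it is not Rego and does not use any OPA API. The registry contents and the reason strings are assumptions.

```python
# Hypothetical AI use case registry: use case -> data products it has been granted.
USE_CASE_REGISTRY = {
    "churn-prediction": {"products": {"customer", "transactions"}},
}


def authorise_read(use_case, product_id, product_approved_use_cases):
    """Return (allowed, reason) for an AI pipeline reading one data product.

    product_approved_use_cases comes from the product's own ai_metadata block,
    so both the registry and the owning domain must agree before access is granted.
    """
    entry = USE_CASE_REGISTRY.get(use_case)
    if entry is None:
        return False, "use case not registered"
    if product_id not in entry["products"]:
        return False, "product not granted to this use case"
    if use_case not in product_approved_use_cases:
        return False, "use case not approved by the owning domain"
    return True, "allowed"
```

Returning a reason with every decision supports the requirement elsewhere in this pattern that governance decisions are logged with their rationale.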

5. Architecture Diagram

flowchart TD
    subgraph Domains["Data Domains"]
        A[Customer Data Product]
        B[Transaction Data Product]
        C[Product Data Product]
    end
    subgraph Governance["Federated AI Governance"]
        D[AI Use Case Registry]
        E[Bias and Consent Policy]
    end
    subgraph AI["AI Consumer Domain"]
        F[Feature Composition Service]
        G[Training and Inference]
        H[Output Publication Pipeline]
    end
    A -->|contract-validated| F
    B -->|contract-validated| F
    C -->|contract-validated| F
    D -->|use case gate| F
    E -->|bias and consent gate| G
    F --> G
    G --> H
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#dbeafe,stroke:#3b82f6
    style C fill:#dbeafe,stroke:#3b82f6
    style D fill:#f3e8ff,stroke:#a855f7
    style E fill:#f3e8ff,stroke:#a855f7
    style F fill:#f0fdf4,stroke:#22c55e
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#d1fae5,stroke:#10b981

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Domain Data Product	Data Asset	Owned by domain team; publishes structured data under formal contract	Delta Lake, Iceberg, BigQuery tables, Redshift	Critical
AI Metadata Extension	Schema Artefact	Extends data contract with AI-specific quality, consent, bias metadata	JSON Schema extension, Protobuf extension, dbt meta block	Critical
Feature Composition Service	Processing Service	Cross-domain feature joining with lineage capture; registers composite feature as product	Apache Spark, dbt, Databricks, Google Dataflow	High
Feature Store	Serving Infrastructure	Stores offline (training) and online (inference) features with point-in-time correctness	Feast, Tecton, Vertex AI Feature Store, SageMaker Feature Store	Critical
Training Pipeline	Orchestration	Executes model training; consumes approved feature sets from feature store	MLflow Projects, Kubeflow, Vertex AI Pipelines, SageMaker Pipelines	High
Inference Service	Serving	Serves model predictions at runtime; reads online features from feature store	TorchServe, Triton, Vertex AI Endpoints, SageMaker Endpoints	Critical
Output Publication Pipeline	Processing Service	Enriches AI outputs with lineage; validates against output product contract; writes to catalogue	Kafka Streams, AWS EventBridge + Lambda, Dataflow	High
AI Output Data Product	Data Asset	Published by AI domain; governed like any other data product	Delta Lake, Iceberg, S3 + Glue Catalogue	High
Federated AI Governance Plane	Governance Service	Enforces AI use case policy, bias thresholds, consent scope, training data approval	OPA + Rego policies, Dataplex, Collibra AI Governance, Atlan	Critical
Data Product Catalogue	Discovery & Lineage	Unified catalogue with lineage from source to prediction; SLO tracking	DataHub, OpenMetadata, Atlan, Google Dataplex Catalog	High
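
The Feature Store row above relies on point-in-time correctness: a training example must only see feature values that were known at the time of the event being predicted. A minimal sketch of that lookup, using only the standard library and a hypothetical function name, could look like this. Production feature stores implement the same idea at scale.

```python
from bisect import bisect_right


def point_in_time_lookup(feature_history, as_of):
    """Return the latest feature value recorded at or before as_of, else None.

    feature_history is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in feature_history]
    idx = bisect_right(timestamps, as_of)  # count of entries with timestamp <= as_of
    return feature_history[idx - 1][1] if idx else None
```

Values recorded after `as_of` are never returned, which is what prevents future information from leaking into training data.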

7. Data Flow

Primary Flow

Step	Actor	Action	Output
1	Domain Team A/B/C	Registers data product with AI metadata extension in catalogue	Catalogue entry with AI contract
2	Federated Governance Plane	Validates AI metadata; checks bias risk; confirms consent scope	Policy approval or rejection
3	AI Platform Team	Declares AI use case; requests access to specific data products	Use case registration + access grant
4	Feature Composition Service	Reads approved data products; executes cross-domain join; captures lineage	Composite feature dataset with lineage ID
5	Feature Store	Ingests composite features; creates offline snapshot + online serving index	Feature set available for training and inference
6	Training Pipeline	Reads offline features from feature store; trains model; logs training data version	Trained model artefact + training data lineage
7	Inference Service	Reads online features; executes model inference	Raw prediction + confidence score
8	Output Publication Pipeline	Enriches prediction with lineage metadata; validates output schema	Governed AI output data product
9	Downstream Consumer	Reads AI output data product through catalogue contract	Scored/enriched data for business process
10	Governance Plane	Continuously monitors output data product for schema drift and SLO compliance	Drift alerts; SLO dashboard

Error Flow

Error Condition	Trigger	Response	Recovery
Data contract violation	Schema change in upstream domain product	Feature Composition Service rejects read; alert to domain owner	Domain owner issues new contract version; AI team updates feature spec
Bias threshold breach	Governance plane detects demographic skew in feature distribution	Training pipeline blocked; bias report generated	Domain team investigates data source; bias mitigation applied
Consent scope violation	Feature join attempts to use data for undeclared use case	Governance plane rejects join; audit log entry	AI team registers new use case; consent review conducted
Feature freshness SLA breach	Online feature older than contract SLA	Inference service falls back to degraded mode; alert raised	Feature pipeline replayed; SLA root cause investigated
Output schema drift	Model output deviates from registered output contract	Output publication pipeline halts; alert to model owner	Model owner updates output contract; downstream consumers notified
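
The data contract violation row depends on telling a breaking schema change from a harmless one. The sketch below assumes a simplified rule set (removed columns and changed types are breaking, added columns are not) and represents a schema as a plain column-to-type mapping; real contract formats such as Avro define richer compatibility rules.

```python
def breaking_changes(old_schema: dict[str, str], new_schema: dict[str, str]) -> list[str]:
    """List the changes in new_schema that would break a consumer of old_schema."""
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"column removed: {column}")
        elif new_schema[column] != old_type:
            problems.append(f"type changed: {column} {old_type} -> {new_schema[column]}")
    # Columns that exist only in new_schema are treated as non-breaking additions.
    return problems
```

A check of this kind can run in the domain team's CI/CD pipeline so that a breaking change forces a new contract version before it reaches the Feature Composition Service.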

8. Security Considerations

Authentication & Authorisation

All data product reads authenticated via service identity (OAuth 2.0 client credentials or Workload Identity).
Authorisation enforced by governance plane: only registered AI use cases may access approved data products.
Fine-grained column-level access control enforced at the data product serving layer.

Secrets Management

Data product access credentials stored in a secrets manager (AWS Secrets Manager, Azure Key Vault, HashiCorp Vault).
Credentials rotated every 90 days; never embedded in pipeline code or configuration files.

Data Classification

All data products classified at ingestion (Public / Internal / Confidential / Restricted).
AI metadata extension inherits source product classification; composite features adopt highest classification of any source.
Training datasets classified and stored in appropriately secured storage tiers.
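
The rule that composite features adopt the highest classification of any source can be expressed as an ordered enumeration. The sketch below uses the four levels listed above; the class and function names are illustrative.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Classification levels in ascending order of sensitivity."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def composite_classification(source_levels):
    """A composite feature set inherits the most sensitive level among its sources."""
    levels = list(source_levels)
    if not levels:
        raise ValueError("a composite needs at least one source product")
    return max(levels)
```

Rejecting an empty source list avoids silently defaulting an unclassified composite to the least sensitive level.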

Encryption

Data at rest encrypted using AES-256; encryption keys managed by domain-owned KMS keys.
Data in transit encrypted using TLS 1.3 minimum.
Feature store online serving encrypted at rest and in transit.

Auditability

All data product access events logged to immutable audit log.
Feature composition lineage captured in OpenLineage format; stored separately from pipeline code.
Governance plane decisions (approve/reject/block) logged with policy version and decision rationale.
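
As a rough illustration of the lineage record for one feature composition run, the sketch below builds a dictionary using the core field names of an OpenLineage run event (eventType, eventTime, run, job, inputs, outputs, producer, schemaURL). It does not use an OpenLineage client library, and the namespace, producer, and schema URL are supplied by the caller because their correct values depend on the deployment and the spec version in use.

```python
import uuid
from datetime import datetime, timezone


def lineage_event(job_name, inputs, outputs, *, namespace, producer, schema_url,
                  run_id=None, event_time=None):
    """Build a COMPLETE run event describing which products a job read and wrote."""
    timestamp = event_time or datetime.now(timezone.utc)
    return {
        "eventType": "COMPLETE",
        "eventTime": timestamp.isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
        "producer": producer,      # URI identifying the emitting service
        "schemaURL": schema_url,   # take this from the OpenLineage spec version in use
    }
```

Because the event is plain data, it can be written to a store that is separate from the pipeline code, as the list above requires.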

OWASP LLM Top 10 Mapping

OWASP LLM Risk	Relevance to This Pattern	Mitigation
LLM06: Sensitive Information Disclosure	Training data containing PII/sensitive attributes may surface in model outputs	Data minimisation at feature composition; output scanning for PII before publication
LLM02: Insecure Output Handling	AI output data products consumed downstream without validation	Output schema contract validation in publication pipeline; downstream consumer contract enforcement
LLM04: Model Denial of Service	Malformed data product inputs could cause inference service overload	Input schema validation at feature store boundary; rate limiting on inference service
LLM09: Overreliance	Downstream consumers treating AI output products as ground truth	Output product metadata must declare confidence ranges; consumer contracts include disclaimer metadata
LLM10: Model Theft	Training datasets registered in catalogue may expose valuable IP if catalogue is compromised	Catalogue access control; training dataset storage separate from catalogue metadata; data product physical location not exposed in catalogue

9. Governance Considerations

Responsible AI

Domain teams must complete an AI Impact Assessment before publishing a data product approved for AI training use.
Bias assessment is mandatory for data products used in consequential AI decisions (credit, health, employment).

Model Risk Management

Training data version must be linked to model version in model registry (bi-directional traceability).
Data product deprecation requires impact analysis across all registered AI consumers before execution.
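
Bi-directional traceability and deprecation impact analysis both reduce to keeping two indexes in step. The following in-memory sketch is hypothetical (a real model registry would persist these links), but it shows why one `link` call must update both directions.

```python
from collections import defaultdict


class TraceabilityIndex:
    """Links model versions to the data product versions they were trained on."""

    def __init__(self):
        self._data_by_model = defaultdict(set)
        self._models_by_data = defaultdict(set)

    def link(self, model_version, data_product_version):
        # Update both directions together so the two views cannot diverge.
        self._data_by_model[model_version].add(data_product_version)
        self._models_by_data[data_product_version].add(model_version)

    def training_data_for(self, model_version):
        """Answer 'which training data produced this model?'."""
        return sorted(self._data_by_model.get(model_version, ()))

    def models_trained_on(self, data_product_version):
        """Impact analysis: which models are affected if this product is deprecated?"""
        return sorted(self._models_by_data.get(data_product_version, ()))
```

An empty result from `models_trained_on` is the signal that a data product version can be deprecated without affecting any registered AI consumer.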

Human Approval Checkpoints

New AI use case registration requires approval from domain data product owner + AI governance committee.
Training data approval workflow mandated for high-risk AI use cases (EU AI Act Annex III).
Bias threshold exceptions require CDO + risk committee sign-off.

Policy Enforcement

Governance policies expressed as code (OPA Rego); version controlled alongside data product contracts.
Policy violations are hard blocks (not warnings) for high-risk AI use cases.

Governance Artefacts

Artefact	Owner	Cadence	Purpose
AI Use Case Registry	AI Governance Committee	On change	Authoritative list of approved AI use cases + data product access grants
Data Product AI Contract	Domain Data Product Owner	On change	Declares AI-specific quality, consent, bias metadata per product
Bias Assessment Report	Domain Team + AI Platform	Per training run	Documents bias metrics; attests compliance with thresholds
Training Data Approval Record	CDO / AI Governance	Per new training dataset	Formal approval for use of data product in model training
Lineage Graph	Automated (OpenLineage)	Continuous	Source-to-prediction lineage; used for regulatory enquiries

10. Operational Considerations

Monitoring

Metric	Owner	Alert Threshold	Tooling
Data product freshness (age of latest partition)	Domain team	>contract SLA	DataHub SLO monitor
Feature composition job success rate	AI Platform	<99.5% over 1hr	Airflow / Vertex AI pipeline alerts
Feature store online latency (p99)	AI Platform	>50ms	Prometheus + Grafana
Governance policy violation rate	Governance team	Any violation	OPA audit log + PagerDuty
AI output data product SLO	AI Platform	As per contract	DataHub SLO monitor

SLOs

SLO	Target	Measurement
Data product availability	99.9%	Catalogue availability check
Feature composition pipeline completion	99.5% success rate	Pipeline execution logs
Online feature serving latency (p99)	<50ms	Feature store metrics
Governance decision latency	<5 seconds	Governance plane logs
AI output product publication latency	<2 minutes from inference	Output pipeline metrics
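
To make the p99 latency SLO concrete, the sketch below computes a 99th percentile with the nearest-rank method and compares it with the 50 ms target from the table. The nearest-rank choice is an assumption; monitoring tools such as Prometheus estimate percentiles differently, so results may not match exactly.

```python
def p99(samples_ms):
    """Nearest-rank 99th percentile of a non-empty collection of latencies."""
    ordered = sorted(samples_ms)
    if not ordered:
        raise ValueError("no latency samples")
    rank = (99 * len(ordered) + 99) // 100  # integer form of ceil(0.99 * n)
    return ordered[rank - 1]


def meets_latency_slo(samples_ms, target_ms=50.0):
    """True when the p99 latency is below the target."""
    return p99(samples_ms) < target_ms
```

With 100 samples, a single slow request does not breach the SLO but two do, which is the behaviour a p99 target is meant to capture.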

Logging

Structured JSON logging at all pipeline stages; includes data product ID, version, lineage ID, and execution timestamp.
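Audit logs for governance decisions retained for 7 years (an organisational retention policy supporting APRA CPS 234 audit obligations; CPS 234 itself does not prescribe a retention period).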

Incident Management

Data product SLO breach → PagerDuty alert to domain on-call; AI Platform notified.
Governance policy violation → immediate block + P1 incident; AI Governance committee notified within 1 hour.

Disaster Recovery

Component	RTO	RPO	Strategy
Feature Store (offline)	4 hours	24 hours	Cross-region replication of storage; pipeline replay
Feature Store (online)	15 minutes	1 hour	Active-passive replica in secondary region
Governance Plane	1 hour	0	Multi-AZ deployment; policy cache in pipeline services
Data Product Catalogue	4 hours	24 hours	Metadata database backup + restore

Capacity Planning

Feature composition jobs scale horizontally; size Spark/Dataflow clusters based on peak training data volume (typically 3–5× average).
Online feature store capacity sized for peak inference QPS × feature vector size × replication factor.
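
The two sizing rules above are simple arithmetic. The sketch below reads the second rule as provisioned serving throughput in bytes per second, which is an interpretation rather than something the pattern states, and the example figures are invented for illustration.

```python
import math


def online_store_throughput_bytes_per_sec(peak_qps, feature_vector_bytes, replication_factor):
    """Peak inference QPS x feature vector size x replication factor."""
    return peak_qps * feature_vector_bytes * replication_factor


def peak_cluster_workers(average_workers, peak_multiplier):
    """Size a composition cluster for peak volume (the text suggests 3 to 5x average)."""
    return math.ceil(average_workers * peak_multiplier)


# Example: 2,000 QPS, 4 KiB feature vectors, 3 replicas.
example_throughput = online_store_throughput_bytes_per_sec(2_000, 4_096, 3)
```

In this example the result is 24,576,000 bytes per second, or roughly 24.6 MB/s of provisioned throughput.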

11. Cost Considerations

Cost Drivers

Cost Driver	Typical Range	Notes
Feature Store (online)	$800–$8,000/month	Scales with feature count × QPS; Redis cluster or managed service
Feature Store (offline)	$200–$2,000/month	Object storage + query cost; scales with data volume
Feature composition compute	$500–$5,000/month	Spark/Dataflow; batch or streaming; scales with data volume
Governance plane	$200–$1,500/month	OPA compute + Collibra/DataHub licence
Data product catalogue	$0–$3,000/month	Open source (DataHub free) vs enterprise (Atlan/Collibra SaaS)
Lineage storage	$50–$500/month	OpenLineage events in object store + query engine

Scaling Risks

Online feature store is the primary cost scaling risk: feature count × model count × QPS grows non-linearly.
Cross-domain feature composition at high volume can generate significant Spark compute cost.
Governance plane policy evaluation at high throughput may require caching to avoid per-request compute.

Optimisations

Cache frequently accessed feature vectors in the inference service (TTL aligned to freshness SLA).
Materialise composite features as domain data products to avoid repeated cross-domain joins.
Use serverless query engines (Athena, BigQuery) for offline feature access to avoid always-on compute.
Tier feature freshness SLAs: only real-time inference features need online serving; batch inference uses offline features.
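
The first optimisation, caching feature vectors with a TTL aligned to the freshness SLA, can be sketched as below. The class name and the injectable clock are assumptions made to keep the example testable; the key property is that a cached vector is never served once it is older than the SLA.

```python
import time


class FeatureCache:
    """Caches feature vectors for at most ttl_seconds (set this to the freshness SLA)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (loaded_at, value)

    def get(self, key, loader):
        """Return a cached value while it is fresh; otherwise reload it via loader(key)."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = loader(key)
        self._entries[key] = (now, value)
        return value
```

Using a monotonic clock by default avoids stale reads or premature expiry when the system wall clock is adjusted.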

Indicative Cost Range

Scale	Monthly Cost Range	Basis
Small (1–3 AI use cases, <1M features/day)	$2,000–$8,000	Managed feature store + DataHub OSS + light governance
Medium (5–15 use cases, 10M features/day)	$10,000–$35,000	Scalable feature store + enterprise catalogue + governance automation
Large (20+ use cases, 100M+ features/day)	$40,000–$150,000	Multi-region feature store + full enterprise stack

12. Trade-Off Analysis

Option Comparison

Option	Description	Pros	Cons	Recommended When
A: AI Data Mesh Integration (this pattern)	Full mesh integration; domain ownership; federated AI governance	Full data lineage; domain expertise; regulatory-grade governance; scales to many use cases	High setup cost; requires domain team maturity; governance overhead	≥3 AI use cases; regulated industry; Data Mesh already adopted
B: Centralised AI Data Platform	Single central team owns all AI data; lakes/lakehouses fed into central feature store	Simple governance; fast for first use case; single team to coordinate	Bottleneck at scale; domain knowledge lost; violates mesh principles	Single AI use case; no Data Mesh; small organisation
C: Federated Feature Stores per Domain	Each domain runs its own feature store; no cross-domain composition layer	Maximum domain autonomy; no shared bottleneck	Feature duplication across stores; cross-domain AI very hard; governance nightmare	Domains have very different data shapes and no cross-domain AI needed

Architectural Tensions

Tension	Trade-Off	Resolution in This Pattern
Domain autonomy vs. AI consistency	Domains want freedom; AI needs consistent feature contracts	Standardised AI metadata extension enforced at contract registration; domains free to implement internally
Governance rigour vs. iteration speed	Approval gates slow ML experiments	Tiered governance: experiments use sandbox products; production models require full approval workflow
Real-time freshness vs. cost	Online features are expensive; not all features need real-time	Freshness SLA per feature declared in contract; only SLA-requiring features use online serving
Lineage completeness vs. pipeline performance	Full OpenLineage capture adds ~5–10ms per hop	Async lineage emission; lineage stored separately from serving path

13. Failure Modes

Failure	Likelihood	Impact	Detection	Recovery
Domain team publishes schema-breaking change without versioning	Medium	High — silently breaks dependent feature pipelines	Feature composition job failure; schema validation alert	Automated contract validation in CI/CD; domain team reverts or issues v2
Bias threshold misconfigured — too lenient	Low	Critical — biased model reaches production	Post-deployment bias monitoring; external audit	Emergency model rollback; bias threshold review; remediation training run
Governance plane outage	Low	High — all AI pipelines blocked or ungoverned	Health check failure; pipeline timeout alerts	Governance plane HA (multi-AZ); graceful degradation to cached policy decisions for approved use cases
Feature store corruption	Very Low	Critical — wrong features served to inference	Feature value monitoring; anomaly detection on feature distributions	Point-in-time restore from backup; inference service fallback to safe default
AI output product consumed without lineage	Medium	Medium — regulatory audit trail incomplete	Catalogue SLO check on lineage completeness	Output publication pipeline enforces lineage before write; block non-lineage writes
Cross-domain join produces unexpected data combination	Medium	High — privacy violation; unintended PII combination	Governance plane consent scope check at join time	Join blocked; AI team required to file updated consent scope with domain owners

Cascading Failure Scenarios

Schema cascade: Domain A publishes breaking schema change → Feature Composition Service fails → Training pipeline starves → Model staleness → Inference quality degrades → Downstream business process errors. Mitigation: contract versioning + consumer notification before deprecation.
Governance plane cascade: Governance plane outage during high-volume training period → Pipelines run without policy enforcement → Biased training data used → Biased model deployed → Regulatory incident. Mitigation: governance plane must be HA; pipelines fail closed (block) when the governance plane is unavailable, proceeding only on unexpired cached decisions for already-approved use cases.
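
The governance plane mitigation combines two behaviours described in this section: blocking when the plane is unreachable, and degrading gracefully to cached decisions for approved use cases. A minimal sketch of that logic follows. The function signature, the use of ConnectionError to signal an outage, and the 300-second cache lifetime are all assumptions.

```python
def decide(use_case, product_id, query_governance, cache, now, cache_ttl_seconds=300):
    """Return True only if access is approved; block when approval cannot be confirmed.

    query_governance(use_case, product_id) returns a bool, or raises ConnectionError
    when the governance plane is unavailable. cache is a plain dict owned by the caller.
    """
    key = (use_case, product_id)
    try:
        allowed = query_governance(use_case, product_id)
    except ConnectionError:
        cached = cache.get(key)
        # Honour only a recent approval; anything else fails closed.
        if cached and cached["allowed"] and now - cached["at"] <= cache_ttl_seconds:
            return True
        return False
    cache[key] = {"allowed": allowed, "at": now}
    return allowed
```

A use case that was never approved, or whose approval has aged out of the cache, is blocked during an outage, so the pipeline never runs without some form of policy decision behind it.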

14. Regulatory Considerations

Regulation	Requirement	Pattern Response
APRA CPS 234	Data integrity controls; information security incident notification	Immutable audit log for all data product access; incident escalation for governance violations
APRA CPS 230	Operational resilience; third-party risk	DR targets (RTO/RPO) for feature store and governance plane; third-party data product SLA management
Privacy Act (Australia) APP 3/6	Data collection limitation; use limitation	Consent scope enforcement in governance plane; data minimisation at feature composition
EU AI Act Article 10	Training data governance for high-risk AI	Training data approval workflow; bias assessment; lineage from data to model
EU AI Act Article 17	Quality management system documentation	AI use case registry; data product AI contracts; training data approval records
ISO 42001 §6.1	AI risk assessment	Domain-level bias assessment integrated into data product registration
NIST AI RMF GOVERN-1	Policies and accountability for AI risk	Federated governance policies as code; domain accountability for AI readiness
NIST AI RMF MAP-2	Scientific validity of AI data	Representativeness metadata in AI contract extension; bias assessment

15. Reference Implementations

AWS

Component	AWS Service
Data product storage	S3 + Lake Formation for access control
Data product catalogue	AWS Glue Data Catalog + DataHub on EC2
Feature composition	AWS Glue / EMR Spark
Feature store	Amazon SageMaker Feature Store (online + offline)
Governance plane	AWS Lake Formation permissions + OPA on ECS
Training pipeline	SageMaker Pipelines
Inference service	SageMaker Endpoints
Lineage	Amazon SageMaker ML Lineage + OpenLineage on S3

Azure

Component	Azure Service
Data product storage	ADLS Gen2 + Purview access policies
Data product catalogue	Microsoft Purview
Feature composition	Azure Databricks (Spark)
Feature store	Azure ML Feature Store
Governance plane	Microsoft Purview policy + OPA on AKS
Training pipeline	Azure ML Pipelines
Inference service	Azure ML Managed Endpoints
Lineage	Microsoft Purview lineage + OpenLineage

GCP

Component	GCP Service
Data product storage	GCS + BigQuery + Dataplex zones
Data product catalogue	Google Dataplex Catalog
Feature composition	Cloud Dataflow / Dataproc
Feature store	Vertex AI Feature Store
Governance plane	Dataplex policy tags + OPA on Cloud Run
Training pipeline	Vertex AI Pipelines
Inference service	Vertex AI Endpoints
Lineage	Vertex AI ML Metadata + OpenLineage

On-Premises

Component	Technology
Data product storage	MinIO + Apache Iceberg
Data product catalogue	OpenMetadata or DataHub
Feature composition	Apache Spark on Kubernetes
Feature store	Feast (on Kubernetes) + Redis Enterprise
Governance plane	OPA + Rego policies on Kubernetes
Training pipeline	Kubeflow Pipelines
Inference service	KServe on Kubernetes
Lineage	OpenLineage + Marquez

16. Related Patterns

Pattern	ID	Relationship	Notes
Data Quality for AI	EAAPL-DAT002	Depends on	Quality gates enforced within data product contracts
Data Lineage for AI	EAAPL-DAT003	Depends on	Lineage capture is core to this pattern's governance
Real-Time Feature Engineering	EAAPL-DAT008	Complements	Online feature serving is a sub-component
AI Training Data Governance	EAAPL-DAT007	Overlaps	Training data governance is formalised in this pattern's AI metadata extension
Privacy by Design for AI Data	EAAPL-DAT005	Depends on	Consent scope enforcement is a governance plane responsibility
Model Versioning	EAAPL-MDL001	Complements	Model registry must link to training data product version
Human Approval Gateway	EAAPL-HIL001	Complements	Training data approval workflow is a human approval gate

17. Maturity Assessment

Overall Maturity: Proven — Widely deployed in enterprises with mature Data Mesh and MLOps practices. Well-documented reference implementations exist on all major cloud providers.

Dimension	Score (1–5)	Notes
Architectural clarity	5	Well-defined boundaries; clear ownership model
Tooling maturity	4	Feature stores and catalogues mature; federated AI governance tooling still maturing
Regulatory alignment	5	Strong alignment to EU AI Act Art. 10, APRA CPS 234
Operational complexity	3	High setup complexity; ongoing domain team capability requirement
Cost efficiency	4	Good ROI at scale; high upfront investment
Security	4	Strong controls; column-level access control tooling varies by platform
Community adoption	4	Adopted by major financial services, healthcare, and retail enterprises

18. Revision History

Version	Date	Author	Changes
1.0	2024-01-15	EAAPL Working Group	Initial pattern publication
1.1	2024-06-30	EAAPL Working Group	Added EU AI Act Article 10 alignment; updated GCP reference implementation for Vertex AI Feature Store GA
1.2	2025-03-01	EAAPL Working Group	Added ISO 42001 alignment; expanded failure modes; updated cost ranges for 2025 pricing
