EAAPL: Enterprise AI Architecture Pattern Library

[EAAPL-DAT001] AI Data Mesh Integration

Category: Data Architecture
Sub-category: Data Mesh / AI Integration
Version: 1.2
Maturity: Proven
Tags: data-mesh, data-product, federated-governance, AI-consumption, domain-ownership, data-contract
Regulatory Relevance: APRA CPS 234, EU AI Act Articles 10 & 17, ISO 42001 §6.1, NIST AI RMF GOVERN-1.2


1. Executive Summary

Enterprises adopting Data Mesh architectures face a structural challenge when integrating AI: the mesh's domain-oriented decentralisation conflicts with AI's need for curated, governed, cross-domain data. This pattern resolves that tension by defining AI as a first-class data product consumer and, where appropriate, as a data product publisher within the mesh.

Domain teams retain ownership of the training and inference data they produce. A Federated AI Governance layer aligns mesh-level data contracts with AI-specific quality, lineage, and bias requirements. AI models that produce scored or enriched outputs register those outputs as new data products, closing the feedback loop.

Organisations adopting this pattern report a 40–60% reduction in data preparation lead time for new AI use cases, improved cross-domain data reuse, and demonstrably cleaner audit trails for regulatory enquiries—critical for APRA-regulated entities and EU AI Act Article 10 compliance on training data provenance.

Target audience: Chief Data Officers, Enterprise Architects, AI Platform leads.
Decision trigger: When an organisation runs ≥3 AI systems consuming data from ≥2 distinct business domains.


2. Problem Statement

Business Problem

AI programmes in large enterprises frequently stall at data sourcing. Central data teams become bottlenecks, domain knowledge erodes in translation, and model quality degrades because the teams closest to the data have no formal accountability for its AI-readiness.

Technical Problem

Data Mesh decentralises data ownership to domain teams who publish data products. However:

  • AI training pipelines typically require cross-domain joins that no single domain owns.
  • AI-specific quality dimensions (label quality, representativeness, temporal consistency) are absent from standard data product contracts.
  • There is no established pattern for AI model outputs to be re-published as mesh data products, creating shadow data stores.
  • Federated governance policies designed for BI/analytics do not address AI-specific risks (bias, drift, model leakage).

Symptoms

  • ML engineers spending >50% of sprint time on data acquisition and cleaning.
  • Multiple AI teams independently copying the same source data into isolated feature stores.
  • Absence of data lineage from raw source to model prediction.
  • Inability to answer "which training data produced this prediction?" during regulatory review.
  • Domain teams unaware that their data is used for AI training, leading to schema changes that silently break models.

Cost of Inaction

| Dimension | Impact |
|---|---|
| Time-to-value | New AI use case takes 6–18 months instead of 6–12 weeks due to data friction |
| Regulatory | APRA CPS 234 / EU AI Act Art. 10 audit failures; potential enforcement action |
| Data quality | Silent schema drift degrades model accuracy; no detection mechanism |
| Duplication | 3–7 shadow copies of core datasets across AI teams; storage + governance overhead |
| Trust | Business stakeholders lose confidence when AI predictions cannot be explained from data |

3. Context

When to Apply

  • The organisation has adopted or is adopting a Data Mesh architecture (≥2 domains publishing data products).
  • ≥2 AI use cases require data from multiple domains.
  • Regulatory requirements mandate training data provenance (EU AI Act Art. 10, APRA CPS 234).
  • The organisation wants to avoid a central AI data team becoming a bottleneck.
  • There is an existing data product catalogue with defined contracts.

When NOT to Apply

  • Organisation has a single, centralised data platform (use standard data lake/lakehouse pattern instead).
  • AI workloads are purely experimental/PoC with no regulatory obligations.
  • Domain teams lack data engineering maturity to maintain data contracts.
  • Data volumes are low enough that a single team can own all AI data.

Prerequisites

| Prerequisite | Minimum Viable | Preferred |
|---|---|---|
| Data product catalogue | Informal product list | Governed catalogue with SLA/SLO metadata |
| Data contracts | Informal schema agreements | OpenAPI/Avro schema + quality SLO contract |
| Domain data teams | 1 data engineer per domain | Dedicated data product owner + engineer |
| Compute mesh | Cloud object store per domain | Isolated storage accounts + query federation |
| Governance tooling | Spreadsheet-based policy | Automated policy enforcement (OPA/Dataplex) |

Industry Applicability

| Industry | Applicability | Primary Driver |
|---|---|---|
| Financial Services | High | APRA CPS 234 + EU AI Act compliance; multi-domain risk data |
| Healthcare | High | Privacy Act + clinical data domain separation; model audit requirements |
| Retail / CPG | High | Customer, supply-chain, and product domains; personalisation AI |
| Telecommunications | Medium | Network, customer, and billing domains; churn/fraud AI |
| Government | Medium | Departmental data sovereignty + Privacy Act APP compliance |
| Manufacturing | Medium | OT/IT domain separation; predictive maintenance AI |

4. Architecture Overview

Design Philosophy

The AI Data Mesh Integration pattern rests on four architectural principles that extend the core Data Mesh principles with AI-specific concerns.

Principle 1 — AI as a Domain Consumer, Not a Privileged Consumer. Traditional AI platforms often receive "super-consumer" access, bypassing domain governance. This pattern rejects that approach. Every AI training pipeline, feature store, and inference service consumes data products through formally declared contracts. The AI platform team is a consumer domain, not a platform exception.

Principle 2 — Domain Ownership Extends to AI Readiness. A domain publishing a data product for AI consumption must attest to AI-specific quality dimensions: completeness (no systematic missingness in features used by models), representativeness (data distribution reflects real-world population), freshness (SLA on data currency for real-time inference), and label quality (where the domain produces ground-truth labels). This is codified in an extended data contract schema.

Principle 3 — AI Outputs Are First-Class Data Products. Model predictions, risk scores, embeddings, and recommendations produced by AI systems must be registered in the mesh catalogue as data products with their own contracts, lineage, and SLOs. This prevents shadow data: business users consuming AI outputs through undeclared channels with no governance.

Principle 4 — Federated AI Governance Aligns to Mesh Domains. The mesh's federated computational governance is extended with AI-specific policy: bias thresholds by domain, data minimisation requirements, consent scope enforcement, and model training data approval workflows. These policies are implemented as automated checks in the data product pipeline, not as manual review gates.

Structural Components

The architecture adds four structural layers to a standard Data Mesh:

AI Data Product Contract Extension. Each domain's data product contract gains an ai_metadata block specifying: approved AI use cases, known bias risks, freshness SLA for inference, label quality score (if applicable), and consent scope. This is enforced by the mesh's governance plane during contract registration.
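A minimal sketch of how the governance plane might check an ai_metadata block at contract registration. The field names (approved_use_cases, known_bias_risks, inference_freshness_sla_minutes, label_quality_score, consent_scope) and the 0–1 score range are illustrative assumptions; the pattern names the concepts but does not prescribe a concrete schema.

```python
# Hypothetical field names; the pattern lists the concepts, not a schema.
REQUIRED_AI_FIELDS = {
    "approved_use_cases",
    "known_bias_risks",
    "inference_freshness_sla_minutes",
    "consent_scope",
}


def validate_ai_metadata(contract: dict) -> list[str]:
    """Return a list of problems; an empty list means the block is acceptable."""
    ai = contract.get("ai_metadata")
    if not isinstance(ai, dict):
        return ["missing ai_metadata block"]

    problems = [f"missing field: {name}" for name in sorted(REQUIRED_AI_FIELDS - ai.keys())]

    if "approved_use_cases" in ai and not ai["approved_use_cases"]:
        problems.append("approved_use_cases must list at least one use case")

    # Label quality is optional: only domains that produce labels declare it.
    score = ai.get("label_quality_score")
    if score is not None and not (0.0 <= score <= 1.0):
        problems.append("label_quality_score must be between 0 and 1")

    return problems


example_contract = {
    "product_id": "customer-profile",
    "version": "2.1.0",
    "ai_metadata": {
        "approved_use_cases": ["churn-prediction"],
        "known_bias_risks": ["age skew in digital-channel customers"],
        "inference_freshness_sla_minutes": 15,
        "label_quality_score": 0.97,
        "consent_scope": ["service-improvement"],
    },
}
```

Returning a list of problems rather than raising on the first one lets a registration workflow report every gap to the domain owner in a single pass.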

Cross-Domain Feature Composition Layer. Training and inference often require joins across domains (e.g., customer profile + transaction history + product catalogue). Rather than allowing AI pipelines to perform ad hoc cross-domain joins, the pattern introduces a dedicated Feature Composition Service. This service is itself governed: it declares which domain products it joins, the join logic, and registers the composed feature set as a new (composite) data product.
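The declared-join idea can be sketched in plain Python. This is an illustration only: a real Feature Composition Service would run on an engine such as Spark or dbt, and the function name, the "product@version" source keys, and the hash-based lineage ID are all assumptions made for the example.

```python
import hashlib
import json


def compose_features(sources: dict[str, list[dict]], join_key: str) -> dict:
    """Inner-join rows from each declared source product on join_key.

    `sources` maps a declared product reference (for example
    "customer@1.0") to its rows. The declaration is returned with the
    rows so the result can be registered as a composite data product.
    """
    names = sorted(sources)
    indexed = {name: {row[join_key]: row for row in sources[name]} for name in names}
    shared_keys = set.intersection(*(set(idx) for idx in indexed.values()))

    rows = []
    for key in sorted(shared_keys):
        merged = {join_key: key}
        for name in names:
            for column, value in indexed[name][key].items():
                if column != join_key:
                    # Prefix with the source product so column origin stays visible.
                    merged[f"{name}.{column}"] = value
        rows.append(merged)

    declaration = {"sources": names, "join_key": join_key, "join_type": "inner"}
    # Assumption: a short hash of the declaration stands in for a lineage ID.
    digest = hashlib.sha256(json.dumps(declaration, sort_keys=True).encode())
    return {"declaration": declaration, "lineage_id": digest.hexdigest()[:12], "rows": rows}
```

Because the lineage ID is derived from the declaration, two runs that join the same product versions with the same logic produce the same identifier, which makes repeated compositions easy to recognise in the catalogue.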

AI Output Publication Pipeline. Model outputs are streamed through a Publication Pipeline that enriches them with lineage metadata (which model version, which training data version, which inference timestamp), validates output schema against a registered contract, and writes to the domain-owned output store. The output product is then available for downstream consumption under the same governance as any other data product.
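A hedged sketch of the validate-then-enrich step for a single prediction. The contract shape (a mapping of field name to Python type), the field names, and the function name are hypothetical; a production pipeline would validate against whatever schema format the catalogue registers.

```python
from datetime import datetime, timezone

# Hypothetical registered output contract: field name -> expected type.
OUTPUT_CONTRACT = {"customer_id": int, "churn_score": float}


def publish_prediction(prediction: dict, model_version: str,
                       training_data_version: str, inferred_at: datetime) -> dict:
    """Validate a prediction against the output contract, then attach lineage."""
    for field_name, expected_type in OUTPUT_CONTRACT.items():
        if field_name not in prediction:
            raise ValueError(f"output contract violation: missing {field_name}")
        if not isinstance(prediction[field_name], expected_type):
            raise ValueError(
                f"output contract violation: {field_name} is not {expected_type.__name__}"
            )

    unexpected = set(prediction) - set(OUTPUT_CONTRACT)
    if unexpected:
        raise ValueError(f"output contract violation: unexpected fields {sorted(unexpected)}")

    # Lineage metadata named in the pattern: model version, training data
    # version, and inference timestamp.
    return {
        **prediction,
        "lineage": {
            "model_version": model_version,
            "training_data_version": training_data_version,
            "inference_timestamp": inferred_at.isoformat(),
        },
    }


record = publish_prediction(
    {"customer_id": 7, "churn_score": 0.42},
    model_version="churn-v3",
    training_data_version="features-2025-02",
    inferred_at=datetime(2025, 3, 1, tzinfo=timezone.utc),
)
```

Validation runs before enrichment so that a record which violates the contract is never written with lineage attached, matching the halt-on-drift behaviour described in the error flow.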

Federated AI Governance Plane. An extension of the mesh's existing governance plane, this adds: AI use case registry (which models may consume which products), bias monitoring integration per domain, and training data approval workflow. Policy is expressed as code (OPA Rego policies) evaluated at data product registration and pipeline execution time.
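The pattern expresses policy in OPA Rego. As a language-neutral illustration, the use case gate can be sketched in Python as a deny-by-default lookup; the registry contents and the decision shape below are hypothetical and do not represent an OPA API.

```python
# Hypothetical AI use case registry: use case -> data products it may read.
USE_CASE_REGISTRY = {
    "churn-prediction": {"customer-profile", "transaction-history"},
    "fraud-detection": {"transaction-history"},
}


def authorise_read(use_case: str, product_id: str,
                   registry: dict[str, set[str]] = USE_CASE_REGISTRY) -> dict:
    """Deny by default; allow only registered use cases with an access grant."""
    granted = registry.get(use_case)
    if granted is None:
        return {"allow": False, "reason": f"use case '{use_case}' is not registered"}
    if product_id not in granted:
        return {"allow": False, "reason": f"'{product_id}' is not granted to '{use_case}'"}
    return {"allow": True, "reason": "registered use case with access grant"}
```

Returning the reason alongside the decision supports the audit requirement that governance decisions are logged with their rationale.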


5. Architecture Diagram

flowchart TD
    subgraph Domains["Data Domains"]
        A[Customer Data Product]
        B[Transaction Data Product]
        C[Product Data Product]
    end
    subgraph Governance["Federated AI Governance"]
        D[AI Use Case Registry]
        E[Bias and Consent Policy]
    end
    subgraph AI["AI Consumer Domain"]
        F[Feature Composition Service]
        G[Training and Inference]
        H[Output Publication Pipeline]
    end
    A -->|contract-validated| F
    B -->|contract-validated| F
    C -->|contract-validated| F
    D -->|use case gate| F
    E -->|bias and consent gate| G
    F --> G
    G --> H
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#dbeafe,stroke:#3b82f6
    style C fill:#dbeafe,stroke:#3b82f6
    style D fill:#f3e8ff,stroke:#a855f7
    style E fill:#f3e8ff,stroke:#a855f7
    style F fill:#f0fdf4,stroke:#22c55e
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#d1fae5,stroke:#10b981

6. Components

| Component | Type | Responsibility | Technology Options | Criticality |
|---|---|---|---|---|
| Domain Data Product | Data Asset | Owned by domain team; publishes structured data under formal contract | Delta Lake, Iceberg, BigQuery tables, Redshift | Critical |
| AI Metadata Extension | Schema Artefact | Extends data contract with AI-specific quality, consent, bias metadata | JSON Schema extension, Protobuf extension, dbt meta block | Critical |
| Feature Composition Service | Processing Service | Cross-domain feature joining with lineage capture; registers composite feature as product | Apache Spark, dbt, Databricks, Google Dataflow | High |
| Feature Store | Serving Infrastructure | Stores offline (training) and online (inference) features with point-in-time correctness | Feast, Tecton, Vertex AI Feature Store, SageMaker Feature Store | Critical |
| Training Pipeline | Orchestration | Executes model training; consumes approved feature sets from feature store | MLflow Projects, Kubeflow, Vertex AI Pipelines, SageMaker Pipelines | High |
| Inference Service | Serving | Serves model predictions at runtime; reads online features from feature store | TorchServe, Triton, Vertex AI Endpoints, SageMaker Endpoints | Critical |
| Output Publication Pipeline | Processing Service | Enriches AI outputs with lineage; validates against output product contract; writes to catalogue | Kafka Streams, AWS EventBridge + Lambda, Dataflow | High |
| AI Output Data Product | Data Asset | Published by AI domain; governed like any other data product | Delta Lake, Iceberg, S3 + Glue Catalogue | High |
| Federated AI Governance Plane | Governance Service | Enforces AI use case policy, bias thresholds, consent scope, training data approval | OPA + Rego policies, Dataplex, Collibra AI Governance, Atlan | Critical |
| Data Product Catalogue | Discovery & Lineage | Unified catalogue with lineage from source to prediction; SLO tracking | DataHub, OpenMetadata, Atlan, Google Dataplex Catalog | High |

7. Data Flow

Primary Flow

| Step | Actor | Action | Output |
|---|---|---|---|
| 1 | Domain Team A/B/C | Registers data product with AI metadata extension in catalogue | Catalogue entry with AI contract |
| 2 | Federated Governance Plane | Validates AI metadata; checks bias risk; confirms consent scope | Policy approval or rejection |
| 3 | AI Platform Team | Declares AI use case; requests access to specific data products | Use case registration + access grant |
| 4 | Feature Composition Service | Reads approved data products; executes cross-domain join; captures lineage | Composite feature dataset with lineage ID |
| 5 | Feature Store | Ingests composite features; creates offline snapshot + online serving index | Feature set available for training and inference |
| 6 | Training Pipeline | Reads offline features from feature store; trains model; logs training data version | Trained model artefact + training data lineage |
| 7 | Inference Service | Reads online features; executes model inference | Raw prediction + confidence score |
| 8 | Output Publication Pipeline | Enriches prediction with lineage metadata; validates output schema | Governed AI output data product |
| 9 | Downstream Consumer | Reads AI output data product through catalogue contract | Scored/enriched data for business process |
| 10 | Governance Plane | Continuously monitors output data product for schema drift and SLO compliance | Drift alerts; SLO dashboard |

Error Flow

| Error Condition | Trigger | Response | Recovery |
|---|---|---|---|
| Data contract violation | Schema change in upstream domain product | Feature Composition Service rejects read; alert to domain owner | Domain owner issues new contract version; AI team updates feature spec |
| Bias threshold breach | Governance plane detects demographic skew in feature distribution | Training pipeline blocked; bias report generated | Domain team investigates data source; bias mitigation applied |
| Consent scope violation | Feature join attempts to use data for undeclared use case | Governance plane rejects join; audit log entry | AI team registers new use case; consent review conducted |
| Feature freshness SLA breach | Online feature older than contract SLA | Inference service falls back to degraded mode; alert raised | Feature pipeline replayed; SLA root cause investigated |
| Output schema drift | Model output deviates from registered output contract | Output publication pipeline halts; alert to model owner | Model owner updates output contract; downstream consumers notified |
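The feature freshness row above can be illustrated with a small check that an inference service might run before using an online feature. The function name and the returned fields are assumptions for the sketch; what "degraded mode" actually does (for example, serving a default score) is left to the implementer.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(feature_written_at: datetime, sla: timedelta, now: datetime) -> dict:
    """Compare feature age with the contract SLA and choose a serving mode."""
    age = now - feature_written_at
    breached = age > sla
    return {
        "mode": "degraded" if breached else "normal",
        "raise_alert": breached,
        "age_seconds": age.total_seconds(),
    }


now = datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc)
fresh = check_freshness(now - timedelta(minutes=10), timedelta(minutes=15), now)
stale = check_freshness(now - timedelta(minutes=30), timedelta(minutes=15), now)
# fresh["mode"] is "normal"; stale["mode"] is "degraded" and raises an alert.
```

Passing `now` explicitly keeps the check deterministic and easy to test; a service would typically supply `datetime.now(timezone.utc)`.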

8. Security Considerations

Authentication & Authorisation

  • All data product reads authenticated via service identity (OAuth 2.0 client credentials or Workload Identity).
  • Authorisation enforced by governance plane: only registered AI use cases may access approved data products.
  • Fine-grained column-level access control enforced at the data product serving layer.

Secrets Management

  • Data product access credentials stored in a secrets manager (AWS Secrets Manager, Azure Key Vault, HashiCorp Vault).
  • Credentials rotated every 90 days; never embedded in pipeline code or configuration files.

Data Classification

  • All data products classified at ingestion (Public / Internal / Confidential / Restricted).
  • AI metadata extension inherits source product classification; composite features adopt highest classification of any source.
  • Training datasets classified and stored in appropriately secured storage tiers.
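The "highest classification of any source" rule can be sketched with an ordered enumeration. The four levels come from the list above; the numeric ordering and the function name are assumptions for illustration.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Levels from the pattern, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def composite_classification(source_levels) -> Classification:
    """A composite feature set adopts the highest classification of its sources."""
    levels = list(source_levels)
    if not levels:
        raise ValueError("a composite feature set needs at least one source")
    return max(levels)


level = composite_classification(
    [Classification.INTERNAL, Classification.RESTRICTED, Classification.PUBLIC]
)
# level is Classification.RESTRICTED
```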

Encryption

  • Data at rest encrypted using AES-256; encryption keys managed by domain-owned KMS keys.
  • Data in transit encrypted using TLS 1.3 minimum.
  • Feature store online serving encrypted at rest and in transit.

Auditability

  • All data product access events logged to immutable audit log.
  • Feature composition lineage captured in OpenLineage format; stored separately from pipeline code.
  • Governance plane decisions (approve/reject/block) logged with policy version and decision rationale.

OWASP LLM Top 10 Mapping

| OWASP LLM Risk | Relevance to This Pattern | Mitigation |
|---|---|---|
| LLM06: Sensitive Information Disclosure | Training data containing PII/sensitive attributes may surface in model outputs | Data minimisation at feature composition; output scanning for PII before publication |
| LLM02: Insecure Output Handling | AI output data products consumed downstream without validation | Output schema contract validation in publication pipeline; downstream consumer contract enforcement |
| LLM04: Model Denial of Service | Malformed data product inputs could cause inference service overload | Input schema validation at feature store boundary; rate limiting on inference service |
| LLM09: Overreliance | Downstream consumers treating AI output products as ground truth | Output product metadata must declare confidence ranges; consumer contracts include disclaimer metadata |
| LLM10: Model Theft | Training datasets registered in catalogue may expose valuable IP if catalogue is compromised | Catalogue access control; training dataset storage separate from catalogue metadata; data product physical location not exposed in catalogue |

9. Governance Considerations

Responsible AI

  • Domain teams must complete an AI Impact Assessment before publishing a data product approved for AI training use.
  • Bias assessment is mandatory for data products used in consequential AI decisions (credit, health, employment).

Model Risk Management

  • Training data version must be linked to model version in model registry (bi-directional traceability).
  • Data product deprecation requires impact analysis across all registered AI consumers before execution.

Human Approval Checkpoints

  • New AI use case registration requires approval from domain data product owner + AI governance committee.
  • Training data approval workflow mandated for high-risk AI use cases (EU AI Act Annex III).
  • Bias threshold exceptions require CDO + risk committee sign-off.

Policy Enforcement

  • Governance policies expressed as code (OPA Rego); version controlled alongside data product contracts.
  • Policy violations are hard blocks (not warnings) for high-risk AI use cases.

Governance Artefacts

| Artefact | Owner | Cadence | Purpose |
|---|---|---|---|
| AI Use Case Registry | AI Governance Committee | On change | Authoritative list of approved AI use cases + data product access grants |
| Data Product AI Contract | Domain Data Product Owner | On change | Declares AI-specific quality, consent, bias metadata per product |
| Bias Assessment Report | Domain Team + AI Platform | Per training run | Documents bias metrics; attests compliance with thresholds |
| Training Data Approval Record | CDO / AI Governance | Per new training dataset | Formal approval for use of data product in model training |
| Lineage Graph | Automated (OpenLineage) | Continuous | Source-to-prediction lineage; used for regulatory enquiries |

10. Operational Considerations

Monitoring

| Metric | Owner | Alert Threshold | Tooling |
|---|---|---|---|
| Data product freshness (age of latest partition) | Domain team | > contract SLA | DataHub SLO monitor |
| Feature composition job success rate | AI Platform | < 99.5% over 1 hour | Airflow / Vertex AI pipeline alerts |
| Feature store online latency (p99) | AI Platform | > 50 ms | Prometheus + Grafana |
| Governance policy violation rate | Governance team | Any violation | OPA audit log + PagerDuty |
| AI output data product SLO | AI Platform | As per contract | DataHub SLO monitor |

SLOs

| SLO | Target | Measurement |
|---|---|---|
| Data product availability | 99.9% | Catalogue availability check |
| Feature composition pipeline completion | 99.5% success rate | Pipeline execution logs |
| Online feature serving latency (p99) | < 50 ms | Feature store metrics |
| Governance decision latency | < 5 seconds | Governance plane logs |
| AI output product publication latency | < 2 minutes from inference | Output pipeline metrics |

Logging

  • Structured JSON logging at all pipeline stages; includes data product ID, version, lineage ID, and execution timestamp.
  • Audit logs for governance decisions retained for 7 years to support APRA CPS 234 assurance and regulatory review.
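A minimal sketch of the structured logging bullet using Python's standard logging module. The JSON key names and the use of the logger name as the pipeline stage are illustrative assumptions; only the four required fields (data product ID, version, lineage ID, execution timestamp) come from the text above.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    # Context fields are passed through logging's `extra` mechanism.
    CONTEXT_FIELDS = ("data_product_id", "data_product_version", "lineage_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "execution_timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "stage": record.name,  # assumption: one logger per pipeline stage
            "message": record.getMessage(),
        }
        for name in self.CONTEXT_FIELDS:
            payload[name] = getattr(record, name, None)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("feature-composition")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "cross-domain join complete",
    extra={
        "data_product_id": "customer-profile",
        "data_product_version": "2.1.0",
        "lineage_id": "abc123",
    },
)
```

Missing context fields are emitted as null rather than dropped, so downstream log queries can rely on a stable set of keys.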

Incident Management

  • Data product SLO breach → PagerDuty alert to domain on-call; AI Platform notified.
  • Governance policy violation → immediate block + P1 incident; AI Governance committee notified within 1 hour.

Disaster Recovery

| Component | RTO | RPO | Strategy |
|---|---|---|---|
| Feature Store (offline) | 4 hours | 24 hours | Cross-region replication of storage; pipeline replay |
| Feature Store (online) | 15 minutes | 1 hour | Active-passive replica in secondary region |
| Governance Plane | 1 hour | 0 | Multi-AZ deployment; policy cache in pipeline services |
| Data Product Catalogue | 4 hours | 24 hours | Metadata database backup + restore |

Capacity Planning

  • Feature composition jobs scale horizontally; size Spark/Dataflow clusters based on peak training data volume (typically 3–5× average).
  • Online feature store capacity sized for peak inference QPS × feature vector size × replication factor.
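As a worked example of the two sizing rules above, the sketch below multiplies out the stated factors. The input figures are invented for illustration, and reading the online product as a peak bytes-per-second figure is an interpretation, not something the pattern states.

```python
def peak_training_volume_range(average_volume_gb: int) -> tuple[int, int]:
    """Apply the 3-5x peak multiplier to an average training data volume."""
    return (3 * average_volume_gb, 5 * average_volume_gb)


def online_store_sizing_bytes_per_second(peak_qps: int, feature_vector_bytes: int,
                                         replication_factor: int) -> int:
    """Peak inference QPS x feature vector size x replication factor."""
    return peak_qps * feature_vector_bytes * replication_factor


# Illustrative inputs only: 200 GB average training volume; 2,000 QPS at peak,
# 4 KiB feature vectors, three replicas.
batch_low_gb, batch_high_gb = peak_training_volume_range(200)
online_bytes_per_second = online_store_sizing_bytes_per_second(2_000, 4_096, 3)
# batch range: 600 to 1,000 GB; online figure: 24,576,000 bytes/s (about 24.6 MB/s)
```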

11. Cost Considerations

Cost Drivers

| Cost Driver | Typical Range | Notes |
|---|---|---|
| Feature Store (online) | $800–$8,000/month | Scales with feature count × QPS; Redis cluster or managed service |
| Feature Store (offline) | $200–$2,000/month | Object storage + query cost; scales with data volume |
| Feature composition compute | $500–$5,000/month | Spark/Dataflow; batch or streaming; scales with data volume |
| Governance plane | $200–$1,500/month | OPA compute + Collibra/DataHub licence |
| Data product catalogue | $0–$3,000/month | Open source (DataHub free) vs enterprise (Atlan/Collibra SaaS) |
| Lineage storage | $50–$500/month | OpenLineage events in object store + query engine |

Scaling Risks

  • Online feature store is the primary cost scaling risk: feature count × model count × QPS grows non-linearly.
  • Cross-domain feature composition at high volume can generate significant Spark compute cost.
  • Governance plane policy evaluation at high throughput may require caching to avoid per-request compute.

Optimisations

  • Cache frequently accessed feature vectors in the inference service (TTL aligned to freshness SLA).
  • Materialise composite features as domain data products to avoid repeated cross-domain joins.
  • Use serverless query engines (Athena, BigQuery) for offline feature access to avoid always-on compute.
  • Tier feature freshness SLAs: only real-time inference features need online serving; batch inference uses offline features.

Indicative Cost Range

| Scale | Monthly Cost Range | Basis |
|---|---|---|
| Small (1–3 AI use cases, <1M features/day) | $2,000–$8,000 | Managed feature store + DataHub OSS + light governance |
| Medium (5–15 use cases, 10M features/day) | $10,000–$35,000 | Scalable feature store + enterprise catalogue + governance automation |
| Large (20+ use cases, 100M+ features/day) | $40,000–$150,000 | Multi-region feature store + full enterprise stack |

12. Trade-Off Analysis

Option Comparison

| Option | Description | Pros | Cons | Recommended When |
|---|---|---|---|---|
| A: AI Data Mesh Integration (this pattern) | Full mesh integration; domain ownership; federated AI governance | Full data lineage; domain expertise; regulatory-grade governance; scales to many use cases | High setup cost; requires domain team maturity; governance overhead | ≥3 AI use cases; regulated industry; Data Mesh already adopted |
| B: Centralised AI Data Platform | Single central team owns all AI data; lakes/lakehouses fed into central feature store | Simple governance; fast for first use case; single team to coordinate | Bottleneck at scale; domain knowledge lost; fails mesh principles; hard to scale | Single AI use case; no Data Mesh; small organisation |
| C: Federated Feature Stores per Domain | Each domain runs its own feature store; no cross-domain composition layer | Maximum domain autonomy; no shared bottleneck | Feature duplication across stores; cross-domain AI very hard; governance nightmare | Domains have very different data shapes and no cross-domain AI needed |

Architectural Tensions

| Tension | Trade-Off | Resolution in This Pattern |
|---|---|---|
| Domain autonomy vs. AI consistency | Domains want freedom; AI needs consistent feature contracts | Standardised AI metadata extension enforced at contract registration; domains free to implement internally |
| Governance rigor vs. iteration speed | Approval gates slow ML experiments | Tiered governance: experiments use sandbox products; production models require full approval workflow |
| Real-time freshness vs. cost | Online features are expensive; not all features need real-time | Freshness SLA per feature declared in contract; only SLA-requiring features use online serving |
| Lineage completeness vs. pipeline performance | Full OpenLineage capture adds ~5–10 ms per hop | Async lineage emission; lineage stored separately from serving path |

13. Failure Modes

| Failure | Likelihood | Impact | Detection | Recovery |
|---|---|---|---|---|
| Domain team publishes schema-breaking change without versioning | Medium | High: silently breaks dependent feature pipelines | Feature composition job failure; schema validation alert | Automated contract validation in CI/CD; domain team reverts or issues v2 |
| Bias threshold misconfigured (too lenient) | Low | Critical: biased model reaches production | Post-deployment bias monitoring; external audit | Emergency model rollback; bias threshold review; remediation training run |
| Governance plane outage | Low | High: all AI pipelines blocked or ungoverned | Health check failure; pipeline timeout alerts | Governance plane HA (multi-AZ); graceful degradation to cached policy decisions for approved use cases |
| Feature store corruption | Very Low | Critical: wrong features served to inference | Feature value monitoring; anomaly detection on feature distributions | Point-in-time restore from backup; inference service fallback to safe default |
| AI output product consumed without lineage | Medium | Medium: regulatory audit trail incomplete | Catalogue SLO check on lineage completeness | Output publication pipeline enforces lineage before write; block non-lineage writes |
| Cross-domain join produces unexpected data combination | Medium | High: privacy violation; unintended PII combination | Governance plane consent scope check at join time | Join blocked; AI team required to file updated consent scope with domain owners |
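The "automated contract validation in CI/CD" recovery for the first failure mode can be sketched as a schema comparison between the registered contract and a proposed change. Representing a schema as a column-to-type mapping, and treating only removals and type changes as breaking, are simplifying assumptions.

```python
def breaking_changes(old_schema: dict[str, str], new_schema: dict[str, str]) -> list[str]:
    """List changes that would break consumers of the old contract.

    Assumption: removing a column or changing its type is breaking;
    adding a new column is not.
    """
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"removed column: {column}")
        elif new_schema[column] != old_type:
            problems.append(f"type change on {column}: {old_type} -> {new_schema[column]}")
    return problems


def requires_new_contract_version(old_schema: dict[str, str],
                                  new_schema: dict[str, str]) -> bool:
    """A CI check would fail the build when this returns True without a version bump."""
    return bool(breaking_changes(old_schema, new_schema))


registered = {"customer_id": "bigint", "segment": "string"}
proposed = {"customer_id": "string", "region": "string"}
issues = breaking_changes(registered, proposed)
# issues lists the type change on customer_id and the removal of segment
```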

Cascading Failure Scenarios

  • Schema cascade: Domain A publishes breaking schema change → Feature Composition Service fails → Training pipeline starves → Model staleness → Inference quality degrades → Downstream business process errors. Mitigation: contract versioning + consumer notification before deprecation.
  • Governance plane cascade: Governance plane outage during high-volume training period → Pipelines run without policy enforcement → Biased training data used → Biased model deployed → Regulatory incident. Mitigation: governance plane must be HA; pipelines must fail closed (block) when the governance plane is unavailable.
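One way to combine blocking on governance plane unavailability with the cached-decision degradation listed in the failure modes table is sketched below. The class, the exception name, and the five-minute default TTL are hypothetical; the key property is that only previously approved, still-fresh decisions survive an outage and everything else is blocked.

```python
import time


class GovernanceUnavailable(Exception):
    """Raised by the evaluator when the governance plane cannot be reached."""


class GovernanceGate:
    def __init__(self, evaluate, cache_ttl_seconds: float = 300.0, clock=time.monotonic):
        self._evaluate = evaluate      # callable(use_case, product_id) -> bool
        self._ttl = cache_ttl_seconds
        self._clock = clock
        self._cache = {}               # (use_case, product_id) -> (allow, cached_at)

    def is_allowed(self, use_case: str, product_id: str) -> bool:
        key = (use_case, product_id)
        try:
            allow = self._evaluate(use_case, product_id)
        except GovernanceUnavailable:
            cached = self._cache.get(key)
            if cached and cached[0] and self._clock() - cached[1] <= self._ttl:
                return True            # previously approved and cache still fresh
            return False               # fail closed: block the pipeline
        self._cache[key] = (allow, self._clock())
        return allow
```

The TTL bounds how long a pipeline can keep running on a stale approval, which limits exposure if a use case is revoked during an outage.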

14. Regulatory Considerations

| Regulation | Requirement | Pattern Response |
|---|---|---|
| APRA CPS 234 | Data integrity controls; information security incident notification | Immutable audit log for all data product access; incident escalation for governance violations |
| APRA CPS 230 | Operational resilience; third-party risk | DR targets (RTO/RPO) for feature store and governance plane; third-party data product SLA management |
| Privacy Act (Australia) APP 3/6 | Data collection limitation; use limitation | Consent scope enforcement in governance plane; data minimisation at feature composition |
| EU AI Act Article 10 | Training data governance for high-risk AI | Training data approval workflow; bias assessment; lineage from data to model |
| EU AI Act Article 17 | Quality management system documentation | AI use case registry; data product AI contracts; training data approval records |
| ISO 42001 §6.1 | AI risk assessment | Domain-level bias assessment integrated into data product registration |
| NIST AI RMF GOVERN-1 | Policies and accountability for AI risk | Federated governance policies as code; domain accountability for AI readiness |
| NIST AI RMF MAP-2 | Scientific validity of AI data | Representativeness metadata in AI contract extension; bias assessment |

15. Reference Implementations

AWS

| Component | AWS Service |
|---|---|
| Data product storage | S3 + Lake Formation for access control |
| Data product catalogue | AWS Glue Data Catalog + DataHub on EC2 |
| Feature composition | AWS Glue / EMR Spark |
| Feature store | Amazon SageMaker Feature Store (online + offline) |
| Governance plane | AWS Lake Formation permissions + OPA on ECS |
| Training pipeline | SageMaker Pipelines |
| Inference service | SageMaker Endpoints |
| Lineage | Amazon SageMaker ML Lineage + OpenLineage on S3 |

Azure

| Component | Azure Service |
|---|---|
| Data product storage | ADLS Gen2 + Microsoft Purview access policies |
| Data product catalogue | Microsoft Purview |
| Feature composition | Azure Databricks (Spark) |
| Feature store | Azure ML Feature Store |
| Governance plane | Microsoft Purview policy + OPA on AKS |
| Training pipeline | Azure ML Pipelines |
| Inference service | Azure ML Managed Endpoints |
| Lineage | Microsoft Purview lineage + OpenLineage |

GCP

| Component | GCP Service |
|---|---|
| Data product storage | GCS + BigQuery + Dataplex zones |
| Data product catalogue | Google Dataplex Catalog |
| Feature composition | Cloud Dataflow / Dataproc |
| Feature store | Vertex AI Feature Store |
| Governance plane | Dataplex policy tags + OPA on Cloud Run |
| Training pipeline | Vertex AI Pipelines |
| Inference service | Vertex AI Endpoints |
| Lineage | Vertex AI ML Metadata + OpenLineage |

On-Premises

| Component | Technology |
|---|---|
| Data product storage | MinIO + Apache Iceberg |
| Data product catalogue | OpenMetadata or DataHub |
| Feature composition | Apache Spark on Kubernetes |
| Feature store | Feast (on Kubernetes) + Redis Enterprise |
| Governance plane | OPA + Rego policies on Kubernetes |
| Training pipeline | Kubeflow Pipelines |
| Inference service | KServe on Kubernetes |
| Lineage | OpenLineage + Marquez |

16. Related Patterns

| Pattern | ID | Relationship | Notes |
|---|---|---|---|
| Data Quality for AI | EAAPL-DAT002 | Depends on | Quality gates enforced within data product contracts |
| Data Lineage for AI | EAAPL-DAT003 | Depends on | Lineage capture is core to this pattern's governance |
| Real-Time Feature Engineering | EAAPL-DAT008 | Complements | Online feature serving is a sub-component |
| AI Training Data Governance | EAAPL-DAT007 | Overlaps | Training data governance is formalised in this pattern's AI metadata extension |
| Privacy by Design for AI Data | EAAPL-DAT005 | Depends on | Consent scope enforcement is a governance plane responsibility |
| Model Versioning | EAAPL-MDL001 | Complements | Model registry must link to training data product version |
| Human Approval Gateway | EAAPL-HIL001 | Complements | Training data approval workflow is a human approval gate |

17. Maturity Assessment

Overall Maturity: Proven — Widely deployed in enterprises with mature Data Mesh and MLOps practices. Well-documented reference implementations exist on all major cloud providers.

| Dimension | Score (1–5) | Notes |
|---|---|---|
| Architectural clarity | 5 | Well-defined boundaries; clear ownership model |
| Tooling maturity | 4 | Feature stores and catalogues mature; federated AI governance tooling still maturing |
| Regulatory alignment | 5 | Strong alignment to EU AI Act Art. 10, APRA CPS 234 |
| Operational complexity | 3 | High setup complexity; ongoing domain team capability requirement |
| Cost efficiency | 4 | Good ROI at scale; high upfront investment |
| Security | 4 | Strong controls; column-level access control tooling varies by platform |
| Community adoption | 4 | Adopted by major financial services, healthcare, and retail enterprises |

18. Revision History

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2024-01-15 | EAAPL Working Group | Initial pattern publication |
| 1.1 | 2024-06-30 | EAAPL Working Group | Added EU AI Act Article 10 alignment; updated GCP reference implementation for Vertex AI Feature Store GA |
| 1.2 | 2025-03-01 | EAAPL Working Group | Added ISO 42001 alignment; expanded failure modes; updated cost ranges for 2025 pricing |