EAAPL-CMP004 — Privacy-Preserving AI Architecture
Status: Proven
Tags: privacy-act gdpr pii-handling data-isolation high-complexity
Version: 1.4
Last Updated: 2026-06-12
Author: Enterprise AI Architecture Pattern Library
1. Executive Summary
AI systems that ingest, process, or generate personal information must comply with privacy obligations across multiple jurisdictions simultaneously. This pattern addresses the dual requirements of the Australian Privacy Principles (APPs) under the Privacy Act 1988 (Cth) and the EU General Data Protection Regulation (GDPR). Both frameworks impose obligations on data minimisation, purpose limitation, cross-border transfer, consent, and individual rights. Each obligation creates specific technical requirements when applied to AI inference pipelines, retrieval-augmented generation (RAG) systems, and AI training workflows.
The Privacy-Preserving AI Architecture provides a layered control framework: a PII detection and redaction pipeline that sanitises inputs before they reach the LLM; purpose limitation enforcement through technical controls on data flows; a consent management integration that gates AI processing on valid consent; data sovereignty controls that prevent regulated personal data from crossing prohibited geographic boundaries; and a machine unlearning capability to respond to erasure requests. Organisations implementing this pattern can evidence technical controls that support compliance with Privacy Act APP 11 (security of personal information) and GDPR Articles 5, 6, 17, 20, and 25, and build AI systems that earn customer trust through verifiable privacy practices.
2. Problem Statement
Business Problem
Organisations building AI products on top of personal data face mounting privacy enforcement risk. The Australian Information Commissioner has signalled enforcement focus on AI systems in the 2025–2026 period. European Data Protection Authorities issued EUR 4.5 billion in GDPR fines through 2025, with AI-related processing accounting for a growing share. Privacy-by-design is a legal obligation under GDPR Article 25, not an optional design choice.
Technical Problem
Default LLM and AI service configurations are not privacy-preserving. Cloud LLM providers may use submitted data for model improvement. PII is routinely present in business documents, customer support tickets, and employee records that are ingested as RAG context. Training datasets frequently contain PII that was not adequately screened. The legal basis for AI inference on personal data is frequently assumed (legitimate interest) without a genuine balancing test.
Symptoms
- Customer PII (names, email addresses, account numbers) is sent directly to third-party LLM APIs without redaction
- There is no documented lawful basis under GDPR Article 6 or APP 3 for the specific AI processing activity
- Privacy Impact Assessments are not conducted before deploying AI features that process personal data
- RAG vector stores contain embeddings derived from personal data with no retention policy or deletion capability
- The organisation cannot respond to a data subject erasure request because it cannot identify all locations where AI systems have stored or derived representations of that individual's data
Cost of Inaction
| Dimension | Consequence |
| --- | --- |
| Regulatory | GDPR fines up to EUR 20M or 4% of global annual turnover, whichever is higher (Article 83(5)); OAIC determinations |
| Legal | Class action risk; individual compensation claims under GDPR Article 82 |
| Operational | Loss of ability to process EU personal data; suspension of AI services |
| Reputational | Public data breach notification; customer trust erosion; media coverage |
3. Context
When to Apply
- Any AI system that receives, processes, or outputs personal information as defined by Privacy Act (information about an identified or reasonably identifiable individual) or GDPR (any information relating to an identified or identifiable natural person)
- AI systems processing sensitive information (health, financial, biometric, religious, political, sexual orientation) — highest risk tier requiring strongest controls
- RAG pipelines that index internal documents containing personal information
- AI training workflows using datasets that may contain personal information
- Cross-border AI services where data flows from EU or Australia to third-country AI providers
When NOT to Apply
- AI systems processing exclusively synthetic or fully anonymised data (k-anonymity + l-diversity + t-closeness with formal guarantee)
- Internal AI tools used by authorised staff on data they are already authorised to access, where no third-party cloud AI API is involved
- AI research on fully de-identified public datasets with no personal data re-identification risk
Prerequisites
| Prerequisite | Description |
| --- | --- |
| Data Classification | An operational data classification scheme that identifies personal and sensitive personal information |
| Records of Processing | GDPR Article 30 records of processing activities; Australian APP 1 privacy policy |
| Legal Basis Assessment | Organisation has assessed lawful bases available for the AI processing activity |
| Consent Management Platform | Existing CMP or ability to deploy one if consent is the chosen lawful basis |
| Data Flow Mapping | Mapping of data flows from source through AI pipeline to output and storage |
Industry Applicability
| Industry | Risk Level | Key Privacy Obligations |
| --- | --- | --- |
| Healthcare | Very High | Health information is sensitive personal data; strict APP 6 / GDPR Article 9 obligations |
| Financial Services | High | Financial information; creditworthiness; APP 11 / GDPR Article 6 legitimate interest balancing |
| Human Resources | High | Employee records; performance data; sensitive categories common |
| Education | High | Student data; minors' data (enhanced GDPR protections) |
| Retail / E-commerce | Medium | Purchase history; profile data; consent management critical for personalisation AI |
| Legal Services | High | Legal professional privilege intersects with AI processing obligations |
| Government | High | Privacy Act 1988 applies to agencies; FOI and Privacy Act intersection |
4. Architecture Overview
The Privacy-Preserving AI Architecture implements privacy controls at each stage of the AI data lifecycle: input, inference, retrieval, output, and storage. The architecture is founded on Privacy by Design (GDPR Article 25) — controls are embedded in the system, not applied as an afterthought.
Stage 1 — Data Minimisation at Input
Before any data reaches the AI inference engine, a data minimisation filter applies the principle of processing only what is strictly necessary for the declared purpose. For structured inputs, this means stripping fields that are not required for the AI task (e.g., if an AI system summarises customer support tickets, it does not need the customer's date of birth — this field is removed). For unstructured text inputs, a PII detection and redaction pipeline identifies and pseudonymises or redacts personal identifiers. The minimisation configuration is driven by the purpose specification: each AI use case has a documented purpose, and the data minimisation filter enforces the minimum data set for that purpose.
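For structured inputs, the filter can be as small as an allow-list keyed by purpose. The sketch below is illustrative only: the `PURPOSE_ALLOWED_FIELDS` registry, the purpose name, and the field names are assumptions made for this example, not part of the pattern's specification.

```python
from typing import Any

# Hypothetical purpose specification: the only fields each AI use case may receive.
PURPOSE_ALLOWED_FIELDS: dict[str, frozenset[str]] = {
    "support_ticket_summary": frozenset({"ticket_id", "subject", "body"}),
}


def minimise(record: dict[str, Any], purpose: str) -> dict[str, Any]:
    """Return a copy of `record` holding only the fields allowed for `purpose`."""
    if purpose not in PURPOSE_ALLOWED_FIELDS:
        # Fail closed: a use case without a documented purpose receives no data.
        raise PermissionError(f"no purpose specification for {purpose!r}")
    allowed = PURPOSE_ALLOWED_FIELDS[purpose]
    return {name: value for name, value in record.items() if name in allowed}


ticket = {
    "ticket_id": "T-1001",
    "subject": "Refund request",
    "body": "Please refund my last order.",
    "date_of_birth": "1990-01-01",  # not needed to summarise a ticket
}
minimised = minimise(ticket, "support_ticket_summary")
print(sorted(minimised))  # ['body', 'subject', 'ticket_id']
```

An allow-list (rather than a deny-list) means a newly added source field is excluded by default until someone documents why the purpose needs it.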
Stage 2 — PII Detection and Redaction Pipeline
The core privacy technical control is a PII detection and redaction pipeline that processes all text inputs destined for LLM inference. The pipeline operates in two modes: redaction mode (PII replaced with a token such as [PERSON_NAME] or [EMAIL_ADDRESS]) for use cases where the AI does not need the actual values; and pseudonymisation mode (PII replaced with a reversible, keyed token) for use cases where the actual value is needed downstream but should not be sent to the LLM. Pseudonymisation uses a per-context encryption key managed in a secrets manager; the mapping between pseudonym and real value is stored in a secure lookup store that is never exposed to the AI model. The pipeline supports Australian and EU PII categories: names, addresses, dates of birth, email addresses, phone numbers, government identifiers (TFN, Medicare, ABN, passport), financial account numbers, health information, and biometric descriptors.
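The two modes can be sketched with the Python standard library. This is a simplified illustration, not a production detector: the two regular expressions stand in for a real NER-based detector (they will not catch names, for example), the HMAC-derived token format is an assumption, and the in-memory `lookup` dictionary stands in for the secure lookup store. The token itself is not reversible; reversal happens only through the lookup mapping.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; one combined pattern so the text is scanned once.
PII_PATTERN = re.compile(
    r"(?P<EMAIL_ADDRESS>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
    r"|(?P<PHONE_NUMBER>\+?\d[\d ]{7,}\d)"
)


def redact(text: str) -> str:
    """Redaction mode: replace each detected span with its category label."""
    return PII_PATTERN.sub(lambda m: f"[{m.lastgroup}]", text)


def pseudonymise(text: str, key: bytes, lookup: dict[str, str]) -> str:
    """Pseudonymisation mode: replace each span with a keyed token and record
    the token-to-value mapping in `lookup` (never sent to the model)."""

    def _token(m: re.Match) -> str:
        digest = hmac.new(key, m.group(0).encode("utf-8"), hashlib.sha256).hexdigest()[:12]
        token = f"[{m.lastgroup}_{digest}]"
        lookup[token] = m.group(0)
        return token

    return PII_PATTERN.sub(_token, text)


lookup: dict[str, str] = {}
key = b"per-context-demo-key"  # assumption: fetched from a secrets manager in practice
message = "Contact Jane at jane.doe@example.com or +61 412 345 678."

print(redact(message))  # Contact Jane at [EMAIL_ADDRESS] or [PHONE_NUMBER].
safe = pseudonymise(message, key, lookup)
print(safe)  # same sentence with keyed tokens in place of the two identifiers
```

Because the token is derived with a per-context key, the same value maps to the same token within one context (preserving continuity for the model) but to a different token under another context's key.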
Stage 3 — Purpose Limitation Technical Controls
Purpose limitation (APP 6; GDPR Article 5(1)(b)) requires that data collected for one purpose is not used for a different, incompatible purpose. In AI systems, the risk is that data collected for customer service is used to train a marketing personalisation model. Technical purpose limitation controls include: purpose tags on all data assets, enforced by a policy engine that blocks AI workflows from accessing data tagged for incompatible purposes; data access audit logs that record which AI workflow consumed which data; and change management gates that require re-assessment of lawful basis and consent when an AI workflow's purpose changes.
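A purpose-tag check of this kind might look like the following sketch. The `COMPATIBLE_TAGS` policy, the workflow names, and the tag names are hypothetical; in practice the decision would typically be delegated to a policy engine, and the audit log would be written to an immutable store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    asset_id: str
    purpose_tags: frozenset[str]


# Hypothetical compatibility policy: workflow purpose -> data purpose tags it may read.
COMPATIBLE_TAGS: dict[str, frozenset[str]] = {
    "customer_service": frozenset({"customer_service"}),
    "marketing_personalisation": frozenset({"marketing"}),
}

access_audit_log: list[dict[str, object]] = []


def authorise_access(workflow: str, workflow_purpose: str, asset: DataAsset) -> bool:
    """Permit access only if the asset carries a tag compatible with the purpose."""
    allowed = COMPATIBLE_TAGS.get(workflow_purpose, frozenset())  # unknown purpose: deny
    permitted = not allowed.isdisjoint(asset.purpose_tags)
    # Every decision is recorded, whether permitted or blocked.
    access_audit_log.append(
        {"workflow": workflow, "purpose": workflow_purpose,
         "asset": asset.asset_id, "permitted": permitted}
    )
    return permitted


tickets = DataAsset("support-tickets-2026", frozenset({"customer_service"}))
print(authorise_access("ticket-summariser", "customer_service", tickets))  # True
print(authorise_access("promo-model-training", "marketing_personalisation", tickets))  # False
```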
Stage 4 — Consent Management Integration
Where consent is the lawful basis for AI processing (GDPR Article 6(1)(a); APP 3.3), the AI processing pipeline must check consent status before processing. The consent management integration queries the consent management platform (CMP) at the start of each processing job or inference request. If valid consent does not exist for the specific processing activity, the request is blocked and the user is prompted to provide consent. Consent must be granular: separate consent for different AI processing activities (personalisation vs. model training vs. analytics). Consent withdrawal must prevent further processing: new requests are blocked at the next consent check, and processing already queued under the withdrawn consent must be stopped without undue delay. Stopping processing on withdrawal is a separate obligation from erasure; any deletion that follows is handled through the erasure workflow.
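A minimal consent gate, assuming a CMP client that exposes a per-subject, per-activity status lookup, could be sketched as below. `InMemoryCMP` is a hypothetical stand-in for a real platform client; the activity names are examples. The gate fails closed when the lookup raises.

```python
from enum import Enum


class ConsentStatus(Enum):
    VALID = "valid"
    INVALID = "invalid"


class InMemoryCMP:
    """Hypothetical stand-in for a consent management platform client."""

    def __init__(self) -> None:
        self._grants: set[tuple[str, str]] = set()

    def grant(self, subject_id: str, activity: str) -> None:
        self._grants.add((subject_id, activity))

    def withdraw(self, subject_id: str, activity: str) -> None:
        self._grants.discard((subject_id, activity))

    def status(self, subject_id: str, activity: str) -> ConsentStatus:
        granted = (subject_id, activity) in self._grants
        return ConsentStatus.VALID if granted else ConsentStatus.INVALID


def consent_gate(cmp_client, subject_id: str, activity: str) -> bool:
    """Return True only when the CMP confirms valid consent for this activity."""
    try:
        return cmp_client.status(subject_id, activity) is ConsentStatus.VALID
    except Exception:
        # Fail closed: if the CMP cannot be reached, block the processing.
        return False


cmp_client = InMemoryCMP()
cmp_client.grant("subject-42", "personalisation")
print(consent_gate(cmp_client, "subject-42", "personalisation"))  # True
print(consent_gate(cmp_client, "subject-42", "model_training"))   # False
cmp_client.withdraw("subject-42", "personalisation")
print(consent_gate(cmp_client, "subject-42", "personalisation"))  # False
```

Keying consent on the (subject, activity) pair is what makes it granular: consent to personalisation says nothing about model training.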
Stage 5 — Cross-Border Transfer Controls
GDPR Chapter V (Articles 44 to 49) restricts transfers of personal data to third countries unless an adequacy decision, appropriate safeguards, or a specific derogation applies. Australia's APP 8 imposes a comparable obligation on cross-border disclosure. Cloud LLM API calls that send EU or Australian personal data to a provider operating outside those jurisdictions require a valid transfer mechanism: under GDPR, an adequacy decision (Article 45), Standard Contractual Clauses (SCCs, Article 46), or Binding Corporate Rules (Article 47); under the Privacy Act, the APP 8.1 accountability mechanism. The cross-border transfer control layer checks each outbound API call against the destination's jurisdiction status and the data's classification. If the data is subject to transfer restrictions and no valid mechanism exists, the call is blocked and a local model fallback is invoked.
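The routing decision can be sketched as a lookup against a register of documented transfer mechanisms. Everything in the register below is a placeholder: `COUNTRY_A` and `COUNTRY_B` are hypothetical destinations, and whether a mechanism actually covers a given transfer is a legal determination that this code only records, not makes.

```python
from typing import Optional

# Hypothetical register of documented transfer mechanisms, keyed by
# (data jurisdiction, endpoint jurisdiction). Entries are illustrative only.
TRANSFER_MECHANISMS: dict[tuple[str, str], str] = {
    ("EU", "COUNTRY_A"): "standard_contractual_clauses",
}


def route_inference(data_jurisdiction: Optional[str], endpoint_jurisdiction: str) -> str:
    """Return 'third_party_api' when the call may proceed, else 'local_model'."""
    if data_jurisdiction is None:
        return "third_party_api"  # data carries no transfer restriction
    if data_jurisdiction == endpoint_jurisdiction:
        return "third_party_api"  # no cross-border transfer takes place
    if (data_jurisdiction, endpoint_jurisdiction) in TRANSFER_MECHANISMS:
        return "third_party_api"  # a documented mechanism covers the transfer
    return "local_model"          # blocked: fall back to sovereign inference


print(route_inference("EU", "COUNTRY_A"))  # third_party_api
print(route_inference("EU", "COUNTRY_B"))  # local_model
print(route_inference("AU", "AU"))         # third_party_api
```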
Stage 6 — Machine Unlearning for Erasure Requests
GDPR Article 17 (right to erasure) creates an obligation to delete personal information on request. The Privacy Act has no direct equivalent, but APP 11.2 requires destruction or de-identification of personal information that is no longer needed, and APP 13 requires correction on request. For AI systems, erasure is technically complex because personal data may be encoded in model weights through training, in embedding vectors in vector stores, and in inference logs. The machine unlearning layer provides: immediate deletion from inference logs (database delete + audit record); deletion from RAG vector stores (identify all vectors derived from the data subject's documents; delete by document ID); for model weights, re-training or fine-tuning on a dataset with the individual's data removed (computationally expensive; may be deferred for large foundation models with documented justification); and an erasure request tracking system that records the status of each sub-task.
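The sub-task tracking described above might be sketched as follows. `InMemoryVectorStore` is a hypothetical store that keeps a document ID next to each vector, which is the property that makes delete-by-document possible; the status strings and log record shape are assumptions, and the audit record for each deletion is omitted for brevity.

```python
from dataclasses import dataclass, field


class InMemoryVectorStore:
    """Hypothetical vector store that keeps a document ID alongside each vector."""

    def __init__(self) -> None:
        self._vectors: dict[str, tuple[str, list[float]]] = {}

    def add(self, vector_id: str, document_id: str, embedding: list[float]) -> None:
        self._vectors[vector_id] = (document_id, embedding)

    def delete_by_document(self, document_id: str) -> int:
        doomed = [vid for vid, (doc, _) in self._vectors.items() if doc == document_id]
        for vid in doomed:
            del self._vectors[vid]
        return len(doomed)

    def count_for_document(self, document_id: str) -> int:
        return sum(1 for doc, _ in self._vectors.values() if doc == document_id)


@dataclass
class ErasureRequest:
    subject_id: str
    document_ids: list[str]
    subtasks: dict[str, str] = field(default_factory=lambda: {
        "inference_logs": "pending", "vector_store": "pending", "model_weights": "pending"})


def process_erasure(request: ErasureRequest, store: InMemoryVectorStore,
                    inference_logs: list[dict[str, str]]) -> None:
    # 1. Inference logs: drop every entry belonging to the data subject.
    inference_logs[:] = [e for e in inference_logs if e["subject_id"] != request.subject_id]
    request.subtasks["inference_logs"] = "done"
    # 2. Vector store: delete by document ID, then verify nothing remains.
    for document_id in request.document_ids:
        store.delete_by_document(document_id)
    leftover = sum(store.count_for_document(d) for d in request.document_ids)
    request.subtasks["vector_store"] = "done" if leftover == 0 else "failed"
    # 3. Model weights: re-training is deferred and tracked, not done inline.
    request.subtasks["model_weights"] = "deferred"


store = InMemoryVectorStore()
store.add("v1", "email-17", [0.1, 0.2])
store.add("v2", "email-17", [0.3, 0.4])
store.add("v3", "email-99", [0.5, 0.6])
logs = [{"subject_id": "subject-42", "event": "inference"},
        {"subject_id": "subject-7", "event": "inference"}]
request = ErasureRequest("subject-42", ["email-17"])
process_erasure(request, store, logs)
print(request.subtasks)
# {'inference_logs': 'done', 'vector_store': 'done', 'model_weights': 'deferred'}
```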
5. Architecture Diagram
flowchart TD
subgraph Input["Input Controls"]
A[Personal Data Source]
B{Consent and Purpose Gate}
C[PII Redaction Pipeline]
end
subgraph Inference["Inference Layer"]
D{Cross-Border Check}
E[Local Sovereign Model]
F[Third-Party LLM API]
end
subgraph Output["Output and Rights"]
G[Output PII Filter]
H[Data Subject Rights Portal]
end
A --> B
B -->|blocked| A
B -->|permitted| C
C --> D
D -->|transfer blocked| E
D -->|transfer permitted| F
E --> G
F --> G
G --> H
H -->|erasure or restrict| B
style A fill:#dbeafe,stroke:#3b82f6
style B fill:#f3e8ff,stroke:#a855f7
style C fill:#f0fdf4,stroke:#22c55e
style D fill:#f3e8ff,stroke:#a855f7
style E fill:#d1fae5,stroke:#10b981
style F fill:#fef9c3,stroke:#eab308
style G fill:#f0fdf4,stroke:#22c55e
style H fill:#d1fae5,stroke:#10b981
6. Components
| Component | Type | Responsibility | Technology Options | Criticality |
| --- | --- | --- | --- | --- |
| Purpose Limitation Engine | Policy | Enforce data purpose tags; block cross-purpose AI data flows | OPA, custom policy engine, AWS Verified Permissions | High |
| Consent Management Platform Integration | Policy | Gate AI processing on valid, granular consent; honour withdrawals | OneTrust, Cookiebot, TrustArc, Didomi | High |
| Data Minimisation Filter | Processing | Strip non-required fields from structured inputs before AI pipeline | Custom Lambda/Function; Apache NiFi; dbt | High |
| PII Detector | Processing | Named entity recognition for personal identifiers (multi-jurisdiction) | Amazon Comprehend, Microsoft Purview DLP, spaCy, Presidio | Critical |
| Redaction Engine | Processing | Replace/pseudonymise PII in text; manage pseudonym mappings | Microsoft Presidio, custom, Amazon Macie | Critical |
| Pseudonym Key Store | Storage | Encrypted mapping of pseudonyms to real values; per-context keys | AWS Secrets Manager + DynamoDB, HashiCorp Vault + PostgreSQL | Critical |
| Cross-Border Transfer Check | Policy | Assess destination jurisdiction vs data classification; block if restricted | Custom policy engine; cloud-native geo-restriction | High |
| Local Model Runtime | Inference | Execute inference on restricted data without cross-border transfer | vLLM, Triton, Ollama (sovereign cloud deployment) | High |
| Output PII Filter | Processing | Detect and redact PII in AI-generated responses before delivery | Amazon Comprehend, Azure Content Safety, Microsoft Presidio | High |
| Vector Store with Delete Capability | Storage | Store embeddings; support per-document deletion for erasure requests | Pinecone, Weaviate, pgvector, Chroma | High |
| Machine Unlearning Scheduler | Operations | Track erasure obligations on model weights; schedule re-training | Custom workflow, MLflow, SageMaker Pipelines | Medium |
| Privacy Impact Assessment Trigger | Governance | Evaluate new AI projects against PIA criteria; trigger PIA if required | Custom intake form, OneTrust PIA module | High |
| Erasure Request Tracker | Operations | Track erasure request status across all AI data stores | Custom database, ServiceNow, Jira | Critical |
7. Data Flow
Primary Flow
| Step | Actor | Action | Output |
| --- | --- | --- | --- |
| 1 | AI Application | Receive user input with data subject personal information | Raw input containing potential PII |
| 2 | Purpose Limitation Engine | Check input data's purpose tags against AI workflow's declared purpose | Permitted or denied |
| 3 | Consent Management Integration | Query CMP for valid consent for this specific AI processing activity | Consent status: valid / invalid / not-required |
| 4 | Data Minimisation Filter | Remove fields not required for the declared AI purpose | Minimised input |
| 5 | PII Detector | Scan minimised input for personal identifiers across all supported PII categories | Annotated text with PII spans identified |
| 6 | Redaction Engine | Replace identified PII with redaction tokens or pseudonyms; record pseudonym map | Sanitised input text safe for AI processing |
| 7 | Cross-Border Transfer Check | Assess data residency requirement vs intended inference endpoint | Route to local model or third-party API |
| 8 | AI Inference Engine | Execute inference on sanitised input | AI response (may contain pseudonyms or redacted tokens) |
| 9 | Output PII Filter | Scan AI response for PII that may have been generated or leaked | Clean AI response |
| 10 | Pseudonym Restore (if authorised) | Map pseudonyms back to real values for authorised consumers | Full response for authorised downstream systems |
| 11 | Inference Log | Record pseudonymised audit record of the interaction | Immutable audit entry with no real PII |
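Steps 9 to 11 depend on ordering: the output filter runs before any restore, and the log records the filtered (still pseudonymised) response rather than the restored one. The sketch below illustrates that ordering under stated assumptions: the single email pattern stands in for a full output detector, `case_manager` is a hypothetical authorised role, and the token format is invented for the example.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
AUTHORISED_ROLES = frozenset({"case_manager"})  # hypothetical role name


def filter_output(response: str) -> str:
    """Step 9: redact raw identifiers the model generated or leaked."""
    return EMAIL.sub("[EMAIL_ADDRESS]", response)


def restore_pseudonyms(response: str, lookup: dict[str, str], role: str,
                       access_log: list[dict[str, object]]) -> str:
    """Step 10: map pseudonyms back to real values for authorised roles only."""
    authorised = role in AUTHORISED_ROLES
    access_log.append({"role": role, "restored": authorised})  # every attempt is logged
    if not authorised:
        return response
    for token, value in lookup.items():
        response = response.replace(token, value)
    return response


def audit_entry(request_id: str, filtered_response: str) -> dict[str, str]:
    """Step 11: log the pseudonymised response, never the restored one."""
    return {"request_id": request_id, "response": filtered_response}


lookup = {"[PERSON_7f3a]": "Jane Citizen"}
access_log: list[dict[str, object]] = []
raw = "Reply to [PERSON_7f3a]; a copy went to ops@example.com."

clean = filter_output(raw)
print(clean)  # Reply to [PERSON_7f3a]; a copy went to [EMAIL_ADDRESS].
print(restore_pseudonyms(clean, lookup, "case_manager", access_log))
# Reply to Jane Citizen; a copy went to [EMAIL_ADDRESS].
print(restore_pseudonyms(clean, lookup, "analyst", access_log))
# Reply to [PERSON_7f3a]; a copy went to [EMAIL_ADDRESS].
entry = audit_entry("req-001", clean)
```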
Error Flow
| Scenario | Failure | Detection | Recovery |
| --- | --- | --- | --- |
| PII Detection Miss | PII slips through detector into LLM API call | Output audit; third-party API terms monitoring | Log incident; improve detector; assess whether third-party retained data; notify DPA if GDPR Article 33 threshold met |
| Cross-Border Block with No Local Fallback | Transfer blocked; no local model available | Block event logged; request fails | Graceful error returned to user; alert operations; provision local model |
| Consent Withdrawal Not Honoured | Withdrawn consent subject continues to have data processed | Compliance audit; data subject complaint | Immediately halt processing; purge from processing queue; erasure assessment |
| Erasure Missed in Vector Store | Vectors derived from deleted subject remain in RAG index | Post-erasure verification scan | Delete missed vectors; update erasure request record; document as near-miss |
| Pseudonym Map Key Loss | Key store failure makes pseudonyms irreversible | Key store availability monitoring | Restore from HSM-backed key backup; assess whether any requests require restoration |
8. Security Considerations
Privacy and Security Controls
| Domain | Control | Implementation | Privacy Obligation |
| --- | --- | --- | --- |
| Authentication | PII pipeline components require service authentication; no anonymous access to pseudonym maps | IAM service accounts; mutual TLS | APP 11; GDPR Article 32 |
| Authorisation | Only authorised consumers can invoke pseudonym restore; access logged | RBAC on pseudonym restore API; audit log | APP 11; GDPR Article 32 |
| Secrets | Pseudonymisation keys stored in HSM-backed secrets manager; rotated annually | AWS KMS, Azure Key Vault, HashiCorp Vault | APP 11 |
| Classification | All AI assets containing personal data classified; controls applied per classification | Data classification policy; automated tagging | APP 11; GDPR Article 32 |
| Encryption | All personal data encrypted in transit (TLS 1.3) and at rest (AES-256 CMEK) | Cloud-native encryption with CMEK | APP 11; GDPR Article 32 |
| Auditability | All PII processing events logged immutably; erasure actions logged | Immutable audit store; retention per regulation | APP 11; GDPR Article 5(2) accountability |
OWASP LLM Top 10 — Privacy Mapping
| OWASP LLM Risk | Privacy Impact | Control | Privacy Law Reference |
| --- | --- | --- | --- |
| LLM01 Prompt Injection | Adversary extracts personal data from AI context window via injection | Input sanitisation; context window size limits; PII redaction before context insertion | APP 11; GDPR Article 32 |
| LLM02 Insecure Output Handling | AI response contains personal data passed to unauthorised downstream systems | Output PII filter; authorisation on pseudonym restore; schema validation | APP 11; GDPR Article 32 |
| LLM03 Training Data Poisoning | Malicious PII injection into training data creates privacy risk in model | Training data governance; PII screening of training corpus; access control on training pipelines | APP 10; GDPR Article 5(1)(f) |
| LLM04 Model Denial of Service | Availability failure prevents erasure requests being processed in time | High availability for erasure request processing; SLA on erasure fulfilment | APP 13; GDPR Article 17 |
| LLM05 Supply Chain Vulnerabilities | Third-party AI components process personal data without adequate controls | Vendor DPA; cross-border transfer mechanism; supply chain privacy assessment | APP 8; GDPR Article 28 |
| LLM06 Sensitive Information Disclosure | AI reveals personal data from training set or context window | Output PII filter; differential privacy in training; training data audit | APP 11; GDPR Article 9 (special categories) |
| LLM07 Insecure Plugin Design | Tool calls made by AI agent exfiltrate personal data to unauthorised systems | Tool call whitelisting; data egress controls; purpose limitation on tool outputs | APP 6 (use/disclosure); GDPR Article 5(1)(b) purpose limitation |
| LLM08 Excessive Agency | Autonomous AI agent processes or shares personal data beyond authorised scope | Scope limits on agent data access; human approval for data-sharing actions | APP 6; GDPR Article 22 |
| LLM09 Overreliance | AI-generated personal data inferences treated as fact; errors in personal records | Confidence scoring; human review for decisions affecting individuals | APP 13 (correction); GDPR Article 16 (rectification) |
| LLM10 Model Theft | Stolen model may allow extraction of training data containing personal data | Model access control; model weight encryption; training data minimisation | APP 11; GDPR Article 32 |
9. Governance Considerations
Privacy Governance
| Domain | Requirement | Owner | Frequency |
| --- | --- | --- | --- |
| Privacy Impact Assessment | Conduct PIA for AI projects meeting trigger criteria (new technology, large-scale processing, sensitive data) | Privacy Officer / DPO | At project inception; re-assess on material change |
| Records of Processing | GDPR Article 30 records updated for each AI processing activity; APP 1 privacy policy updated | Privacy Officer | On each new AI deployment |
| Lawful Basis Documentation | Document and review lawful basis for each AI processing activity | Legal / Privacy | Annual review; on change of purpose |
| Data Subject Rights SLA | Erasure requests fulfilled within one month of receipt (GDPR Article 12(3)) / reasonable time (APP); access requests within 30 days | Operations + Privacy | Per-request tracking |
| De-identification Assessment | Re-identification risk assessment for data treated as anonymous or de-identified | Privacy Officer | Annual; on technique change |
| Third-Party Privacy Assessment | Vendor DPA and transfer mechanism in place before AI vendor processes personal data | Procurement + Privacy | On vendor onboarding; annual review |
Governance Artefacts
| Artefact | Description | Retention |
| --- | --- | --- |
| Privacy Impact Assessments | Documented PIA for each AI system processing personal data | Lifetime of system + 7 years |
| Records of Processing Activities | GDPR Article 30 register of all AI processing activities | Ongoing; current state + 7 years history |
| Consent Records | Evidence of valid consent obtained for consent-based AI processing | 7 years after consent expires or is withdrawn |
| Erasure Request Log | Full audit trail of each erasure request, sub-tasks completed, and residual limitations | 7 years |
| Cross-Border Transfer Records | DPAs, SCCs, adequacy decisions for each third-country AI provider | Duration of transfer + 7 years |
| PII Redaction Audit Logs | Log of all PII detections and redactions applied in the AI pipeline | 3 years (operational) |
10. Operational Considerations
Monitoring and SLOs
| SLO | Target | Measurement | Breach Action |
| --- | --- | --- | --- |
| PII Redaction Coverage | >99.5% of PII spans detected before LLM call | Audit sample: 1% of inputs re-scanned post-redaction | Alert privacy engineering; model retrain if detection rate drops |
| Erasure Request Fulfilment | 100% within 25 days (about a 5-day buffer before the GDPR one-month limit) | Days from receipt to confirmed completion | Escalate to Privacy Officer; daily tracking |
| Consent Check Latency | <50ms added to inference pipeline | P99 latency of consent check API call | Investigate CMP performance; implement caching |
| Unintended Cross-Border Transfers | 0 detected | Egress monitoring; transfer policy check log | Immediate investigation; potential DPA notification |
| Purpose Limitation Violations | 0 per month | Policy engine block events classified as violations | Root cause analysis; architecture review |
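The first two SLOs reduce to simple arithmetic on audit data. The sketch below shows one way to evaluate them; the function names and the sample figures are illustrative assumptions, while the 99.5% and 25-day thresholds come from the table above.

```python
from datetime import date, timedelta

COVERAGE_TARGET = 0.995   # from the SLO table: >99.5% of PII spans detected
ERASURE_TARGET_DAYS = 25  # from the SLO table: 100% within 25 days


def redaction_coverage(total_spans: int, missed_spans: int) -> float:
    """Share of PII spans in the audit sample that the detector caught."""
    if total_spans == 0:
        return 1.0  # nothing to detect in the sample
    return (total_spans - missed_spans) / total_spans


def erasure_due(received: date) -> date:
    """Internal deadline for confirmed completion of an erasure request."""
    return received + timedelta(days=ERASURE_TARGET_DAYS)


def erasure_slo_met(received: date, completed: date) -> bool:
    return completed <= erasure_due(received)


# Hypothetical audit sample: 2,000 PII spans found on re-scan, 8 missed upstream.
print(redaction_coverage(2000, 8) > COVERAGE_TARGET)      # True
print(erasure_due(date(2026, 6, 1)))                      # 2026-06-26
print(erasure_slo_met(date(2026, 6, 1), date(2026, 6, 28)))  # False
```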
Disaster Recovery
| Scenario | Privacy Impact | Recovery |
| --- | --- | --- |
| PII Detector Outage | Unredacted PII may be sent to AI models during outage | Halt AI inference pipeline; restore detector; assess any exposures during outage |
| Pseudonym Key Store Failure | Cannot restore pseudonyms for authorised consumers | Restore from HSM backup; no data loss if backup current; assess impact on in-flight requests |
| Consent Management Platform Unavailability | Cannot verify consent for new processing requests | Fail closed: block AI processing that requires consent verification until CMP restored |
| Vector Store Corruption | Cannot fulfil erasure requests until store restored | Restore from backup; re-run any erasure requests that were in-flight during corruption |
11. Cost Considerations
Cost Drivers
| Cost Driver | Indicative Cost | Notes |
| --- | --- | --- |
| PII Detection Platform | USD 3,000–15,000/month | Scales with text volume; cloud-native services have per-character pricing |
| Pseudonymisation Infrastructure | USD 1,000–5,000/month | Key management + lookup store; near-flat scaling |
| Consent Management Platform | USD 2,000–20,000/month | Scales with site traffic and consent complexity |
| Local Model Hosting (for sovereign inference) | USD 10,000–50,000/month | GPU compute; varies significantly by model size and throughput |
| Erasure Request Processing | USD 500–3,000/month | Engineering time + automated tooling; spikes with high erasure volume |
| Privacy Engineering FTE | USD 180,000–350,000/year | 1–2 FTE with privacy engineering + AI experience |
| Privacy Impact Assessments | USD 10,000–30,000 per major AI project | Legal + privacy officer time; external review |
Indicative Cost Range
| Organisation | Annual Privacy-Preserving AI Cost | Notes |
| --- | --- | --- |
| Small (1–3 AI products, low PII volume) | USD 200,000–500,000 | Cloud-native tooling; no local model required |
| Mid-size (4–10 AI products, medium PII) | USD 700,000–2,000,000 | Local model for sovereign inference; dedicated privacy engineering |
| Large (more than 10 AI products, high PII volume, multi-jurisdiction) | USD 3,000,000–10,000,000 | Enterprise CMP; full PII pipeline at scale; machine unlearning programme |
12. Trade-Off Analysis
Architecture Options
| Option | Description | Pros | Cons | Recommended For |
| --- | --- | --- | --- | --- |
| Option A: Redact Before Cloud LLM | Send only redacted/pseudonymised text to cloud LLM APIs | Leverages best LLM capability; lower local compute cost | Redaction reduces AI response quality for name-dependent tasks; pseudonym complexity | Most use cases where AI does not need actual PII values |
| Option B: Sovereign Local Model Only | All inference on local/sovereign cloud models; no PII leaves jurisdiction | Maximum data control; no cross-border transfer concerns | Higher compute cost; local models may have lower capability | Highly regulated industries; sovereign data requirements; government |
| Option C: Federated (On-Device) Inference | Model is deployed to and runs on user devices; no personal data centralised | PII never leaves user device | Very limited model capability; complex deployment; not suitable for server-side AI | Consumer mobile AI on highly sensitive health or financial data |
Architectural Tensions
| Tension | Trade-Off | Resolution |
| --- | --- | --- |
| Redaction Completeness vs AI Response Quality | Aggressive redaction improves privacy but degrades AI output relevance | Use pseudonymisation (not full redaction) where AI needs semantic continuity; restore only at authorised output |
| Purpose Limitation vs AI Model Training | Using production user data to improve AI models serves product interest but may violate original collection purpose | Explicit consent for training use; or use synthetic data generated from production patterns |
| Machine Unlearning Completeness vs Operational Feasibility | Full unlearning from model weights is computationally expensive | Tiered approach: immediate erasure from logs/vectors (automated); model weight unlearning deferred and batched; documented justification under GDPR Recital 65 |
| Consent Granularity vs User Experience | Highly granular consent improves legal basis but creates consent fatigue | Use layered consent with clear explanations; default to most privacy-protective option |
13. Failure Modes
| Failure | Likelihood | Impact | Detection | Recovery |
| --- | --- | --- | --- | --- |
| PII Sent to Third-Party LLM API Unredacted | Medium | High: potential GDPR Article 33 breach notification obligation | Output audit; third-party DPA breach notification | Invoke breach response plan; assess exposure; notify DPA if threshold met (72 hours) |
| Erasure Request Missed in Vector Store | High | Medium: GDPR Article 17 breach; DPA/OAIC investigation | Post-erasure verification scan | Delete missed vectors; document near-miss; improve verification process |
| Purpose Creep: AI Model Trained on Consent-Gated Data | Medium | High: consent basis undermined; enforcement action | Data governance audit; purpose tag mismatch alert | Remove contaminated training data; re-train without; notify affected data subjects |
| Pseudonym Map Loss | Low | High: cannot restore data for authorised consumers; erasure verification impaired | Key store health monitoring | Restore from HSM backup; incident report |
| Cross-Border Transfer Without Mechanism | Medium | Critical: GDPR Article 46 breach; transfer suspension | Egress monitoring; legal review | Halt transfer; put SCCs or another valid mechanism in place before resuming; assess whether DPA notification required |
Cascading Failure Scenario
A new AI feature is deployed with RAG indexing of customer emails. The PIA was skipped because the product team deemed the feature "low risk". Customer emails contain health information (special category data under GDPR Article 9). The RAG pipeline embeds this health information into the vector store, and at query time the retrieved email text is sent as context to a US-based LLM API with no GDPR Standard Contractual Clauses in place, which is a cross-border transfer without a valid transfer mechanism. A data subject files an erasure request; the organisation cannot identify the vectors derived from the subject's emails because the vector store has no per-document deletion capability and no document-to-vector mapping. The DPA investigation finds three simultaneous breaches: special category processing without explicit consent (Article 9), cross-border transfer without a mechanism (Article 46), and inability to fulfil the erasure right (Article 17). A fine is issued and the AI service is suspended.
14. Regulatory Considerations
| Regulation | Specific Obligation | Architectural Control | Reference |
| --- | --- | --- | --- |
| Privacy Act 1988 APP 3 | Collection of personal information only for lawful, notified purpose | Lawful basis assessment; purpose documentation before AI deployment | APP 3.1, APP 3.3 |
| Privacy Act 1988 APP 6 | Use or disclosure only for primary purpose or compatible secondary purpose | Purpose Limitation Engine; purpose tags on data assets | APP 6.1, APP 6.2 |
| Privacy Act 1988 APP 8 | Cross-border disclosure must ensure equivalent privacy protection | Cross-Border Transfer Check; DPA + accountability mechanism | APP 8.1 |
| Privacy Act 1988 APP 11 | Reasonable steps to protect personal information from misuse, interference, loss, unauthorised access | PII redaction pipeline; encryption; access controls | APP 11.1 |
| Privacy Act 1988 APP 11.2 / APP 13 | Destroy or de-identify personal information that is no longer needed (APP 11.2); correct personal information on request (APP 13) | Erasure Request Tracker; vector store delete; machine unlearning | APP 11.2, APP 13.1 |
| GDPR Article 5 | Data minimisation, purpose limitation, storage limitation, integrity and confidentiality | Data Minimisation Filter; Purpose Limitation Engine; retention policies | GDPR Article 5(1)(b)(c)(e)(f) |
| GDPR Article 6 | Lawful basis for processing personal data | Lawful basis documentation; consent check integration | GDPR Article 6(1) |
| GDPR Article 17 | Right to erasure ('right to be forgotten') | Erasure Request Tracker; vector delete; log delete; machine unlearning | GDPR Article 17 |
| GDPR Article 20 | Right to data portability | Portability export module for AI-processed personal data | GDPR Article 20 |
| GDPR Article 25 | Privacy by Design and by Default | Architecture is inherently privacy-preserving; default is most restrictive | GDPR Article 25 |
| GDPR Article 35 | Data Protection Impact Assessment for high-risk processing | PIA Trigger engine; DPIA process integration | GDPR Article 35(3) |
15. Reference Implementations
AWS
| Component | AWS Service |
| --- | --- |
| PII Detection | Amazon Comprehend (custom entity recogniser for Australian PII) |
| Redaction Pipeline | AWS Lambda + Comprehend PII entity detection |
| Pseudonym Key Store | AWS Secrets Manager (keys) + DynamoDB Encrypted (mappings) |
| Cross-Border Transfer Control | VPC endpoint geo-restriction + SCPs preventing data to non-approved regions |
| Local Inference | Amazon Bedrock (Sydney region) or EC2 G-series with vLLM |
| Vector Store with Delete | Amazon OpenSearch Service (with per-document delete) |
| Consent Integration | AWS Lambda → CMP API; cached in ElastiCache |
| Erasure Tracker | DynamoDB + Step Functions orchestration |
Azure
| Component | Azure Service |
| --- | --- |
| PII Detection | Azure AI Language PII detection + custom recognisers |
| Redaction Pipeline | Azure Function + AI Language SDK |
| Pseudonym Key Store | Azure Key Vault (keys) + encrypted Azure Cosmos DB (mappings) |
| Cross-Border Transfer Control | Azure Policy geo-restriction + Private Link |
| Local Inference | Azure OpenAI (Australia East) or Azure ML in an Australian region |
| Vector Store with Delete | Azure AI Search (with per-document delete) or pgvector |
| Consent Integration | Azure Function → OneTrust/TrustArc API |
| Erasure Tracker | Azure Cosmos DB + Azure Logic Apps |
GCP
| Component | GCP Service |
| --- | --- |
| PII Detection | Cloud DLP (Data Loss Prevention) API with AU infoTypes |
| Redaction Pipeline | Cloud Functions + Cloud DLP |
| Pseudonym Key Store | Secret Manager (keys) + encrypted Firestore (mappings) |
| Cross-Border Transfer Control | VPC Service Controls + Organisation Policy for resource location |
| Local Inference | Vertex AI (Australia Southeast region) |
| Vector Store with Delete | Vertex AI Vector Search or Weaviate on GKE |
| Consent Integration | Cloud Functions → CMP API; Memorystore cache |
| Erasure Tracker | Firestore + Cloud Workflows |
On-Premises
| Component | Technology |
| --- | --- |
| PII Detection | Microsoft Presidio (open source); spaCy with custom NER models |
| Redaction Pipeline | Apache NiFi with custom processors |
| Pseudonym Key Store | HashiCorp Vault (keys) + PostgreSQL encrypted (mappings) |
| Cross-Border Transfer Control | Network firewall egress rules; air-gapped inference environment |
| Local Inference | Ollama or vLLM on bare-metal GPU servers |
| Vector Store with Delete | Weaviate self-hosted; pgvector with row-level delete |
| Consent Integration | Custom REST API to CMP; Redis cache |
| Erasure Tracker | PostgreSQL + Temporal workflow engine |
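Each platform table above separates the pseudonym keys from the token-to-value mappings. The following Python sketch shows one way that split could work, using keyed HMAC-SHA256 tokens. It is illustrative only: the `Pseudonymiser` class, the email-only regex, and the in-memory `mappings` dict are assumptions standing in for the PII detection service, key store, and encrypted mapping store listed in the tables.

```python
import hashlib
import hmac
import re

# Illustrative detector only: matches email addresses. A real deployment
# would call the PII detection service chosen for the platform.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class Pseudonymiser:
    """Replaces detected PII with keyed tokens and keeps a reversible mapping."""

    def __init__(self, key: bytes):
        self._key = key      # assumption: fetched from the key store
        self.mappings = {}   # assumption: stands in for the encrypted mapping store

    def _token(self, value: str) -> str:
        # Keyed hash: the same value yields the same token under the same key.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256)
        return f"<PII_{digest.hexdigest()[:12]}>"

    def redact(self, text: str) -> str:
        """Swap each detected value for its token before text reaches the LLM."""
        def replace(match: re.Match) -> str:
            token = self._token(match.group(0))
            self.mappings[token] = match.group(0)
            return token
        return EMAIL_RE.sub(replace, text)

    def restore(self, text: str) -> str:
        """Re-insert original values for tokens still present in the mapping."""
        for token, original in self.mappings.items():
            text = text.replace(token, original)
        return text

    def forget(self, value: str) -> None:
        """Drop the mapping entry for a value, e.g. for an erasure request."""
        self.mappings.pop(self._token(value), None)


if __name__ == "__main__":
    p = Pseudonymiser(b"demo-key-not-for-production")
    redacted = p.redact("Contact jane.doe@example.com about the refund.")
    print(redacted)
    print(p.restore(redacted))
```

Because tokens are deterministic per key, the same value maps to the same token across requests, which keeps pseudonymised records joinable. Deleting a mapping entry removes the stored link back to the original value, although anyone holding the key could still recompute the token for a guessed value, which is one reason to hold keys and mappings in separate stores.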
16. Related Patterns
| Pattern ID | Pattern Name | Relationship | Notes |
| --- | --- | --- | --- |
| EAAPL-CMP002 | APRA CPS234 AI Security | COMPLEMENTARY | CPS234 security controls protect personal data at rest and in transit; deploy alongside this pattern |
| EAAPL-CMP003 | EU AI Act Compliance | COMPLEMENTARY | GDPR obligations are prerequisites for EU AI Act Article 10 data governance compliance |
| EAAPL-CMP007 | Data Residency for AI | PREREQUISITE | Residency controls must be established before the cross-border transfer check can be implemented |
| EAAPL-CMP008 | GDPR-Compliant AI | EXTENSION | Extends this pattern with GDPR-specific controls for Article 22 automated decision-making |
| EAAPL-AGT003 | Human-in-the-Loop Oversight | COMPLEMENTARY | HITL oversight is required for AI decisions with significant privacy impact on individuals |
| EAAPL-PLT005 | AI Data Governance | PREREQUISITE | Data classification and purpose tagging must be operational before this pattern can be deployed |
17. Maturity Assessment
Overall Maturity Label: Proven
| Dimension | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 | Current Level |
| --- | --- | --- | --- | --- | --- | --- |
| PII Detection | No detection | Manual review | Automated NER; known PII categories | Multi-jurisdiction PII; custom entity models; audit sampling | Near-zero miss rate; continuous model improvement | Level 3–4 |
| Purpose Limitation | No controls | Documented policy only | Policy engine; purpose tags enforced | Real-time violation detection; automated blocking | ML-powered purpose inference for unlabelled data | Level 3 |
| Erasure Handling | Cannot fulfil erasure | Manual deletion from primary DB | Automated deletion from logs and vectors | Machine unlearning capability for model weights | Complete erasure including model weights; <7 day SLA | Level 3 |
| Cross-Border Controls | No controls | Ad-hoc SCC review | Automated transfer check; block or route | Real-time jurisdiction monitoring | Continuous adequacy monitoring; instant reroute | Level 3 |
| Privacy Governance | No PIA process | PIAs conducted occasionally | PIAs triggered for all qualifying AI projects | Privacy metrics in AI project KPIs | Privacy risk automated into CI/CD pipeline | Level 3 |
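The Erasure Handling levels above distinguish deletion from the primary database, the logs, and the vector store. A tracker that records per-store completion against a deadline might look like the sketch below. The store names, the `ErasureRequest` class, and the 7-day default (borrowed from the Level 5 target) are assumptions for illustration; model-weight unlearning would be tracked as a further store.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Stores that must confirm deletion; names are illustrative.
STORES = ("primary_db", "logs", "vector_store")


@dataclass
class ErasureRequest:
    """Tracks one erasure request across every store holding the subject's data."""
    subject_id: str
    received_at: datetime
    sla: timedelta = timedelta(days=7)  # assumption: mirrors the Level 5 target
    completed: dict = field(default_factory=dict)  # store name -> completion time

    def mark_done(self, store: str, when: datetime) -> None:
        """Record that a store has confirmed deletion."""
        if store not in STORES:
            raise ValueError(f"unknown store: {store}")
        self.completed[store] = when

    @property
    def pending(self) -> list:
        """Stores that have not yet confirmed deletion, in STORES order."""
        return [s for s in STORES if s not in self.completed]

    @property
    def due_at(self) -> datetime:
        return self.received_at + self.sla

    def status(self, now: datetime) -> str:
        """Return 'complete', 'in_progress', or 'overdue' as of `now`."""
        if not self.pending:
            return "complete"
        return "overdue" if now > self.due_at else "in_progress"


if __name__ == "__main__":
    received = datetime(2026, 1, 1, tzinfo=timezone.utc)
    request = ErasureRequest("subject-123", received)
    request.mark_done("primary_db", received + timedelta(hours=1))
    print(request.pending)
    print(request.status(received + timedelta(days=2)))
```

Keeping completion per store, rather than a single done flag, makes the gap between Level 2 (primary database only) and Level 3 (logs and vectors as well) visible in the tracker's own data.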
18. Revision History
| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0 | 2025-04-01 | EAAPL Working Group | Initial draft |
| 1.1 | 2025-07-20 | EAAPL Working Group | Added GDPR Article 22 automated decisions detail; expanded cross-border section |
| 1.2 | 2025-10-05 | EAAPL Working Group | Added machine unlearning section; updated Australian PII categories |
| 1.3 | 2026-02-15 | EAAPL Working Group | Added OWASP LLM Top 10 privacy mapping; cascading failure scenario |
| 1.4 | 2026-06-12 | EAAPL Working Group | Updated cost ranges; added federated inference option; aligned with Privacy Act 2024 amendments |