[EAAPL-GOV007] AI Audit Trail

Category: Governance / Compliance & Traceability
Sub-category: Immutable Decision Logging
Version: 2.0
Maturity: Mature
Tags: audit-trail, immutable-log, decision-traceability, WORM, tamper-evident, regulatory-retention
Regulatory Relevance: APRA CPS230 §32, APRA CPS234 §19, EU AI Act Article 12, Privacy Act APP 11, ISO/IEC 42001 §9.1, NIST AI RMF MANAGE 4.1

1. Executive Summary

The AI Audit Trail pattern establishes an immutable, tamper-evident log of every AI decision made by enterprise AI systems. It captures the full decision context: sanitised input, model version, prompt version (for generative AI), output, confidence score, policy decisions applied, human overrides, and data sources consumed. Retention is aligned to regulatory requirements: 7 years for APRA-regulated entities, well beyond typical application log retention.

This pattern is foundational to enterprise AI governance. Without an audit trail, every governance claim is unverifiable: you cannot prove a policy was applied, you cannot reconstruct what a model said to a customer, you cannot demonstrate human oversight occurred, and you cannot respond to regulatory requests for decision evidence. The EU AI Act Article 12 makes logging mandatory for high-risk AI systems; APRA CPS234 requires evidence of information security controls.

Beyond compliance, the audit trail enables capabilities that governance policy alone cannot provide: post-incident forensics to understand exactly what an AI system did; statistical analysis of decision patterns to detect emerging bias; accountability assignment when AI decisions are disputed; and model performance benchmarking against actual outcomes.

The pattern's defining architectural commitment is immutability. Write-once, read-many (WORM) storage with cryptographic integrity verification ensures that audit records cannot be modified, deleted, or backdated. This is not a logging best practice—it is a regulatory requirement for any organisation claiming its AI audit trail as compliance evidence.

2. Problem Statement

Business Problem

When AI decisions are disputed (by customers, by regulators, or in legal proceedings), organisations cannot reconstruct what the AI system actually said, what data it used, and what policies governed it. Standard application logs are insufficient: they are not retained long enough, they are not structured for decision reconstruction, and they are not tamper-evident.

Technical Problem

AI decisions have richer context than standard API calls: model version, prompt version, retrieved context (for RAG systems), token-level confidence, policy decisions from GOV004, human override events from GOV005, and ground-truth outcome feedback. Standard logging infrastructure cannot capture this structured, multi-source context and retain it in WORM format for 7 years.

Symptoms

Inability to reconstruct what an AI system told a specific customer on a specific date
No evidence that policy guardrails were applied to a flagged AI decision
Human override events not captured, preventing audit of human oversight effectiveness
AI decision logs retained for 90 days (standard app log retention) vs. 7-year regulatory requirement
Logs mutable by administrators, failing tamper-evidence requirements
No correlation between AI decision and downstream outcome (outcome feedback not captured)

Cost of Inaction

Regulatory: APRA CPS234 §19 non-compliance (information security evidence); EU AI Act Article 12 (logging obligation)
Legal: Inability to defend AI decisions in legal proceedings due to absence of evidence
Operational: Post-incident forensics impossible; root cause analysis speculative
Governance: Human oversight claimed but not evidenced; responsible AI controls unverifiable

3. Context

When to Apply

All AI systems making decisions affecting individuals in regulated industries
Any AI system subject to EU AI Act high-risk classification (Annex III)
AI systems in APRA-regulated entities (all, per CPS234 information security obligations)
Any AI system where post-incident forensics capability is required
Generative AI systems producing customer-facing outputs (financial advice, healthcare recommendations)

When NOT to Apply

Internal AI systems with no customer or regulatory exposure, where sub-30-day retention is acceptable
Ultra-high-volume, low-consequence AI (e.g., real-time recommendation clickthrough) where full decision logging is cost-prohibitive; use sampled logging with a defined sampling strategy instead
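
A sampling strategy is easier to document and defend when it is deterministic. The sketch below is one possible approach, not part of the pattern itself: it hashes a decision ID to decide whether the decision is logged, so the same ID always gives the same answer. The function name `should_log` and the use of SHA-256 here are illustrative assumptions.

```python
import hashlib


def should_log(decision_id: str, sample_rate: float) -> bool:
    """Return True if this decision falls inside the sampled fraction.

    Hashing the decision ID (instead of calling random()) makes the choice
    reproducible, so a reviewer can later re-check which decisions were
    expected to appear in the audit trail.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(decision_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Python compares int and float exactly, so rate 1.0 always logs
    # and rate 0.0 never logs.
    return bucket < sample_rate * 2**64


# Hypothetical usage: log roughly 5% of low-consequence decisions.
ids = [f"decision-{i}" for i in range(10_000)]
logged = sum(should_log(d, 0.05) for d in ids)
print(f"logged {logged} of {len(ids)} decisions")
```

Because the decision is a pure function of the ID and the rate, the approved sampling rate is the only parameter that needs to be recorded alongside the strategy.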

Prerequisites

Information classification scheme for AI inputs and outputs
WORM-capable storage infrastructure
Identity context available at inference time (user ID, session ID)
Model register (GOV001) operational — MRID required in each log entry

Industry Applicability

Industry	Retention Requirement	Key Log Fields	Primary Driver
Banking (AU)	7 years	Decision, confidence, policy applied, human override	APRA CPS230 §32
Insurance (AU)	7 years	Underwriting decision, rating factors, model version	APRA record-keeping
Healthcare	7 years (clinical)	Clinical recommendation, clinical model version, clinician override	Health Records Act
Financial Services (EU)	10 years (MiFID II)	Investment recommendation, AI model version, disclaimer applied	MiFID II Article 25; EU AI Act Article 12
Government	7–30 years (varies)	Administrative decision, AI contribution, human decision	Archives Act; administrative law

4. Architecture Overview

The AI Audit Trail is architected around two non-negotiable properties: completeness and immutability. Completeness means every decision is captured with sufficient context to reconstruct the decision scenario. Immutability means no record can be modified or deleted once written, including by system administrators.

Decision Record Schema. The audit record schema is the most critical design decision in this pattern. It must be rich enough to satisfy regulatory reconstruction requirements while not so voluminous that storage costs become prohibitive at scale. The schema stratifies into four payload tiers:

Mandatory Tier (always captured): Decision ID (UUID), MRID (model version, architecture hash), timestamp (UTC, nanosecond precision), actor identity (user ID, session ID, application context), input fingerprint (cryptographic hash of sanitised input — not the raw input, which may contain PII), output summary (sanitised, PII-free summary of the model output), decision type (classification/recommendation/generation), confidence score, latency, policy decision reference (GOV004), and regulatory context flags.

Decision Tier (for consequential decisions): Full decision rationale (for explainable models), counterfactual summary, data sources referenced (document IDs for RAG systems, feature values for tabular models), fairness context (demographic group if known), and human oversight indicator.

Override Tier (conditional — only when human override occurs): Override actor identity, override timestamp, override rationale, original AI decision preserved, override decision, and escalation reference.

Outcome Tier (populated retrospectively): Ground truth outcome, outcome timestamp, outcome source, accuracy flag. This tier enables model performance analysis against real-world outcomes and is critical for GOV006 bias detection (equalised odds requires ground truth).
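
The four tiers can be pictured as nested record types. The following is a minimal sketch covering only a subset of the fields listed above; all class and field names are illustrative assumptions, as is the rule that confidence lies in [0, 1]. Because records are immutable, a real deployment would probably append the Outcome tier as a separate linked record rather than fill in a field later; it is shown as an optional field here only to keep the example short.

```python
from __future__ import annotations

import hashlib
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional

DECISION_TYPES = {"classification", "recommendation", "generation"}


def fingerprint(sanitised_input: str) -> str:
    """SHA-256 hex digest of the sanitised input (never the raw input)."""
    return hashlib.sha256(sanitised_input.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class MandatoryTier:
    mrid: str
    user_id: str
    session_id: str
    input_fingerprint: str
    output_summary: str
    decision_type: str
    confidence: float
    latency_ms: float
    policy_decision_ref: str
    decision_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Nanoseconds since the Unix epoch (UTC).
    timestamp_ns: int = field(default_factory=time.time_ns)

    def __post_init__(self) -> None:
        if self.decision_type not in DECISION_TYPES:
            raise ValueError(f"unknown decision_type: {self.decision_type}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


@dataclass(frozen=True)
class DecisionTier:
    rationale: str
    data_sources: tuple[str, ...] = ()
    human_oversight: bool = False


@dataclass(frozen=True)
class OverrideTier:
    actor_id: str
    rationale: str
    original_decision: str
    override_decision: str
    timestamp_ns: int = field(default_factory=time.time_ns)


@dataclass(frozen=True)
class OutcomeTier:
    ground_truth: str
    source: str
    accurate: bool
    timestamp_ns: int = field(default_factory=time.time_ns)


@dataclass(frozen=True)
class AuditRecord:
    mandatory: MandatoryTier
    decision: Optional[DecisionTier] = None
    override: Optional[OverrideTier] = None
    outcome: Optional[OutcomeTier] = None


record = AuditRecord(
    mandatory=MandatoryTier(
        mrid="MR-0001",  # hypothetical model register ID
        user_id="u-123",
        session_id="s-456",
        input_fingerprint=fingerprint("loan request from [PERSON]"),
        output_summary="application referred for manual review",
        decision_type="classification",
        confidence=0.87,
        latency_ms=41.0,
        policy_decision_ref="POL-42",
    )
)
print(sorted(asdict(record)["mandatory"]))
```

Frozen dataclasses reject attribute assignment after construction, which mirrors the write-once intent at the application level; they are not a substitute for storage-level immutability.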

Immutability Architecture. The immutability guarantee is implemented at multiple layers to prevent single points of bypass. First, the audit log writer is the only component with write access to the log store—application code cannot write directly. Second, the log store is configured as WORM (Write Once Read Many) with Compliance mode (not Governance mode), meaning even the storage administrator cannot delete or modify records during the retention period. Third, each record is written with a cryptographic hash of its content, enabling integrity verification without trusting the storage layer alone. Fourth, a Merkle-tree-based tamper-evidence chain links records so that any retrospective modification is detectable from the chain.
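
The third and fourth layers can be illustrated with a small example. The sketch below uses a linear hash chain, a simpler stand-in for a full Merkle tree: each entry stores the SHA-256 hash of its record plus a chain value derived from the previous entry, so changing or removing any earlier record breaks every later link. The function names and the all-zero genesis value are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an empty chain


def record_hash(record: dict) -> str:
    """Hash a canonical JSON encoding so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def chain_link(prev_link: str, rec_hash: str) -> str:
    """Bind this record's hash to everything written before it."""
    return hashlib.sha256((prev_link + rec_hash).encode("ascii")).hexdigest()


def append(log: list, record: dict) -> dict:
    prev = log[-1]["chain"] if log else GENESIS
    rec_hash = record_hash(record)
    entry = {"record": record, "hash": rec_hash, "chain": chain_link(prev, rec_hash)}
    log.append(entry)
    return entry


def verify(log: list) -> int:
    """Return the index of the first inconsistent entry, or -1 if intact."""
    prev = GENESIS
    for index, entry in enumerate(log):
        rec_hash = record_hash(entry["record"])
        if rec_hash != entry["hash"] or chain_link(prev, rec_hash) != entry["chain"]:
            return index
        prev = entry["chain"]
    return -1


log: list = []
for i in range(3):
    append(log, {"decision_id": f"d-{i}", "confidence": 0.9})
print(verify(log))            # intact chain
log[1]["record"]["confidence"] = 0.1
print(verify(log))            # index of the modified entry
```

A verifier that only trusts the most recent chain value (for example, one published to a separate system) can detect tampering anywhere in the history by replaying the chain.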

PII Sanitisation at Write Time. Raw AI inputs often contain personal information. The audit trail must preserve decision context without creating a 7-year PII retention risk in violation of Privacy Act obligations. The sanitisation pipeline, executed before write, applies: named entity recognition to identify PII fields, substitution of PII values with entity type tokens (e.g., [PERSON], [EMAIL], [ACCOUNT_NUMBER]), preservation of non-PII decision-relevant features, and a cryptographic binding between the sanitised record and the original request (so the original can be retrieved under legal process if required, from the primary application system which has appropriate retention).
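
To make the token-substitution and binding steps concrete, here is a deliberately simplified sketch. It uses two regular expressions in place of named entity recognition (a production pipeline would use an NER tool such as those listed in the Components table), and it uses a keyed HMAC as one possible form of cryptographic binding. The patterns, the function names, and the choice of HMAC-SHA-256 are assumptions for illustration.

```python
from __future__ import annotations

import hashlib
import hmac
import re

# Simplified stand-ins for NER: regexes for two entity types only.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("ACCOUNT_NUMBER", re.compile(r"\b\d{8,12}\b")),
]


def sanitise(text: str) -> tuple[str, dict[str, int]]:
    """Replace matched PII with entity tokens; return text and redaction counts."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS:
        text, replaced = pattern.subn(f"[{label}]", text)
        if replaced:
            counts[label] = replaced
    return text, counts


def binding(request_id: str, raw_text: str, key: bytes) -> str:
    """Keyed digest linking the sanitised record to the original request.

    A keyed HMAC (rather than a bare hash) means the digest cannot be used
    to guess low-entropy originals without the key.
    """
    message = request_id.encode("utf-8") + b"\x00" + raw_text.encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


raw = "Contact jane@example.com about account 123456789."
clean, redactions = sanitise(raw)
print(clean)
print(redactions)
print(binding("req-001", raw, key=b"demo-key-not-for-production"))
```

Only the sanitised text, the redaction counts, and the binding digest would be written to the audit record; the raw text stays in the primary application system.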

Retention Tiering for Cost Management. Seven-year retention at full fidelity is cost-prohibitive for high-volume systems. The pattern implements tiered retention: hot tier (0–90 days, full queryable index, high-speed retrieval), warm tier (90 days–2 years, reduced query access, compressed storage), cold tier (2–7 years, compliance archive, retrieval SLA 24 hours, minimum cost storage). The WORM guarantee applies across all tiers.
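
The tier boundaries translate directly into an age-based rule. The sketch below is illustrative only: it assumes half-open boundaries (day 90 is already warm), treats a year as 365 days, and uses a hypothetical `tier_for` function; an actual deployment would normally express the same rule as storage lifecycle configuration.

```python
from datetime import date, timedelta

HOT_DAYS = 90             # 0 to 90 days
WARM_DAYS = 2 * 365       # 90 days to 2 years
RETENTION_DAYS = 7 * 365  # 2 to 7 years, then retention has elapsed


def tier_for(written: date, today: date) -> str:
    """Map a record's age to its storage tier."""
    age_days = (today - written).days
    if age_days < 0:
        raise ValueError("record is dated in the future")
    if age_days < HOT_DAYS:
        return "hot"
    if age_days < WARM_DAYS:
        return "warm"
    if age_days < RETENTION_DAYS:
        return "cold"
    return "retention-elapsed"


written = date(2025, 1, 1)
for offset in (0, 90, 730, 2555):
    print(offset, tier_for(written, written + timedelta(days=offset)))
```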

Query Architecture. The audit trail serves two query patterns with different characteristics: operational queries (find all decisions for customer X in the past 30 days — requires fast index on user ID and date) and forensic queries (reconstruct decision state for model version Y on date Z — requires full scan with filter, less latency-sensitive). The pattern implements a search index over the hot and warm tiers for operational queries, with direct S3/blob scanning available for forensic queries.

5. Architecture Diagram

flowchart TD
    subgraph Sources["Event Sources"]
        A[AI Model + Policy Engine]
        B[Human Override Events]
    end
    subgraph Pipeline["Write Pipeline"]
        C[Audit Log Writer]
        D[PII Sanitise + Hash]
    end
    subgraph Storage["Tiered WORM Storage"]
        E[(Hot Tier 0-90 days)]
        F[(WORM Archive 2-7 years)]
        G[Integrity Verifier]
    end
    subgraph Query["Query and Reporting"]
        H[RBAC Access Gate]
        I[Regulatory Reports]
        J[Tamper Alert]
    end
    A -->|decision event| C
    B -->|override event| C
    C --> D
    D --> E
    D --> F
    E -->|operational query| H
    F --> G
    G -->|tamper detected| J
    H --> I
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#dbeafe,stroke:#3b82f6
    style C fill:#f0fdf4,stroke:#22c55e
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#fef9c3,stroke:#eab308
    style F fill:#fef9c3,stroke:#eab308
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#f3e8ff,stroke:#a855f7
    style I fill:#d1fae5,stroke:#10b981
    style J fill:#fee2e2,stroke:#ef4444

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Audit Log Writer	Application Service	Single write pathway; accepts events from all sources; enforces schema	FastAPI service, gRPC service	Critical
PII Sanitisation Pipeline	Data Processing	Named entity recognition; PII token substitution before write	Microsoft Presidio, custom spaCy pipeline	Critical
Record Hash & Merkle Chain	Security Control	SHA-256 hash per record; Merkle chain for tamper-evidence	Custom implementation (well-specified)	Critical
WORM Blob Store	Compliance Storage	Primary immutable long-term storage; 7-year WORM	AWS S3 Object Lock (Compliance), Azure Immutable Blob, WORM-compliant NAS	Critical
Hot Tier Store	Operational Storage	Fast queryable store for recent decisions	PostgreSQL (append-only trigger), OpenSearch	High
Search Index	Query Acceleration	Full-text + faceted search for operational queries	OpenSearch, Elasticsearch, Azure AI Search (formerly Azure Cognitive Search)	High
Integrity Verifier	Scheduled Job	Daily Merkle chain verification; detects tampering	Custom Python service + AWS Lambda	Critical
RBAC Access Gate	Security Control	Enforces role-based access to audit records	API Gateway + OAuth RBAC; OPA policy	Critical
Lifecycle Manager	Operations	Manages hot→warm→cold transitions per retention schedule	AWS S3 Lifecycle, Azure Blob lifecycle	Medium
Regulatory Export Service	Compliance	Generates structured evidence packages for regulatory submissions	Custom export API with APRA/EU formats	High

7. Data Flow

Primary Audit Record Write Flow

Step	Actor	Action	Output
1	AI Engine / PEP / Override Gateway	Emits structured decision event to Audit Log Writer	Event payload
2	Audit Log Writer	Authenticates event source; validates event type	Source-authenticated event
3	PII Sanitisation Pipeline	Scans input and output fields; replaces PII with entity tokens; preserves cryptographic binding	Sanitised event
4	Schema Validator	Validates mandatory fields; enforces taxonomy values	Validated event with schema version
5	Record Hash	Computes SHA-256 of record content; links to previous Merkle chain root	Record hash + chain link
6	WORM Write	Writes record to hot tier (PostgreSQL/OpenSearch) AND WORM blob simultaneously	Record written with sequence ID
7	Write Confirmation	Returns success ACK to event source	Durability confirmed
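
The ordering of steps 2 to 7 matters: nothing is acknowledged until the record has been validated, hashed, and written to both stores. The sketch below walks through that ordering with in-memory lists standing in for the hot tier and the WORM blob store. The class name, the reduced mandatory field set, and the `SCHEMA_VERSION` value are assumptions for illustration, and a simple hash chain stands in for the Merkle chain.

```python
from __future__ import annotations

import hashlib
import json
from typing import Callable

# Reduced field set for the example; the full mandatory tier has more fields.
MANDATORY_FIELDS = frozenset(
    {"decision_id", "mrid", "timestamp", "decision_type", "confidence"}
)
SCHEMA_VERSION = "2.0"  # hypothetical schema version label


class RejectedEvent(Exception):
    """Raised when an event fails source authentication or schema validation."""


class AuditLogWriter:
    def __init__(self, trusted_sources: set[str], sanitise: Callable[[dict], dict]):
        self.trusted_sources = trusted_sources
        self.sanitise = sanitise
        self.hot_tier: list[dict] = []    # stand-in for PostgreSQL/OpenSearch
        self.worm_store: list[dict] = []  # stand-in for the WORM blob store
        self.chain_head = "0" * 64

    def write(self, source: str, event: dict) -> dict:
        # Step 2: authenticate the event source.
        if source not in self.trusted_sources:
            raise RejectedEvent(f"untrusted source: {source}")
        # Step 3: sanitise a copy so the caller's event is not mutated.
        clean = self.sanitise(dict(event))
        # Step 4: validate mandatory fields and stamp the schema version.
        missing = MANDATORY_FIELDS - set(clean)
        if missing:
            raise RejectedEvent(f"missing mandatory fields: {sorted(missing)}")
        clean["schema_version"] = SCHEMA_VERSION
        # Step 5: hash the record and link it to the previous chain value.
        canonical = json.dumps(clean, sort_keys=True, separators=(",", ":"))
        record_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        chain = hashlib.sha256((self.chain_head + record_hash).encode("ascii")).hexdigest()
        entry = {
            "sequence_id": len(self.worm_store),
            "record": clean,
            "record_hash": record_hash,
            "chain": chain,
        }
        # Step 6: write to both stores before acknowledging.
        self.worm_store.append(entry)
        self.hot_tier.append(entry)
        self.chain_head = chain
        # Step 7: acknowledge with the sequence ID.
        return {"ack": True, "sequence_id": entry["sequence_id"], "chain": chain}


writer = AuditLogWriter(trusted_sources={"ai-engine"}, sanitise=lambda e: e)
ack = writer.write("ai-engine", {
    "decision_id": "d-1", "mrid": "MR-0001", "timestamp": "2025-08-01T00:00:00Z",
    "decision_type": "classification", "confidence": 0.91,
})
print(ack["ack"], ack["sequence_id"])
```

In this sketch a rejected event leaves both stores untouched, so the caller can retry or route the event to a dead-letter queue without creating a partial record.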

Regulatory Query Flow

Step	Actor	Action	Output
1	Compliance Officer	Submits regulatory evidence request (customer ID, date range, model)	Query ticket
2	RBAC Gate	Validates requester has Compliance role; authorises query	Authorised query
3	Search Index	Executes faceted query over hot + warm tiers	Matching record set
4	Regulatory Export Service	Formats records per submission format; includes integrity evidence	Evidence package (PDF + JSON + Merkle proof)
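
The Merkle proof in the evidence package lets a recipient check that a given record belongs to a published root without receiving every other record. The following is a minimal sketch of building and checking such an inclusion proof. It is an assumed construction, not the pattern's specified format: leaves and interior nodes are hashed with different one-byte prefixes, and an odd node at any level is paired with a copy of itself, which is a simplification that production designs often avoid.

```python
from __future__ import annotations

import hashlib


def _leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _next_level(level: list[bytes]) -> list[bytes]:
    return [_node(level[i], level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [_leaf(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pair an odd node with itself
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[str, bytes]]:
    """Return the sibling hashes (with their side) from leaf to root."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    level = [_leaf(leaf) for leaf in leaves]
    proof: list[tuple[str, bytes]] = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1  # neighbour within the same pair
        side = "left" if sibling < index else "right"
        proof.append((side, level[sibling]))
        level = _next_level(level)
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: list[tuple[str, bytes]], root: bytes) -> bool:
    current = _leaf(leaf)
    for side, sibling in proof:
        current = _node(sibling, current) if side == "left" else _node(current, sibling)
    return current == root


records = [f"record-{i}".encode("utf-8") for i in range(5)]
root = merkle_root(records)
proof = merkle_proof(records, 3)
print(verify_proof(records[3], proof, root))
print(verify_proof(b"forged-record", proof, root))
```

The proof contains one hash per tree level, so its size grows with the logarithm of the number of records rather than with the size of the log.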

8. Security Considerations

Immutability Enforcement Layers

Application layer: Audit Log Writer is only write path; no other service has write credentials
Database layer: PostgreSQL append-only enforced via trigger (no UPDATE/DELETE permitted)
Storage layer: S3 Object Lock Compliance mode — storage administrators cannot override retention
Verification layer: Daily Merkle chain verification detects any tampering at any layer
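
The database-layer control can be demonstrated with triggers that abort any UPDATE or DELETE. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL so that it runs without a server; PostgreSQL trigger syntax differs, and the table and trigger names are assumptions. Note that anyone able to drop the triggers can bypass this control, which is why the storage and verification layers are still needed.

```python
import sqlite3


def open_append_only_store() -> sqlite3.Connection:
    """Create an in-memory table that accepts INSERT but rejects UPDATE/DELETE."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE audit_record (
            sequence_id INTEGER PRIMARY KEY AUTOINCREMENT,
            decision_id TEXT NOT NULL UNIQUE,
            payload     TEXT NOT NULL,
            record_hash TEXT NOT NULL
        );
        CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_record
        BEGIN
            SELECT RAISE(ABORT, 'audit_record is append-only');
        END;
        CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_record
        BEGIN
            SELECT RAISE(ABORT, 'audit_record is append-only');
        END;
        """
    )
    return conn


conn = open_append_only_store()
conn.execute(
    "INSERT INTO audit_record (decision_id, payload, record_hash) VALUES (?, ?, ?)",
    ("d-1", '{"confidence": 0.9}', "abc123"),
)
try:
    conn.execute("UPDATE audit_record SET payload = '{}' WHERE decision_id = 'd-1'")
except sqlite3.DatabaseError as exc:
    print("update rejected:", exc)
try:
    conn.execute("DELETE FROM audit_record")
except sqlite3.DatabaseError as exc:
    print("delete rejected:", exc)
print(conn.execute("SELECT COUNT(*) FROM audit_record").fetchone()[0])
```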

Access Control

Read access requires Audit Reader role (minimum); Compliance role for full record access; Legal role for original pre-sanitised data (requires court order workflow)
All reads logged (who read which records, when) — audit of the audit
No bulk export without specific Compliance Director approval
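
These access rules can be sketched as a single gate that decides and logs in the same step. The example below is an assumption-laden illustration: the role names, the three access levels, the rule that Legal can also read full records, and the `court_order_ref` parameter are all hypothetical. Every attempt is logged, whether granted or denied.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Optional

# Assumed mapping of access level to the roles permitted to use it.
ALLOWED_ROLES = {
    "summary": {"audit_reader", "compliance", "legal"},
    "full_record": {"compliance", "legal"},
    "original_data": {"legal"},
}


def authorise_read(
    user: str,
    role: str,
    access_level: str,
    record_ids: list[str],
    read_log: list[dict],
    court_order_ref: Optional[str] = None,
) -> bool:
    """Decide whether a read is allowed and log the attempt either way."""
    if access_level not in ALLOWED_ROLES:
        raise ValueError(f"unknown access level: {access_level}")
    granted = role in ALLOWED_ROLES[access_level]
    # Pre-sanitised originals additionally require a court order reference.
    if access_level == "original_data" and not court_order_ref:
        granted = False
    read_log.append({
        "user": user,
        "role": role,
        "access_level": access_level,
        "record_ids": list(record_ids),
        "court_order_ref": court_order_ref,
        "granted": granted,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return granted


read_log: list[dict] = []
print(authorise_read("alice", "audit_reader", "summary", ["d-1"], read_log))
print(authorise_read("alice", "audit_reader", "full_record", ["d-1"], read_log))
print(authorise_read("bob", "legal", "original_data", ["d-1"], read_log, "CO-2025-17"))
print(len(read_log))
```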

OWASP LLM Top 10 Mapping

OWASP LLM Risk	Audit Trail Coverage	Log Field
LLM01 Prompt Injection	Log policy enforcement decision for injections	policy_decision.injection_detected
LLM02 Insecure Output Handling	Log output validator result	output_validation.result
LLM06 Sensitive Information Disclosure	Log PII sanitisation applied	pii_sanitisation.entities_redacted
LLM08 Excessive Agency	Log action scope vs approved scope	policy_decision.action_scope_check
LLM09 Overreliance	Log human override rate	override.occurred, override.actor

9. Governance Considerations

Retention Policy Governance

Retention periods are set by Legal + Compliance, not by technology teams. Different model use cases may have different retention requirements. The retention policy table is version-controlled and reviewed annually.

Governance Artefacts

Artefact	Owner	Frequency	Regulatory Linkage
Audit Trail Integrity Report	CISO	Monthly	APRA CPS234 §19
Regulatory Evidence Package	Compliance	Per request	APRA examinations; court orders
Retention Policy Compliance Report	Legal	Annually	Privacy Act APP 11; Archives Act
Override Activity Report	RAI Officer	Quarterly	EU AI Act Article 14
Decision Volume Report	AI Governance	Monthly	ISO 42001 §9.1

10. Operational Considerations

SLOs

SLO	Target	Measurement
Write latency p99	<50ms	Per write event
Write availability	99.99%	30-day rolling
Operational query latency p95	<5 seconds	Per query
Forensic query completion	<24 hours	Per forensic request
Integrity verification	Daily completion	Per daily run

Disaster Recovery

Scenario	RTO	RPO	Recovery
Hot tier database failure	15 minutes	0 (WORM blob is parallel primary)	Rebuild hot tier index from WORM blob
Write path unavailable	Circuit breaker: event queue buffers for 15 minutes	0	Writes resume from queue on recovery
WORM blob region failure	24 hours (cold restoration)	0 (replicated)	Cross-region replication pre-configured

11. Cost Considerations

Cost Drivers

Driver	Cost Type	7-Year Cost Estimate
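WORM blob storage	Variable, per GB	At 1 TB/year growth: 7 TB stored in year 7 × $23/TB/month × 12 ≈ AUD $1,900/yr at year 7; cumulative 28 TB-years (1+2+…+7) × $23/TB/month × 12 ≈ AUD $7,700 over 7 years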
Hot tier database	Fixed compute	AUD $5,000–$20,000/yr
Search index	Fixed compute	AUD $8,000–$25,000/yr
Integrity verifier	Minimal compute	AUD $500/yr
PII sanitisation	Compute per event	$0.001–$0.01 per 1,000 events depending on complexity
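
The WORM storage figure can be checked with a few lines of arithmetic. The sketch below assumes linear growth of 1 TB per year, a flat price of $23 per TB per month across all tiers, and billing on the end-of-year volume (an upper bound for each year); the function name is hypothetical. Under these assumptions, a figure of roughly 7,700 corresponds to the cumulative seven-year total (28 TB-years), while the cost in year 7 alone is about 1,900.

```python
def storage_cost_by_year(
    growth_tb_per_year: float, price_per_tb_month: float, years: int
) -> list[float]:
    """Annual storage cost for each year, billed on end-of-year volume."""
    costs = []
    for year in range(1, years + 1):
        stored_tb = growth_tb_per_year * year
        costs.append(stored_tb * price_per_tb_month * 12)
    return costs


costs = storage_cost_by_year(growth_tb_per_year=1.0, price_per_tb_month=23.0, years=7)
print("year 7 cost:", costs[-1])       # 7 TB x 23 x 12
print("7-year total:", sum(costs))     # 28 TB-years x 23 x 12
```

Tiered pricing would lower both numbers, since records older than 90 days move to cheaper warm and cold storage.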

Indicative Total Annual Cost

Scale	Events/Day	Annual Infrastructure	7-Year Total
Small (100K/day)	100,000	AUD $15,000	AUD $105,000
Medium (1M/day)	1,000,000	AUD $45,000	AUD $315,000
Large (10M/day)	10,000,000	AUD $120,000	AUD $840,000

12. Trade-Off Analysis

Option Comparison

Option	Description	Pros	Cons	Recommended For
A: WORM audit trail (this pattern)	Immutable, tiered, cryptographically verified	Regulatory-grade; tamper-evident; 7-year retention	Cost; complexity	All regulated entities
B: Standard application logging (ELK)	Mutable logs in Elasticsearch	Simple; developers familiar	Mutable; insufficient retention; not WORM	Development environments only
C: Blockchain/DLT audit trail	Decentralised immutable ledger	Strong tamper-evidence	Very high cost; complexity; slow writes; overkill	Niche use cases requiring external verifiability
D: SaaS audit trail (Sysdig, Datadog)	Cloud SIEM with long retention	Managed; easy setup	Vendor lock-in; may not meet WORM requirements; data residency concerns	Non-regulated organisations

Architectural Tensions

Tension	Stance	Mitigation
PII retention vs. Audit completeness	PII sanitised at write; cryptographic binding to original	Legal process recovery pathway defined
Cost vs. Completeness	Tiered retention; sampled logging for ultra-high volume non-consequential AI	Sampling strategy must be documented and approved
Query performance vs. Immutability	Separate queryable hot tier; WORM as primary	Hot tier rebuilt from WORM on failure

13. Failure Modes

Failure	Likelihood	Impact	Detection	Recovery
Write path failure causing missed records	Low	Critical — regulatory compliance gap	Write queue depth monitoring; ACK timeouts	Event queue with guaranteed delivery; replay from queue
PII sanitisation false negative (PII written to audit log)	Medium	High — privacy breach in audit log	Periodic audit of sanitised records; PII scanner on log samples	Re-sanitisation of affected records; Privacy Officer notification
Merkle chain gap (tamper indicator)	Very Low	Critical — evidence integrity challenged	Daily integrity verifier	Invoke incident response; preserve evidence; notify CISO
Retention policy misconfiguration (early deletion)	Low	Critical — regulatory evidence destroyed	Lifecycle policy monitoring; deletion alerts	Restore from replica; legal hold override for affected records

14. Regulatory Considerations

APRA CPS230

§32: Record-keeping obligations for APRA-regulated entities require retention of records related to material operations for 7 years. AI decision records for credit, insurance, and superannuation decisions are material operation records.

APRA CPS234

§19: APRA-regulated entities must retain information security-relevant logs. AI decision logs containing policy enforcement decisions satisfy this obligation.

EU AI Act

Article 12: Logging capabilities for high-risk AI systems. Providers must ensure that high-risk AI systems automatically log events throughout their lifetime. This pattern implements the Article 12(1) and 12(2) requirements.
Article 12(4): For AI systems in Annex III categories related to critical infrastructure, public authorities, and migration, logs must be kept for a period specific to the use case. The pattern implements configurable retention per use case.

Privacy Act 1988 / APPs

APP 11: Reasonable steps to protect personal information. PII sanitisation before writing to long-term audit log is the key control.
APP 12: Access to personal information. Audit records about an individual are accessible to them on request; search index supports this.

ISO/IEC 42001

§9.1: Monitoring and measurement of AI management system effectiveness. Audit trail provides the evidence base for effectiveness assessment.

15. Reference Implementations

AWS

Component	Service
WORM Storage	S3 Object Lock (Compliance mode) + Glacier for cold tier
Hot Tier	DynamoDB (append-only via condition expressions)
Search Index	OpenSearch Service
PII Sanitisation	Comprehend (PII detection) + Lambda
Integrity Verification	Lambda (scheduled)

Azure

Component	Service
WORM Storage	Azure Blob Storage (Immutable Blob, compliance lock)
Hot Tier	Cosmos DB (append-only via stored procedure)
Search Index	Azure AI Search (formerly Azure Cognitive Search)
PII Sanitisation	Azure AI Language (PII extraction)

On-Premises

Component	Technology
WORM Storage	NetApp SnapLock Compliance / EMC DataDomain Retention Lock
Hot Tier	PostgreSQL with append-only enforced via trigger
Search Index	Elasticsearch
PII Sanitisation	Microsoft Presidio (self-hosted)

16. Related Patterns

Pattern	Relationship	Dependency Direction
EAAPL-GOV001 AI Model Register	Input — MRID in every audit record	GOV001 → GOV007
EAAPL-GOV004 AI Policy Enforcement	Input — policy decisions logged	GOV004 → GOV007
EAAPL-GOV005 Responsible AI Framework	Consumer — accountability chain stored here	GOV005 → GOV007
EAAPL-GOV006 Model Bias Detection	Consumer — fairness events stored here	GOV006 → GOV007
EAAPL-GOV008 AI Incident Management	Consumer — forensic queries during incidents	GOV008 → GOV007
EAAPL-CMP001 APRA CPS230	Satisfies — §32 record-keeping	GOV007 → CMP001
EAAPL-CMP003 EU AI Act	Satisfies — Article 12 logging	GOV007 → CMP003

17. Maturity Assessment

Overall Maturity: Mature (Level 4)

Dimension	Score (1–5)	Evidence
Immutability architecture	5	WORM + Merkle chain + daily verification
Schema completeness	5	Four-tier schema covering all regulatory requirements
PII sanitisation	4	NER-based; gap is high-precision sanitisation for novel entity types
Retention tiering	4	Three tiers defined; gap is automated legal hold override process
Query capability	4	Operational + forensic query patterns; gap is AI-assisted forensic analysis

18. Revision History

Version	Date	Author	Changes
1.0	2024-01-01	EAAPL Working Group	Initial publication
1.1	2024-06-01	EAAPL Working Group	Added Merkle chain tamper-evidence
1.2	2024-12-01	EAAPL Working Group	EU AI Act Article 12 mapping; retention tiering
2.0	2025-08-01	EAAPL Working Group	Full rewrite: four-tier schema; PII sanitisation architecture; APRA CPS230 §32 alignment
