[EAAPL-GOV007] AI Audit Trail
Category: Governance / Compliance & Traceability
Sub-category: Immutable Decision Logging
Version: 2.0
Maturity: Mature
Tags: audit-trail, immutable-log, decision-traceability, WORM, tamper-evident, regulatory-retention
Regulatory Relevance: APRA CPS230 §32, APRA CPS234 §19, EU AI Act Article 12, Privacy Act APP 11, ISO/IEC 42001 §9.1, NIST AI RMF MANAGE 4.1
1. Executive Summary
The AI Audit Trail pattern establishes an immutable, tamper-evident log of every AI decision made by enterprise AI systems. It captures the full decision context: sanitised input, model version, prompt version (for generative AI), output, confidence score, policy decisions applied, human overrides, and data sources consumed. Retention is aligned to regulatory requirements—7 years for APRA-regulated entities, which exceeds standard application logging.
This pattern is foundational to enterprise AI governance. Without an audit trail, every governance claim is unverifiable: you cannot prove a policy was applied, you cannot reconstruct what a model said to a customer, you cannot demonstrate human oversight occurred, and you cannot respond to regulatory requests for decision evidence. The EU AI Act Article 12 makes logging mandatory for high-risk AI systems; APRA CPS234 requires evidence of information security controls.
Beyond compliance, the audit trail enables capability that pure governance cannot: post-incident forensics to understand exactly what an AI system did; statistical analysis of decision patterns to detect emerging bias; accountability assignment when AI decisions are disputed; and model performance benchmarking against actual outcomes.
The pattern's defining architectural commitment is immutability. Write-once, read-many (WORM) storage with cryptographic integrity verification ensures that audit records cannot be modified, deleted, or backdated. This is not a logging best practice—it is a regulatory requirement for any organisation claiming its AI audit trail as compliance evidence.
2. Problem Statement
Business Problem
When AI decisions are disputed (by customers, by regulators, or in legal proceedings), organisations cannot reconstruct what the AI system actually said, what data it used, and what policies governed it. Standard application logs are insufficient: they are not retained long enough, they are not structured for decision reconstruction, and they are not tamper-evident.
Technical Problem
AI decisions have richer context than standard API calls: model version, prompt version, retrieved context (for RAG systems), token-level confidence, policy decisions from GOV004, human override events from GOV005, and ground-truth outcome feedback. Standard logging infrastructure cannot capture this structured, multi-source context and retain it in WORM format for 7 years.
Symptoms
- Inability to reconstruct what an AI system told a specific customer on a specific date
- No evidence that policy guardrails were applied to a flagged AI decision
- Human override events not captured, preventing audit of human oversight effectiveness
- AI decision logs retained for 90 days (standard app log retention) vs. 7-year regulatory requirement
- Logs mutable by administrators, failing tamper-evidence requirements
- No correlation between AI decision and downstream outcome (outcome feedback not captured)
Cost of Inaction
- Regulatory: APRA CPS234 §19 non-compliance (information security evidence); EU AI Act Article 12 (logging obligation)
- Legal: Inability to defend AI decisions in legal proceedings due to absence of evidence
- Operational: Post-incident forensics impossible; root cause analysis speculative
- Governance: Human oversight claimed but not evidenced; responsible AI controls unverifiable
3. Context
When to Apply
- All AI systems making decisions affecting individuals in regulated industries
- Any AI system subject to EU AI Act high-risk classification (Annex III)
- All AI systems in APRA-regulated entities (per CPS234 information security obligations)
- Any AI system where post-incident forensics capability is required
- Generative AI systems producing customer-facing outputs (financial advice, healthcare recommendations)
When NOT to Apply
- Internal AI systems with no customer or regulatory exposure, where retention of less than 30 days is acceptable
- Ultra-high-volume, low-consequence AI (e.g., real-time recommendation clickthrough) where full decision logging is cost-prohibitive; use sampled logging with a documented, approved sampling strategy instead
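A sampling strategy for the ultra-high-volume case can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name `should_log` and the choice to hash the decision ID are assumptions made here. Hashing instead of drawing a random number keeps the decision reproducible, so a reviewer can later confirm why a given record was or was not kept.

```python
import hashlib


def should_log(decision_id: str, sample_rate: float) -> bool:
    """Deterministically decide whether a decision falls inside the sample.

    sample_rate is the fraction of decisions to keep (0.0 to 1.0).
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(decision_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and compare
    # against the threshold using integer arithmetic to avoid float rounding.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(sample_rate * 2**64)


# Example: keep roughly 10% of clickthrough decisions.
kept = sum(should_log(f"decision-{i}", 0.10) for i in range(10_000))
```

Because the outcome depends only on the decision ID and the rate, the same decision is always treated the same way across replays and services.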
Prerequisites
- Information classification scheme for AI inputs and outputs
- WORM-capable storage infrastructure
- Identity context available at inference time (user ID, session ID)
- Model register (GOV001) operational — MRID required in each log entry
Industry Applicability
| Industry | Retention Requirement | Key Log Fields | Primary Driver |
|---|---|---|---|
| Banking (AU) | 7 years | Decision, confidence, policy applied, human override | APRA CPS230 §32 |
| Insurance (AU) | 7 years | Underwriting decision, rating factors, model version | APRA record-keeping |
| Healthcare | 7 years (clinical) | Clinical recommendation, clinical model version, clinician override | Health Records Act |
| Financial Services (EU) | 10 years (MiFID II) | Investment recommendation, AI model version, disclaimer applied | MiFID II Article 25; EU AI Act Article 12 |
| Government | 7–30 years (varies) | Administrative decision, AI contribution, human decision | Archives Act; administrative law |
4. Architecture Overview
The AI Audit Trail is architected around two non-negotiable properties: completeness and immutability. Completeness means every decision is captured with sufficient context to reconstruct the decision scenario. Immutability means no record can be modified or deleted once written, including by system administrators.
Decision Record Schema. The audit record schema is the most critical design decision in this pattern. It must be rich enough to satisfy regulatory reconstruction requirements while not so voluminous that storage costs become prohibitive at scale. The schema stratifies into four payload tiers:
Mandatory Tier (always captured): Decision ID (UUID), MRID (model version, architecture hash), timestamp (UTC, nanosecond precision), actor identity (user ID, session ID, application context), input fingerprint (cryptographic hash of sanitised input — not the raw input, which may contain PII), output summary (sanitised, PII-free summary of the model output), decision type (classification/recommendation/generation), confidence score, latency, policy decision reference (GOV004), and regulatory context flags.
Decision Tier (for consequential decisions): Full decision rationale (for explainable models), counterfactual summary, data sources referenced (document IDs for RAG systems, feature values for tabular models), fairness context (demographic group if known), and human oversight indicator.
Override Tier (conditional — only when human override occurs): Override actor identity, override timestamp, override rationale, original AI decision preserved, override decision, and escalation reference.
Outcome Tier (populated retrospectively): Ground truth outcome, outcome timestamp, outcome source, accuracy flag. This tier enables model performance analysis against real-world outcomes and is critical for GOV006 bias detection (equalised odds requires ground truth).
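The four tiers can be pictured as one mandatory record with three optional attachments. The sketch below is illustrative only: the class and field names are assumptions chosen to mirror the lists above, not a published schema, and how the Outcome tier is attached after the fact in an immutable store is left open.

```python
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DecisionTier:
    rationale: str
    data_sources: tuple[str, ...]      # document IDs or feature names
    human_oversight: bool
    counterfactual_summary: Optional[str] = None
    fairness_group: Optional[str] = None


@dataclass(frozen=True)
class OverrideTier:
    actor_id: str
    timestamp_ns: int
    rationale: str
    original_decision: str             # the AI decision is preserved
    override_decision: str
    escalation_ref: Optional[str] = None


@dataclass(frozen=True)
class OutcomeTier:
    ground_truth: str
    timestamp_ns: int
    source: str
    accurate: bool


@dataclass(frozen=True)
class AuditRecord:
    # Mandatory tier: always captured.
    mrid: str
    user_id: str
    session_id: str
    application: str
    input_fingerprint: str             # hash of the sanitised input
    output_summary: str                # sanitised, PII-free
    decision_type: str                 # classification / recommendation / generation
    confidence: float
    latency_ms: float
    policy_decision_ref: str           # GOV004 reference
    regulatory_flags: tuple[str, ...] = ()
    decision_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp_ns: int = field(default_factory=time.time_ns)  # UTC epoch, ns
    # Optional tiers.
    decision: Optional[DecisionTier] = None
    override: Optional[OverrideTier] = None
    outcome: Optional[OutcomeTier] = None
```

Frozen dataclasses are used so that a record cannot be mutated in memory between construction and write, and `asdict(record)` yields a plain dictionary suitable for hashing and serialisation.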
Immutability Architecture. The immutability guarantee is implemented at multiple layers to prevent single points of bypass. First, the audit log writer is the only component with write access to the log store—application code cannot write directly. Second, the log store is configured as WORM (Write Once Read Many) with Compliance mode (not Governance mode), meaning even the storage administrator cannot delete or modify records during the retention period. Third, each record is written with a cryptographic hash of its content, enabling integrity verification without trusting the storage layer alone. Fourth, a Merkle-tree-based tamper-evidence chain links records so that any retrospective modification is detectable from the chain.
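The third and fourth layers can be illustrated with a small sketch. It uses a linear hash chain, which is a simplification of the Merkle-tree structure described above, and it assumes records are JSON-serialisable dictionaries. The names `record_hash`, `chain_link`, `append`, and `verify` are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an empty chain


def record_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def chain_link(prev_link: str, rec_hash: str) -> str:
    """Bind this record's hash to everything written before it."""
    return hashlib.sha256((prev_link + rec_hash).encode("ascii")).hexdigest()


def append(log: list, record: dict) -> dict:
    prev = log[-1]["chain"] if log else GENESIS
    h = record_hash(record)
    entry = {"record": record, "hash": h, "chain": chain_link(prev, h)}
    log.append(entry)
    return entry


def verify(log: list) -> int:
    """Return the index of the first inconsistent entry, or -1 if intact."""
    prev = GENESIS
    for i, entry in enumerate(log):
        h = record_hash(entry["record"])
        if h != entry["hash"] or chain_link(prev, h) != entry["chain"]:
            return i
        prev = entry["chain"]
    return -1


log = []
for n in range(3):
    append(log, {"decision_id": f"d-{n}", "confidence": 0.9})
intact = verify(log)                    # chain is consistent
log[1]["record"]["confidence"] = 0.1    # simulate retrospective tampering
first_bad = verify(log)                 # tampering is detected at index 1
```

Canonical serialisation (sorted keys, fixed separators) matters here: without it, two encodings of the same record would produce different hashes and trigger false tamper alerts.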
PII Sanitisation at Write Time. Raw AI inputs often contain personal information. The audit trail must preserve decision context without creating a 7-year PII retention risk in violation of Privacy Act obligations. The sanitisation pipeline, executed before write, applies: named entity recognition to identify PII fields, substitution of PII values with entity type tokens (e.g., [PERSON], [EMAIL], [ACCOUNT_NUMBER]), preservation of non-PII decision-relevant features, and a cryptographic binding between the sanitised record and the original request (so the original can be retrieved under legal process if required, from the primary application system which has appropriate retention).
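The substitution and binding steps can be sketched as below. This is a deliberately simplified stand-in: it uses two regular expressions in place of a real named entity recogniser (so it will not catch names and cannot emit [PERSON]), and the patterns, the function names, and the use of a keyed HMAC for the binding are all assumptions for illustration.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; a production pipeline would use an NER engine.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("ACCOUNT_NUMBER", re.compile(r"\b\d{8,12}\b")),
]


def sanitise(text: str) -> tuple[str, dict]:
    """Replace detected PII with entity-type tokens; return text and counts."""
    counts = {}
    for label, pattern in PATTERNS:
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts


def bind_to_original(raw: str, key: bytes) -> str:
    """Keyed digest linking the sanitised record to the original request.

    A keyed HMAC is used rather than a bare hash so that low-entropy inputs
    cannot be recovered by guessing and re-hashing.
    """
    return hmac.new(key, raw.encode("utf-8"), hashlib.sha256).hexdigest()


raw = "Contact jane@example.com about account 123456789"
clean, redacted = sanitise(raw)
binding = bind_to_original(raw, key=b"demo-key-not-for-production")
```

The `redacted` counts correspond to what a field such as `pii_sanitisation.entities_redacted` might record, while `binding` lets the original be matched later in the primary application system without storing the PII in the audit log.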
Retention Tiering for Cost Management. Seven-year retention at full fidelity is cost-prohibitive for high-volume systems. The pattern implements tiered retention: hot tier (0–90 days, full queryable index, high-speed retrieval), warm tier (90 days–2 years, reduced query access, compressed storage), cold tier (2–7 years, compliance archive, retrieval SLA 24 hours, minimum cost storage). The WORM guarantee applies across all tiers.
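The tier boundaries translate into a simple age-based rule. The sketch below assumes boundaries are inclusive at the lower end, approximates years as 365 days, and uses the 7-year APRA figure as the retention limit; none of these details are fixed by the pattern, and the function name `storage_tier` is hypothetical.

```python
from datetime import date

HOT_DAYS = 90
WARM_DAYS = 2 * 365
RETENTION_DAYS = 7 * 365


def storage_tier(written: date, today: date) -> str:
    """Map a record's age to the tier it should live in."""
    age = (today - written).days
    if age < 0:
        raise ValueError("record is dated in the future")
    if age < HOT_DAYS:
        return "hot"
    if age < WARM_DAYS:
        return "warm"
    if age < RETENTION_DAYS:
        return "cold"
    return "expired"  # eligible for disposal once retention has lapsed


tier = storage_tier(date(2025, 1, 1), date(2025, 6, 1))  # 151 days old
```

In practice the transitions would be driven by the storage platform's lifecycle rules; a function like this is mainly useful for testing that those rules match the approved retention schedule.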
Query Architecture. The audit trail serves two query patterns with different characteristics: operational queries (find all decisions for customer X in the past 30 days — requires fast index on user ID and date) and forensic queries (reconstruct decision state for model version Y on date Z — requires full scan with filter, less latency-sensitive). The pattern implements a search index over the hot and warm tiers for operational queries, with direct S3/blob scanning available for forensic queries.
5. Architecture Diagram
6. Components
| Component | Type | Responsibility | Technology Options | Criticality |
|---|---|---|---|---|
| Audit Log Writer | Application Service | Single write pathway; accepts events from all sources; enforces schema | FastAPI service, gRPC service | Critical |
| PII Sanitisation Pipeline | Data Processing | Named entity recognition; PII token substitution before write | Microsoft Presidio, custom spaCy pipeline | Critical |
| Record Hash & Merkle Chain | Security Control | SHA-256 hash per record; Merkle chain for tamper-evidence | Custom implementation (well-specified) | Critical |
| WORM Blob Store | Compliance Storage | Primary immutable long-term storage; 7-year WORM | AWS S3 Object Lock (Compliance), Azure Immutable Blob, WORM-compliant NAS | Critical |
| Hot Tier Store | Operational Storage | Fast queryable store for recent decisions | PostgreSQL (append-only trigger), OpenSearch | High |
| Search Index | Query Acceleration | Full-text + faceted search for operational queries | OpenSearch, Elasticsearch, Azure Cognitive Search | High |
| Integrity Verifier | Scheduled Job | Daily Merkle chain verification; detects tampering | Custom Python service + AWS Lambda | Critical |
| RBAC Access Gate | Security Control | Enforces role-based access to audit records | API Gateway + OAuth RBAC; OPA policy | Critical |
| Lifecycle Manager | Operations | Manages hot→warm→cold transitions per retention schedule | AWS S3 Lifecycle, Azure Blob lifecycle | Medium |
| Regulatory Export Service | Compliance | Generates structured evidence packages for regulatory submissions | Custom export API with APRA/EU formats | High |
7. Data Flow
Primary Audit Record Write Flow
| Step | Actor | Action | Output |
|---|---|---|---|
| 1 | AI Engine / PEP / Override Gateway | Emits structured decision event to Audit Log Writer | Event payload |
| 2 | Audit Log Writer | Authenticates event source; validates event type | Source-authenticated event |
| 3 | PII Sanitisation Pipeline | Scans input and output fields; replaces PII with entity tokens; preserves cryptographic binding | Sanitised event |
| 4 | Schema Validator | Validates mandatory fields; enforces taxonomy values | Validated event with schema version |
| 5 | Record Hash | Computes SHA-256 of record content; links to previous Merkle chain root | Record hash + chain link |
| 6 | WORM Write | Writes record to hot tier (PostgreSQL/OpenSearch) AND WORM blob simultaneously | Record written with sequence ID |
| 7 | Write Confirmation | Returns success ACK to event source | Durability confirmed |
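Step 4 of the write flow can be sketched as a small validator. The field list below is an abbreviated, assumed subset of the Mandatory tier, and the rule that confidence lies in [0, 1] is an assumption rather than something the pattern specifies.

```python
MANDATORY_FIELDS = (
    "decision_id", "mrid", "timestamp", "actor", "input_fingerprint",
    "output_summary", "decision_type", "confidence", "policy_decision_ref",
)
DECISION_TYPES = {"classification", "recommendation", "generation"}
SCHEMA_VERSION = "2.0"  # assumed to track the pattern version


def validate(event: dict) -> dict:
    """Reject incomplete or off-taxonomy events; stamp the schema version."""
    missing = [name for name in MANDATORY_FIELDS if name not in event]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    if event["decision_type"] not in DECISION_TYPES:
        raise ValueError(f"unknown decision_type: {event['decision_type']!r}")
    if not 0.0 <= event["confidence"] <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Return a new dict so the caller's event is left untouched.
    return {**event, "schema_version": SCHEMA_VERSION}
```

Rejecting an event at this step, before hashing, keeps malformed records out of the chain entirely, which is simpler than trying to annotate them afterwards in an immutable store.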
Regulatory Query Flow
| Step | Actor | Action | Output |
|---|---|---|---|
| 1 | Compliance Officer | Submits regulatory evidence request (customer ID, date range, model) | Query ticket |
| 2 | RBAC Gate | Validates requester has Compliance role; authorises query | Authorised query |
| 3 | Search Index | Executes faceted query over hot + warm tiers | Matching record set |
| 4 | Regulatory Export Service | Formats records per submission format; includes integrity evidence | Evidence package (PDF + JSON + Merkle proof) |
8. Security Considerations
Immutability Enforcement Layers
- Application layer: Audit Log Writer is only write path; no other service has write credentials
- Database layer: PostgreSQL append-only enforced via trigger (no UPDATE/DELETE permitted)
- Storage layer: S3 Object Lock Compliance mode — storage administrators cannot override retention
- Verification layer: Daily Merkle chain verification detects any tampering at any layer
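The database-layer control can be demonstrated with triggers that abort any UPDATE or DELETE. The sketch below uses SQLite from the Python standard library purely so it runs without a server; it is a stand-in for the PostgreSQL trigger named above, and the table and trigger names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (
    seq         INTEGER PRIMARY KEY,
    decision_id TEXT NOT NULL,
    payload     TEXT NOT NULL
);
CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
""")

# Inserts are permitted.
conn.execute(
    "INSERT INTO audit_log (decision_id, payload) VALUES (?, ?)",
    ("d-1", "{}"),
)


def attempt(sql: str) -> str:
    """Run a statement and report whether the database allowed it."""
    try:
        conn.execute(sql)
        return "allowed"
    except sqlite3.DatabaseError as exc:
        return str(exc)


update_result = attempt("UPDATE audit_log SET payload = 'x'")
delete_result = attempt("DELETE FROM audit_log")
```

A trigger alone is not a strong guarantee, since anyone able to alter the schema can drop it. That is why this layer sits alongside the storage-layer lock and the daily verification rather than replacing them.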
Access Control
- Read access requires Audit Reader role (minimum); Compliance role for full record access; Legal role for original pre-sanitised data (requires court order workflow)
- All reads logged (who read which records, when) — audit of the audit
- No bulk export without specific Compliance Director approval
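The first two rules can be sketched as a gate that checks role against requested scope and records every attempt. The role names, scope names, and the assumption that roles are cumulative (Legal can do everything Compliance can) are illustrative choices, not part of the pattern.

```python
import time
from typing import Optional

# Assumed mapping of role to the record views it may read.
ROLE_SCOPES = {
    "audit_reader": {"summary"},
    "compliance": {"summary", "full"},
    "legal": {"summary", "full", "original"},
}

read_log: list[dict] = []  # in a real system this is itself an audit stream


def authorise_read(user: str, role: str, scope: str, record_ids: list[str],
                   court_order_ref: Optional[str] = None) -> bool:
    """Decide whether the read is permitted and log the attempt either way."""
    allowed = scope in ROLE_SCOPES.get(role, set())
    # Pre-sanitised data additionally requires a court order reference.
    if scope == "original" and not court_order_ref:
        allowed = False
    read_log.append({
        "user": user, "role": role, "scope": scope,
        "record_ids": list(record_ids), "allowed": allowed,
        "court_order_ref": court_order_ref, "at_ns": time.time_ns(),
    })
    return allowed


a = authorise_read("u1", "audit_reader", "summary", ["d-1"])
b = authorise_read("u1", "audit_reader", "full", ["d-1"])
c = authorise_read("u2", "legal", "original", ["d-1"])
d = authorise_read("u2", "legal", "original", ["d-1"], court_order_ref="CO-1")
```

Denied attempts are logged as well as successful ones, since repeated denials are often the more interesting signal when reviewing access to audit records.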
OWASP LLM Top 10 Mapping
| OWASP LLM Risk | Audit Trail Coverage | Log Field |
|---|---|---|
| LLM01 Prompt Injection | Log policy enforcement decision for injections | policy_decision.injection_detected |
| LLM02 Insecure Output Handling | Log output validator result | output_validation.result |
| LLM06 Sensitive Information Disclosure | Log PII sanitisation applied | pii_sanitisation.entities_redacted |
| LLM08 Excessive Agency | Log action scope vs approved scope | policy_decision.action_scope_check |
| LLM09 Overreliance | Log human override rate | override.occurred, override.actor |
9. Governance Considerations
Retention Policy Governance
Retention periods are set by Legal + Compliance, not by technology teams. Different model use cases may have different retention requirements. The retention policy table is version-controlled and reviewed annually.
Governance Artefacts
| Artefact | Owner | Frequency | Regulatory Linkage |
|---|---|---|---|
| Audit Trail Integrity Report | CISO | Monthly | APRA CPS234 §19 |
| Regulatory Evidence Package | Compliance | Per request | APRA examinations; court orders |
| Retention Policy Compliance Report | Legal | Annually | Privacy Act APP 11; Archives Act |
| Override Activity Report | RAI Officer | Quarterly | EU AI Act Article 14 |
| Decision Volume Report | AI Governance | Monthly | ISO 42001 §9.1 |
10. Operational Considerations
SLOs
| SLO | Target | Measurement |
|---|---|---|
| Write latency p99 | <50ms | Per write event |
| Write availability | 99.99% | 30-day rolling |
| Operational query latency p95 | <5 seconds | Per query |
| Forensic query completion | <24 hours | Per forensic request |
| Integrity verification | Daily completion | Per daily run |
Disaster Recovery
| Scenario | RTO | RPO | Recovery |
|---|---|---|---|
| Hot tier database failure | 15 minutes | 0 (WORM blob is parallel primary) | Rebuild hot tier index from WORM blob |
| Write path unavailable | Circuit breaker: event queue buffers for 15 minutes | 0 | Writes resume from queue on recovery |
| WORM blob region failure | 24 hours (cold restoration) | 0 (replicated) | Cross-region replication pre-configured |
11. Cost Considerations
Cost Drivers
| Driver | Cost Type | 7-Year Cost Estimate |
|---|---|---|
| WORM blob storage | Variable, per GB | At 1 TB/year growth: 28 TB-years cumulative × $23/TB/month × 12 ≈ AUD $7,700 over 7 years (7 TB stored, about AUD $1,900/yr, in year 7) |
| Hot tier database | Fixed compute | AUD $5,000–$20,000/yr |
| Search index | Fixed compute | AUD $8,000–$25,000/yr |
| Integrity verifier | Minimal compute | AUD $500/yr |
| PII sanitisation | Compute per event | $0.001–$0.01 per 1,000 events depending on complexity |
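The WORM storage figure can be checked with a few lines of arithmetic. The sketch takes the table's inputs (1 TB/year growth, $23 per TB per month) and assumes, conservatively, that each year is billed at its year-end volume; tiering to cheaper cold storage would lower the result.

```python
GROWTH_TB_PER_YEAR = 1
PRICE_PER_TB_MONTH = 23
YEARS = 7


def storage_costs(growth_tb: float, price_tb_month: float, years: int):
    """Return (TB stored at each year end, cost for each year)."""
    stored = [growth_tb * y for y in range(1, years + 1)]
    yearly = [tb * price_tb_month * 12 for tb in stored]
    return stored, yearly


stored, yearly = storage_costs(GROWTH_TB_PER_YEAR, PRICE_PER_TB_MONTH, YEARS)
tb_years = sum(stored)     # 1 + 2 + ... + 7 = 28 TB-years
year_7_cost = yearly[-1]   # 7 TB x 23 x 12 = 1,932
total_cost = sum(yearly)   # 28 x 23 x 12 = 7,728
```

The 28 TB figure is therefore cumulative TB-years across the retention period, not the volume held at any one time; the volume stored in year 7 is 7 TB.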
Indicative Total Annual Cost
| Scale | Events/Day | Annual Infrastructure | 7-Year Total |
|---|---|---|---|
| Small (100K/day) | 100,000 | AUD $15,000 | AUD $105,000 |
| Medium (1M/day) | 1,000,000 | AUD $45,000 | AUD $315,000 |
| Large (10M/day) | 10,000,000 | AUD $120,000 | AUD $840,000 |
12. Trade-Off Analysis
Option Comparison
| Option | Description | Pros | Cons | Recommended For |
|---|---|---|---|---|
| A: WORM audit trail (this pattern) | Immutable, tiered, cryptographically verified | Regulatory-grade; tamper-evident; 7-year retention | Cost; complexity | All regulated entities |
| B: Standard application logging (ELK) | Mutable logs in Elasticsearch | Simple; developers familiar | Mutable; insufficient retention; not WORM | Development environments only |
| C: Blockchain/DLT audit trail | Decentralised immutable ledger | Strong tamper-evidence | Very high cost; complexity; slow writes; overkill | Niche use cases requiring external verifiability |
| D: SaaS audit trail (Sysdig, Datadog) | Cloud SIEM with long retention | Managed; easy setup | Vendor lock-in; may not meet WORM requirements; data residency concerns | Non-regulated organisations |
Architectural Tensions
| Tension | Stance | Mitigation |
|---|---|---|
| PII retention vs. Audit completeness | PII sanitised at write; cryptographic binding to original | Legal process recovery pathway defined |
| Cost vs. Completeness | Tiered retention; sampled logging for ultra-high volume non-consequential AI | Sampling strategy must be documented and approved |
| Query performance vs. Immutability | Separate queryable hot tier; WORM as primary | Hot tier rebuilt from WORM on failure |
13. Failure Modes
| Failure | Likelihood | Impact | Detection | Recovery |
|---|---|---|---|---|
| Write path failure causing missed records | Low | Critical — regulatory compliance gap | Write queue depth monitoring; ACK timeouts | Event queue with guaranteed delivery; replay from queue |
| PII sanitisation false negative (PII written to audit log) | Medium | High — privacy breach in audit log | Periodic audit of sanitised records; PII scanner on log samples | Re-sanitisation of affected records; Privacy Officer notification |
| Merkle chain gap (tamper indicator) | Very Low | Critical — evidence integrity challenged | Daily integrity verifier | Invoke incident response; preserve evidence; notify CISO |
| Retention policy misconfiguration (early deletion) | Low | Critical — regulatory evidence destroyed | Lifecycle policy monitoring; deletion alerts | Restore from replica; legal hold override for affected records |
14. Regulatory Considerations
APRA CPS230
- §32: Record-keeping obligations for APRA-regulated entities require retention of records related to material operations for 7 years. AI decision records for credit, insurance, and superannuation decisions are material operation records.
APRA CPS234
- §19: APRA-regulated entities must retain information security-relevant logs. AI decision logs containing policy enforcement decisions satisfy this obligation.
EU AI Act
- Article 12: Record-keeping for high-risk AI systems. High-risk AI systems must technically allow the automatic recording of events (logs) over the lifetime of the system. This pattern implements the Article 12(1) and 12(2) requirements.
- Articles 19 and 26(6): Providers and deployers must keep the automatically generated logs under their control for a period appropriate to the system's intended purpose, and for at least six months unless other applicable Union or national law provides otherwise. The pattern implements configurable retention per use case.
Privacy Act 1988 / APPs
- APP 11: Reasonable steps to protect personal information. PII sanitisation before writing to long-term audit log is the key control.
- APP 12: Access to personal information. Audit records about an individual are accessible to them on request; search index supports this.
ISO/IEC 42001
- §9.1: Monitoring and measurement of AI management system effectiveness. Audit trail provides the evidence base for effectiveness assessment.
15. Reference Implementations
AWS
| Component | Service |
|---|---|
| WORM Storage | S3 Object Lock (Compliance mode) + Glacier for cold tier |
| Hot Tier | DynamoDB (append-only via condition expressions) |
| Search Index | OpenSearch Service |
| PII Sanitisation | Comprehend (PII detection) + Lambda |
| Integrity Verification | Lambda (scheduled) |
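For the WORM storage row, a write with a per-object Compliance-mode retention date might look like the sketch below. It assumes a boto3 S3 client is passed in and that the bucket was created with Object Lock enabled; the helper names, and the padding of two extra days to cover leap years, are choices made for this illustration.

```python
from datetime import datetime, timedelta, timezone


def object_lock_args(bucket: str, key: str, body: bytes,
                     written_at: datetime, retention_years: int = 7) -> dict:
    """Build put_object arguments that lock the object until retention ends."""
    # 365 days per year plus 2 days so the lock never falls short of the
    # stated number of calendar years.
    retain_until = written_at + timedelta(days=365 * retention_years + 2)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": retain_until,
    }


def write_worm(s3_client, bucket: str, key: str, body: bytes,
               written_at: datetime, retention_years: int = 7):
    """Write one audit record; s3_client is expected to be a boto3 S3 client."""
    args = object_lock_args(bucket, key, body, written_at, retention_years)
    return s3_client.put_object(**args)


example = object_lock_args(
    "audit-worm-bucket",  # hypothetical bucket name
    "2025/01/01/d-1.json",
    b"{}",
    datetime(2025, 1, 1, tzinfo=timezone.utc),
)
```

Keeping the argument construction separate from the call makes the retention logic testable without AWS credentials, which is useful given that a Compliance-mode lock cannot be shortened once applied.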
Azure
| Component | Service |
|---|---|
| WORM Storage | Azure Blob Storage (Immutable Blob, compliance lock) |
| Hot Tier | Cosmos DB (append-only via stored procedure) |
| Search Index | Azure Cognitive Search |
| PII Sanitisation | Azure AI Language (PII extraction) |
On-Premises
| Component | Technology |
|---|---|
| WORM Storage | NetApp SnapLock Compliance / EMC DataDomain Retention Lock |
| Hot Tier | PostgreSQL with append-only enforced via trigger |
| Search Index | Elasticsearch |
| PII Sanitisation | Microsoft Presidio (self-hosted) |
16. Related Patterns
| Pattern | Relationship | Dependency Direction |
|---|---|---|
| EAAPL-GOV001 AI Model Register | Input — MRID in every audit record | GOV001 → GOV007 |
| EAAPL-GOV004 AI Policy Enforcement | Input — policy decisions logged | GOV004 → GOV007 |
| EAAPL-GOV005 Responsible AI Framework | Consumer — accountability chain stored here | GOV005 → GOV007 |
| EAAPL-GOV006 Model Bias Detection | Consumer — fairness events stored here | GOV006 → GOV007 |
| EAAPL-GOV008 AI Incident Management | Consumer — forensic queries during incidents | GOV008 → GOV007 |
| EAAPL-CMP001 APRA CPS230 | Satisfies — §32 record-keeping | GOV007 → CMP001 |
| EAAPL-CMP003 EU AI Act | Satisfies — Article 12 logging | GOV007 → CMP003 |
17. Maturity Assessment
Overall Maturity: Mature (Level 4)
| Dimension | Score (1–5) | Evidence |
|---|---|---|
| Immutability architecture | 5 | WORM + Merkle chain + daily verification |
| Schema completeness | 5 | Four-tier schema covering all regulatory requirements |
| PII sanitisation | 4 | NER-based; gap is high-precision sanitisation for novel entity types |
| Retention tiering | 4 | Three tiers defined; gap is automated legal hold override process |
| Query capability | 4 | Operational + forensic query patterns; gap is AI-assisted forensic analysis |
18. Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2024-01-01 | EAAPL Working Group | Initial publication |
| 1.1 | 2024-06-01 | EAAPL Working Group | Added Merkle chain tamper-evidence |
| 1.2 | 2024-12-01 | EAAPL Working Group | EU AI Act Article 12 mapping; retention tiering |
| 2.0 | 2025-08-01 | EAAPL Working Group | Full rewrite: four-tier schema; PII sanitisation architecture; APRA CPS230 §32 alignment |