[EAAPL-SEC005] LLM Input Sanitisation
Category: Security / Data Protection
Sub-category: Pre-Processing Pipeline
Version: 1.2
Maturity: Proven
Tags: pii-detection data-redaction input-validation token-budget schema-validation content-inspection privacy
Regulatory Relevance: Australian Privacy Act 1988, GDPR Art. 25 (Privacy by Design), APRA CPS234, EU AI Act Art. 10, NIST AI RMF MAP 1.5
1. Executive Summary
LLM Input Sanitisation is a pre-processing pipeline that transforms raw application inputs into safe, compliant, and policy-conformant prompts before they reach a large language model. Where the Prompt Firewall (EAAPL-SEC002) focuses on adversarial intent detection, Input Sanitisation focuses on data governance: ensuring that sensitive information, PII, and confidential context do not reach model providers without appropriate controls.
For organisations in regulated industries, the business imperative is clear: sending personally identifiable information, protected health information, or financial account details to a commercial LLM API may constitute an unauthorised disclosure under privacy legislation. Input sanitisation enforces privacy by design at the AI layer — detecting and redacting sensitive data fields before they leave the organisation's security boundary.
Beyond privacy, this pattern provides: token budget enforcement (preventing prompt bloat that drives cost overruns), schema validation (ensuring prompts conform to expected structure, preventing context injection), and malicious content detection (detecting attempts to smuggle harmful content into the model's context through user-provided data). The pattern is deployed as a pipeline stage within the AI Gateway and operates on assembled prompts, after application-level prompt construction but before model provider submission.
2. Problem Statement
Business Problem
Organisations building AI features routinely construct prompts that include user data: customer names, email addresses, account numbers, medical histories, financial transactions. In many implementations, this data flows directly into model API calls without sanitisation. The consequences:
- Privacy breaches when PII is included in prompts sent to commercial model providers whose data handling agreements may not cover the specific use.
- Regulatory violations (Privacy Act, GDPR) if personal information is disclosed to third parties without appropriate consent or contractual protection.
- Confidential business information in prompts potentially available to model provider staff during safety review processes.
- Token waste from verbose, unsanitised prompts inflating costs.
Technical Problem
Application developers building AI features rarely have deep expertise in PII detection or data classification. They construct prompts using string templates that include whatever fields are available in the data object — often including fields that should not be sent to the model. Without a centralised sanitisation layer, each application team must independently implement PII detection, which leads to inconsistent coverage and inevitable gaps.
Additionally, prompts can grow unboundedly through: accumulated conversation history, large document chunks from RAG systems, verbose user inputs, and multiple context injections. Without token budget enforcement, prompts exceed model context windows (causing errors) or consume excessive tokens (driving cost overruns).
Symptoms
- Customer PII appearing in model provider usage logs (discovered during vendor audit).
- Prompts regularly exceeding model context limits, causing application errors.
- Different applications sending different categories of data to models with no central policy.
- No mechanism to audit what data has been sent to model providers.
- Token costs unexpectedly high due to verbose, unsanitised prompts.
Cost of Inaction
| Dimension | Impact |
|---|---|
| Regulatory | Privacy Act / GDPR breach from PII disclosure; potential notification obligation and regulatory fine |
| Reputational | Customer trust erosion if PII disclosure becomes public |
| Financial | Token cost overruns from unsanitised verbose prompts; regulatory fines |
| Security | Confidential business logic, credentials, or trade secrets embedded in prompts |
| Operational | Application errors from context window overflow; no visibility into what data reaches models |
3. Context
When to Apply
- Any AI application that constructs prompts including user-provided data, database records, or document content.
- Applications sending prompts to external (commercial) model provider APIs.
- RAG systems where document chunks are injected into prompts.
- Conversational AI systems accumulating multi-turn context.
- Regulated industries where data handling obligations apply to AI pipelines.
When NOT to Apply
- Fully offline inference where the model is deployed within the organisation's own security boundary and no data leaves.
- AI applications processing only non-sensitive, fully public data with no user context.
- Development/sandbox environments processing synthetic data only.
Prerequisites
| Prerequisite | Detail |
|---|---|
| AI Gateway (EAAPL-SEC001) | Sanitisation pipeline is a stage within the gateway |
| PII Detection Library | Microsoft Presidio, AWS Comprehend PII, or equivalent |
| Data Classification Schema | Organisation's data classification policy codified into detectable entity types |
| Token Counter | Tokeniser for each supported model family (tiktoken for OpenAI, Anthropic tokeniser, etc.) |
Industry Applicability
| Industry | Applicability | Key Driver |
|---|---|---|
| Financial Services | Critical | Account numbers, transaction data, financial advice context |
| Healthcare | Critical | PHI (names, DOB, diagnoses, medications, insurance) — HIPAA/Privacy Act |
| Legal / Professional Services | High | Privileged information; client confidentiality |
| Government | High | Citizen data; classified information controls |
| Retail / E-commerce | High | Customer PII; payment card data |
| HR / Talent Management | High | Employee data; performance reviews; compensation |
4. Architecture Overview
The LLM Input Sanitisation pipeline operates on the fully assembled prompt — after application code has constructed it but before it is forwarded to the model provider. This placement is intentional: sanitisation must occur at a point where the complete prompt context is available (system message + conversation history + user input + RAG context), not in the application before assembly, because PII can appear in any component.
Stage 1: Structural Analysis
The pipeline begins by parsing the prompt structure: identifying system message, user turns, assistant turns, and injected context blocks. This structural awareness is critical because different prompt components warrant different sanitisation policies — system messages may intentionally contain structured data while user inputs should be more aggressively sanitised.
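A minimal sketch of this structural tagging, assuming the prompt arrives as an OpenAI-style list of role/content messages. The `is_context` flag for RAG chunks, the `Component` type, and the policy names are hypothetical and only illustrate how each component could be mapped to its own sanitisation policy.

```python
from dataclasses import dataclass

# Hypothetical policy names; a real deployment would load these
# from the Redaction Policy Store rather than hard-coding them.
POLICY_BY_COMPONENT = {
    "system": "lenient",
    "assistant": "standard",
    "user": "strict",
    "context": "strict",
}


@dataclass
class Component:
    index: int   # position in the original message list
    kind: str    # system | user | assistant | context | other role
    text: str
    policy: str  # sanitisation policy selected for this component


def parse_prompt(messages: list[dict]) -> list[Component]:
    """Tag each message with a component kind and a sanitisation policy."""
    components = []
    for i, msg in enumerate(messages):
        role = msg.get("role", "user")
        # Assumption: RAG chunks are flagged with is_context=True upstream.
        kind = "context" if msg.get("is_context") else role
        # Unknown component kinds fall back to the strictest policy.
        policy = POLICY_BY_COMPONENT.get(kind, "strict")
        components.append(Component(i, kind, msg.get("content", ""), policy))
    return components
```

Defaulting unknown roles to the strictest policy is a design choice in this sketch: a component the parser does not recognise is treated as untrusted.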
Stage 2: Entity Detection and Classification
The PII detection engine runs structured entity recognition across all prompt components. Detection uses multiple techniques in combination:
- Pattern-based detection: Regular expressions for highly structured PII (credit card numbers, Australian Tax File Numbers, Medicare numbers, phone numbers, email addresses, IP addresses).
- NER-based detection: Named Entity Recognition models (spaCy, Presidio) for names, organisations, addresses, dates.
- Context-aware detection: Using surrounding cues to raise confidence, for example recognising that a name-formatted word following "My name is" is likely a personal name.
- Custom entity types: Organisation-specific entity types (internal account numbers, employee IDs, proprietary codes) registered in the detection configuration.
Detected entities are classified by type and sensitivity. Not all PII is treated identically — a first name in a customer service context may be acceptable while a full name combined with account number and DOB constitutes high-sensitivity data.
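The pattern-based technique can be sketched with the standard library alone. This is an illustration, not the pattern's reference detector: the regular expressions are deliberately simple, the confidence values are placeholders, and the `Entity` type is hypothetical. A Luhn checksum is used to cut false positives on long digit runs that are not card numbers.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    type: str
    start: int
    end: int
    text: str
    confidence: float  # placeholder scores, not calibrated


EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_entities(text: str) -> list[Entity]:
    """Find email addresses and checksum-valid card numbers in text."""
    found = [
        Entity("EMAIL_ADDRESS", m.start(), m.end(), m.group(), 0.95)
        for m in EMAIL_RE.finditer(text)
    ]
    for m in CARD_RE.finditer(text):
        if luhn_valid(m.group()):
            found.append(Entity("CREDIT_CARD", m.start(), m.end(), m.group(), 0.9))
    return sorted(found, key=lambda e: e.start)
```

In a production pipeline these recognisers would sit alongside NER-based and custom recognisers; the span-plus-type output shape is what the downstream redaction stage consumes.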
Stage 3: Redaction Strategy
Detected sensitive entities are processed according to the configured redaction strategy per entity type:
- Replacement: Entity replaced with a type label such as [PERSON_NAME], [ACCOUNT_NUMBER], or [EMAIL_ADDRESS]. The model can still understand the structure of the prompt without the sensitive value.
- Pseudonymisation: Entity replaced with a consistent pseudonym (same entity gets the same pseudonym within a session), allowing the model to reason about relationships without knowing the actual value. Pseudonym mapping is stored server-side, not in the prompt.
- Hashing: Entity replaced with a short hash for entity correlation without disclosure.
- Removal: Entity removed entirely (for very high-sensitivity fields that add no reasoning value).
Redaction decisions are logged for audit: which entities were detected, which redaction strategy was applied, and a hash of the original value for post-hoc investigation if needed.
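The four strategies and the audit trail can be sketched as follows. The `Redactor` class, the strategy names, and the pseudonym format are hypothetical. The in-memory dictionary stands in for the session-scoped pseudonym store. One assumption worth flagging: the sketch records a keyed HMAC rather than a bare hash, because low-entropy values such as phone numbers can be recovered from an unkeyed hash by brute force. The pattern itself only specifies "a hash".

```python
import hashlib
import hmac


class Redactor:
    """Applies a per-entity-type redaction strategy and records audit entries."""

    def __init__(self, strategies: dict[str, str], audit_key: bytes):
        self.strategies = strategies   # entity type -> strategy name
        self.audit_key = audit_key     # assumption: key managed outside this class
        self.pseudonyms = {}           # stand-in for the session pseudonym store
        self.audit = []                # stand-in for the sanitisation audit log

    def _digest(self, value: str) -> str:
        return hmac.new(self.audit_key, value.encode(), hashlib.sha256).hexdigest()

    def _substitute(self, etype: str, value: str) -> str:
        strategy = self.strategies.get(etype, "replace")
        if strategy == "pseudonymise":
            key = (etype, value)
            if key not in self.pseudonyms:
                n = sum(1 for k in self.pseudonyms if k[0] == etype) + 1
                self.pseudonyms[key] = f"{etype}_{n}"
            out = self.pseudonyms[key]
        elif strategy == "hash":
            out = f"[{etype}:{self._digest(value)[:8]}]"
        elif strategy == "remove":
            out = ""
        else:  # "replace" and any unrecognised strategy use the type label
            out = f"[{etype}]"
        # The audit entry never contains the original value.
        self.audit.append(
            {"entity_type": etype, "strategy": strategy, "value_hmac": self._digest(value)}
        )
        return out

    def redact(self, text: str, spans: list[tuple[str, int, int]]) -> str:
        """spans are (entity_type, start, end) tuples over `text`."""
        # Work from the end of the string so earlier offsets stay valid.
        for etype, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
            text = text[:start] + self._substitute(etype, text[start:end]) + text[end:]
        return text
```

Because the pseudonym map is keyed on the original value, repeated mentions of the same entity receive the same pseudonym within the lifetime of the `Redactor` instance, which is what lets the model reason about relationships.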
Stage 4: Token Budget Enforcement
After redaction, the sanitised prompt is tokenised and measured against the configured token budget for the request type. If the prompt exceeds the budget:
- Conversation history is truncated (oldest turns removed first) while preserving the system message and current user turn.
- RAG context blocks are truncated (lowest-relevance chunks removed first, if relevance scores are available).
- If the prompt still exceeds budget after truncation, the request is rejected with a clear error to the calling application.
Token budget enforcement protects both cost (over-budget requests waste tokens) and model performance (prompts near the context limit can produce degraded outputs).
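The truncation order described above can be sketched as below. The function name and argument shapes are hypothetical, RAG chunks are assumed to arrive as (relevance score, text) pairs, and `count_tokens` is a whitespace-based placeholder that should be swapped for the model-specific tokeniser.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    # Placeholder: whitespace word count, NOT a real tokeniser.
    return len(text.split())


def enforce_budget(
    system: str,
    history: list[str],
    rag_chunks: list[tuple[float, str]],
    user_turn: str,
    budget: int,
    count: Callable[[str], int] = count_tokens,
) -> tuple[list[str], list[str]]:
    """Return (kept_history, kept_chunks) that fit the budget, or raise."""
    history = list(history)  # oldest turn first
    chunks = sorted(rag_chunks, key=lambda c: c[0], reverse=True)  # most relevant first

    def total() -> int:
        parts = [system, user_turn, *history, *(text for _, text in chunks)]
        return sum(count(p) for p in parts)

    # 1. Drop the oldest conversation turns first.
    while total() > budget and history:
        history.pop(0)
    # 2. Then drop the lowest-relevance RAG chunks.
    while total() > budget and chunks:
        chunks.pop()
    # 3. System message and current user turn are never truncated; reject instead.
    if total() > budget:
        raise ValueError("prompt exceeds token budget after maximum truncation")
    return history, [text for _, text in chunks]
```

Recounting after every removal keeps the sketch simple; a production truncator would more likely track a running total.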
Stage 5: Schema Validation
The final sanitised prompt is validated against a schema for the specific use case. Schema validation catches context injection attempts: if the application template expects {user_question} but the user has provided content that looks like an additional system instruction, schema validation detects the structural anomaly. This is a lightweight but effective complement to the Prompt Firewall's semantic injection detection.
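As an illustration of this structural check, the sketch below validates the values bound to template slots such as {user_question}. The schema shape, the `validate_slots` function, and the marker list are hypothetical. A real deployment would tune the markers to the prompt formats it actually uses and would still rely on the Prompt Firewall for semantic detection.

```python
import re

# Hypothetical markers that should not appear inside a user-supplied slot:
# a role label at the start of a line, instruction-style tags, or chat-format tokens.
ROLE_MARKER_RE = re.compile(
    r"(?im)^\s*(system|assistant|developer)\s*:|</?(system|instructions?)>|<\|im_(start|end)\|>"
)


def validate_slots(slots: dict[str, str], schema: dict[str, dict]) -> list[str]:
    """Return a list of violations; an empty list means the prompt is VALID."""
    violations = []
    for name in slots.keys() - schema.keys():
        violations.append(f"unexpected slot: {name}")
    for name, rule in schema.items():
        value = slots.get(name)
        if value is None:
            violations.append(f"missing slot: {name}")
            continue
        if len(value) > rule["max_chars"]:
            violations.append(f"{name}: exceeds {rule['max_chars']} characters")
        if not rule.get("allow_role_markers", False) and ROLE_MARKER_RE.search(value):
            violations.append(f"{name}: contains role or instruction markers")
    return violations
```

Returning the full list of violations, rather than failing on the first, gives the calling application and the audit log the violation detail the data flow expects.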
5. Architecture Diagram
6. Components
| Component | Type | Responsibility | Technology Options | Criticality |
|---|---|---|---|---|
| Structural Parser | Parsing | Identifies prompt components (system/user/assistant/context) for targeted sanitisation | Custom parser, OpenAI message format parser, Anthropic XML tag parser | High |
| PII Detection Engine | NLP | Multi-technique PII and sensitive entity detection | Microsoft Presidio, AWS Comprehend PII, Google DLP, spaCy + custom models | Critical |
| Redaction Engine | Transformation | Applies configured redaction strategy per entity type; maintains pseudonym mapping | Custom transformation layer, Presidio anonymiser, custom in-memory pseudonym store | Critical |
| Token Counter | Measurement | Counts tokens in assembled prompt using model-specific tokeniser | tiktoken (OpenAI), Anthropic tokeniser, HuggingFace tokenisers | High |
| Truncation Engine | Transformation | Removes conversation history and RAG context in priority order to meet token budget | Custom priority truncator (history by age, RAG by relevance score) | High |
| Schema Validator | Security | Validates structural integrity of prompt against registered template schema | JSON Schema validator, Pydantic, custom template validator | Medium |
| Entity Type Registry | Configuration | Catalogue of detectable entity types with detection patterns and custom types | YAML/JSON config, Presidio registry, custom configuration service | Critical |
| Redaction Policy Store | Configuration | Maps entity types to redaction strategies per data classification level | YAML/JSON config, OPA data document | Critical |
| Pseudonym Store | State | Session-scoped mapping of original values to consistent pseudonyms | Redis (session TTL), in-memory map | Medium |
| Sanitisation Audit Log | Compliance | Records all redaction events for audit and investigation | Kafka → immutable log, same pipeline as AI Gateway audit log | Critical |
7. Data Flow
Primary Flow
| Step | Actor | Action | Output |
|---|---|---|---|
| 1 | Application / Gateway | Submits assembled prompt to sanitisation pipeline | Full prompt text with all components |
| 2 | Structural Parser | Identifies prompt components; extracts system message, user turns, RAG context blocks | Tagged prompt structure |
| 3 | PII Detection Engine | Runs pattern matching + NER across all components; identifies entity spans with type and confidence | List of detected entities: (type, span, confidence, component) |
| 4 | Redaction Engine | Applies redaction strategy per entity type; generates consistent pseudonyms if required | Sanitised prompt with entities replaced; audit records of each redaction |
| 5 | Token Counter | Counts tokens in sanitised prompt using model-appropriate tokeniser | Token count |
| 6 | Truncation Engine | If over budget: removes oldest history turns, then lowest-relevance RAG chunks; re-counts tokens | Truncated prompt within budget |
| 7 | Schema Validator | Validates final prompt structure against registered template schema | VALID or INVALID (with violation detail) |
| 8 | Sanitisation Audit Logger | Records: entity types detected, redaction strategies applied, value hashes, token count before/after, truncations | Audit record |
| 9 | AI Gateway | Receives sanitised, schema-valid, on-budget prompt | Forwards to model provider |
Error Flow
| Error | Handling | Status | Alert |
|---|---|---|---|
| Critical PII detected (SSN, passport) and redaction fails | Reject request | 400 | Security: failed redaction of critical entity |
| Token budget exceeded after maximum truncation | Reject request | 400 | Warning: prompt too large even after truncation |
| Schema validation failure (injection indicator) | Reject request | 400 | Security: context injection detected |
| PII detection model unavailable | Fail closed: block request if PII detection is required by policy; or fail-open with alert for non-regulated paths | 503 / degraded | Critical: PII detection unavailable |
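The last row of the table, the fail-closed versus fail-open choice when PII detection is unavailable, can be expressed as a small decision function. The `Decision` type and function name are hypothetical, and the sketch assumes the gateway already knows whether policy requires PII detection for the request path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    forward: bool           # whether the prompt may proceed to the model provider
    status: Optional[int]   # HTTP status returned to the caller, if blocked
    alert: str              # alert raised in either case


def on_detector_unavailable(pii_detection_required: bool) -> Decision:
    """Fail closed where policy requires PII detection; otherwise fail open with an alert."""
    if pii_detection_required:
        return Decision(False, 503, "Critical: PII detection unavailable")
    return Decision(True, None, "Critical: PII detection unavailable (fail-open path)")
```

Both branches raise an alert, so a degraded detector is visible even on paths that continue to serve traffic.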
8. Security Considerations
Authentication & Authorisation
- Sanitisation pipeline is an internal component; access controlled by AI Gateway (not exposed directly to applications).
- Pseudonym mapping store access restricted to the sanitisation service; no application can retrieve original values from pseudonyms.
Secrets Management
- Commercial PII detection API credentials (if used) managed per EAAPL-SEC008.
- Pseudonym mapping keys encrypted at rest; scoped to session TTL.
Data Classification
- Sanitisation policies are classification-aware: data at higher sensitivity levels triggers more aggressive redaction.
- Prompt classification label is attached after sanitisation (indicating residual sensitivity after redaction).
Encryption
- All pipeline communication in transit over TLS 1.3.
- Sanitisation audit log encrypted at rest; includes entity type and value hash but not original sensitive values.
- Pseudonym store contents encrypted at rest.
OWASP LLM Top 10 Coverage
| OWASP LLM Risk | Input Sanitisation Mitigation | Coverage |
|---|---|---|
| LLM01: Prompt Injection | Schema validation provides structural injection detection; complements SEC002 semantic detection | Medium |
| LLM02: Insecure Output Handling | Prevents PII from entering prompts, reducing PII leakage risk in outputs | High (upstream) |
| LLM03: Training Data Poisoning | Not applicable to inference-time pipeline | None |
| LLM04: Model Denial of Service | Token budget enforcement prevents resource-exhausting over-long prompts | High |
| LLM05: Supply Chain Vulnerabilities | Not directly applicable | None |
| LLM06: Sensitive Information Disclosure | Core purpose: remove PII before prompt reaches model provider | Critical |
| LLM07: Insecure Plugin Design | Not directly applicable | None |
| LLM08: Excessive Agency | Removing PII from context limits agent's ability to act on personal information | Medium |
| LLM09: Overreliance | Not applicable | None |
| LLM10: Model Theft | Not directly applicable: model theft concerns exfiltration of model weights, which input sanitisation does not address. Limiting identifying data in prompts is covered under LLM06 | None |
9. Governance Considerations
Responsible AI
- Privacy by design: PII is removed from AI inputs by default, not as an afterthought.
- Redaction decisions must be reviewed for fairness: aggressive redaction that strips culturally specific names or non-Western name formats may disproportionately degrade AI quality for some users.
Governance Artefacts
| Artefact | Owner | Frequency | Purpose |
|---|---|---|---|
| PII Detection Coverage Report | Privacy Team | Quarterly | Documents which entity types are detected; coverage gaps |
| Redaction Audit Log | Compliance | Continuous; monthly review | Evidence of PII sanitisation for Privacy Act compliance |
| Token Budget Review | AI Platform | Monthly | Ensures budgets are appropriate; reviews truncation frequency |
| False-Negative Analysis | Privacy + AI Platform | Quarterly | Samples of prompts to verify PII not slipping through detection |
| Entity Registry Update Log | AI Platform | With each update | Records new entity types added; rationale |
10. Operational Considerations
SLOs
| SLO | Target | Measurement |
|---|---|---|
| Sanitisation pipeline latency p99 | <50ms | Pipeline entry → exit span |
| PII detection recall (known entity types) | >98% | Monthly test suite against labelled samples |
| Token budget enforcement accuracy | 100% (no over-budget prompts reach model) | Token count metric on outbound prompts |
| Redaction audit record durability | 100% | Dead-letter queue monitoring |
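The recall SLO can be computed from a labelled test suite roughly as follows. This is a sketch under the assumption that gold and detected entities are compared as exact (type, start, end) matches. Partial-overlap scoring is a common alternative and would give different numbers.

```python
def pii_recall(samples: list[tuple[set, set]]) -> float:
    """samples: (gold_entities, detected_entities) pairs, each a set of
    (entity_type, start, end) tuples. Recall = matched gold / total gold."""
    total = 0
    hits = 0
    for gold, detected in samples:
        total += len(gold)
        hits += len(gold & detected)  # exact-match true positives
    # With no gold entities there is nothing to miss.
    return hits / total if total else 1.0


def meets_recall_slo(samples: list[tuple[set, set]], target: float = 0.98) -> bool:
    return pii_recall(samples) > target
```

Extra detections (false positives) do not affect recall, so this metric should be tracked alongside a precision or over-redaction measure.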
Incident Management
- PII detected in model provider response (indicating sanitisation miss) → P1: Privacy incident; investigate detection gap; notify privacy team.
- Sanitisation pipeline degraded (PII detection unavailable) → P2 if fail-open; P1 if regulated data pathway.
- Unusual spike in redaction volume → Investigate: may indicate a new data integration sending unexpected PII.
11. Cost Considerations
Cost Drivers
| Cost Driver | Description | Relative Impact |
|---|---|---|
| NER model inference | CPU/GPU compute for PII detection; dominates pipeline cost | High |
| Token counting | Trivial CPU cost | Very Low |
| Pseudonym store | Redis memory; modest at typical session volumes | Low |
| Commercial PII API (if used) | AWS Comprehend, Google DLP per-request pricing | Medium |
Indicative Cost Range
| Scale | Monthly Cost (USD) | Notes |
|---|---|---|
| Small (< 1M requests/day) | $300–$800 | CPU inference (Presidio); local NER model |
| Medium (1M–20M requests/day) | $2,000–$8,000 | GPU inference cluster; Redis cluster |
| Large (> 20M requests/day) | $10,000–$30,000 | Multi-region GPU inference; custom NER fine-tuning |
12. Trade-Off Analysis
Option Comparison
| Option | Description | Pros | Cons | Best For |
|---|---|---|---|---|
| A: Pattern-only detection | Regex-based PII detection (SSN, phone, email patterns) | Very fast; deterministic; zero ML dependencies | Misses unstructured PII (free-text names, addresses); high false-negative rate | Non-regulated applications; fast PoC |
| B: NER-based detection (this pattern) | spaCy/Presidio NER + patterns | High recall for structured and unstructured PII; industry standard | Requires ML model; language-dependent; some false positives | Regulated applications; production privacy-by-design |
| C: Cloud-native DLP | AWS Comprehend, Google DLP, Azure Purview | Managed; continuously updated; low operational overhead | Sends prompt content to cloud (data residency risk); per-request cost; limited customisation | Cloud-committed organisations; non-sensitive baseline |
| D: LLM-based PII detection | Use a smaller LLM to detect PII in the input prompt | Flexible; handles complex context | Adds significant latency (LLM call before LLM call); cost; introduces recursive risk | Research; specialised high-accuracy requirements |
Architectural Tensions
| Tension | Trade-Off |
|---|---|
| Recall vs Latency | Higher-accuracy NER models (larger, slower) detect more PII but add more latency. Resolution: use small, fast NER models (for example, spaCy sm pipelines) for high-throughput paths; larger models for sensitive data pathways. |
| Redaction vs Utility | Aggressive redaction reduces PII risk but may reduce the model's ability to provide useful responses (e.g., replacing a customer's name makes personalisation impossible). Resolution: pseudonymisation preserves reasoning utility while removing identifying values. |
| Centralisation vs Application Context | A shared sanitisation pipeline lacks knowledge of what PII is intentional vs accidental in a specific application's context. Resolution: per-application redaction profiles that can whitelist certain entity types for specific use cases. |
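The per-application redaction profiles mentioned in the last row could look something like the sketch below. The application IDs, profile structure, and the `NEVER_ALLOW` guard are all hypothetical. The guard is an added assumption: it keeps a profile from whitelisting entity types the organisation treats as non-negotiable.

```python
# Organisation-wide defaults: entity type -> redaction strategy.
DEFAULT_POLICY = {
    "PERSON_NAME": "pseudonymise",
    "EMAIL_ADDRESS": "replace",
    "CREDIT_CARD": "remove",
}

# Hypothetical per-application profiles listing entity types to pass through.
APP_PROFILES = {
    "support-chat": {"allow": {"PERSON_NAME"}},
    "billing-bot": {"allow": {"CREDIT_CARD"}},  # will be overridden by NEVER_ALLOW
}

# Assumption: some entity types can never be whitelisted by an application.
NEVER_ALLOW = {"CREDIT_CARD"}


def effective_policy(app_id: str) -> dict[str, str]:
    """Merge the default policy with the application's whitelist."""
    policy = dict(DEFAULT_POLICY)
    allowed = APP_PROFILES.get(app_id, {}).get("allow", set()) - NEVER_ALLOW
    for entity_type in allowed:
        policy[entity_type] = "allow"
    return policy
```

Applications without a registered profile receive the default policy unchanged, so centralised coverage is the baseline and whitelisting is an explicit, reviewable exception.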
13. Failure Modes
| Failure | Likelihood | Impact | Detection | Recovery |
|---|---|---|---|---|
| NER model false negative (misses PII) | Medium | High (PII reaches model provider) | Post-hoc audit of sampled prompts; model output PII detection | Update entity detection patterns; retrain NER model |
| Pseudonym store full (Redis OOM) | Low | Medium (pseudonymisation falls back to replacement) | Redis memory metrics | Evict oldest sessions; scale Redis memory |
| Token budget too tight (excessive truncation) | Medium | Medium (degraded AI output quality) | Truncation frequency metric; quality regression | Review and increase token budgets; improve prompt efficiency |
| Pipeline latency spike (NER model overloaded) | Medium | High (AI Gateway SLO breach) | Pipeline latency metric | Autoscale NER inference; horizontal scaling |
| Schema false positive (blocks legitimate prompt) | Low | Medium (user-facing error) | 400 error rate from schema validation | Tune schema; add to schema allow list |
14. Regulatory Considerations
| Regulation | Requirement | Implementation |
|---|---|---|
| Australian Privacy Act 1988 — APP 11 | Take reasonable steps to protect personal information | Automated PII detection and redaction before third-party model provider submission |
| GDPR Art. 25 (Privacy by Design) | Implement appropriate technical and organisational measures that give effect to data protection principles | PII detection and pseudonymisation pipeline provides technical data protection by design |
| GDPR Art. 28 (Processor obligations) | Data processor must implement appropriate security measures | Model provider is a processor; sanitisation limits personal data shared with processor |
| EU AI Act Art. 10 (Data Governance) | Training and input data must meet quality criteria; data governance practices | Sanitisation pipeline implements input data governance |
| HIPAA Technical Safeguards | Technical safeguards to protect PHI in electronic transmissions | Automatic PHI detection and redaction before external model API calls |
| APRA CPS234 §21 | Information security controls for third-party dependencies | Sanitisation limits sensitive data exposure to model provider third parties |
15. Reference Implementations
AWS
| Component | AWS Service / OSS |
|---|---|
| PII detection | Amazon Comprehend PII + custom entity types, or Presidio on ECS |
| Redaction engine | Custom Lambda + Presidio anonymiser |
| Token counting | tiktoken (Lambda layer) |
| Pseudonym store | ElastiCache Redis |
| Audit logging | Kinesis Firehose → S3 (Object Lock) |
Azure
| Component | Azure Service / OSS |
|---|---|
| PII detection | Azure AI Language PII detection + Presidio |
| Redaction | Custom Azure Function |
| Token counting | Custom tiktoken deployment |
| Pseudonym store | Azure Cache for Redis |
| Audit logging | Event Hub → Immutable Blob Storage |
On-Premises
| Component | Technology |
|---|---|
| PII detection | Microsoft Presidio (self-hosted) + spaCy models |
| Redaction engine | Presidio anonymiser with custom operators |
| Token counting | tiktoken + HuggingFace tokenisers |
| Pseudonym store | Redis Cluster |
| Audit logging | Kafka → Elasticsearch |
16. Related Patterns
| Pattern | ID | Relationship |
|---|---|---|
| AI Gateway | EAAPL-SEC001 | SEC005 is a pipeline stage within the gateway |
| Prompt Firewall | EAAPL-SEC002 | Complementary: SEC002 detects adversarial intent; SEC005 handles data governance |
| AI Output Filtering | EAAPL-SEC006 | Defence pair: SEC005 prevents PII entering; SEC006 detects PII leaking in outputs |
| AI Data Classification | EAAPL-SEC009 | Classification labels from SEC009 inform SEC005 redaction policy selection |
| Zero-Trust AI Pipeline | EAAPL-SEC007 | SEC005 implements the data-governance stage of the zero-trust pipeline |
17. Maturity Assessment
Overall Maturity: Proven
| Dimension | Score (1–5) | Rationale |
|---|---|---|
| Pattern definition clarity | 5 | Well-defined stages and clear privacy objective |
| Technology availability | 5 | Microsoft Presidio, AWS Comprehend, Google DLP are all production-ready |
| Industry adoption | 4 | Widely adopted in financial services and healthcare AI deployments |
| NER model quality | 4 | Strong for English; multilingual support requires additional configuration |
| Regulatory alignment | 5 | Directly addresses Privacy Act, GDPR, and HIPAA requirements |
| Operational tooling | 4 | Presidio provides strong operational foundation; custom entity types require engineering |
18. Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2024-02-20 | Security Architecture Team | Initial pattern definition |
| 1.1 | 2024-06-10 | Security Architecture Team | Added pseudonymisation strategy; token budget enforcement detail |
| 1.2 | 2024-12-01 | Security Architecture Team | Updated regulatory mapping; added Australian Privacy Act specific guidance; expanded failure modes |