EAAPL: Enterprise AI Architecture Pattern Library

Collaborative AI Decision

Pattern ID: EAAPL-HIL004 Status: Proven Tags: human-oversight accountability traceability medium-complexity Version: 1.0 Last Updated: 2026-06-12


1. Executive Summary

The Collaborative AI Decision pattern defines an architecture in which humans and AI jointly make decisions — the AI provides a recommendation with structured reasoning, and a human exercises independent judgment to accept, modify, or reject it. This is architecturally distinct from escalation (AI hands off entirely) and from automation (AI decides alone). The collaborative model is appropriate for decisions that require AI-scale information processing combined with human accountability, contextual judgment, and ethical responsibility.

The pattern covers: recommendation presentation designed to inform without anchoring; mandatory override tracking with reason taxonomy; long-run outcome feedback to validate whether AI or human judgment proved more accurate; over-reliance detection to prevent automation bias from becoming a silent quality failure; and a complete audit trail showing human versus AI contribution to every decision. CIOs and CTOs implementing this pattern can satisfy regulatory requirements for human accountability in AI-assisted decisions, create an empirical dataset for continuous model improvement, and demonstrate that AI augments rather than replaces human judgment — a critical positioning for regulated industries, model risk governance, and board-level AI governance frameworks.


2. Problem Statement

Business Problem

Organisations deploying AI recommendations in high-stakes decisions (credit, insurance underwriting, clinical care, legal review) face two opposing risks: not using AI at all (leaving accuracy gains on the table) and using AI without accountability (creating liability and regulatory exposure). The decision-maker needs AI leverage without ceding human accountability. Most implementations fail in one of two directions: either they display a bare "yes/no" with no reasoning (insufficient to support human judgment), or they bury the human in an AI-generated report they cannot meaningfully engage with (cognitive overload leading to rubber-stamping).

Technical Problem

A well-intentioned AI recommendation system can silently transition into automation if human override rates approach zero. Without override tracking, automation bias is undetectable. Without outcome feedback, it is impossible to know whether human overrides or AI recommendations are more accurate over time. Without structured reasoning presentation, the human cannot make an informed judgment — they can only agree or disagree with a number.

Symptoms

  • Decision-makers report feeling they "always agree with the AI" without being able to articulate why
  • Override rate has never been measured or is consistently below 3%
  • When AI recommendations prove incorrect, post-mortem reveals the human approved the recommendation without meaningful review
  • Audit logs show AI recommendation and human decision but not the reasoning behind either
  • Outcome data (was the AI recommendation correct?) is not tracked, so model improvement is based on inputs not outcomes

Cost of Inaction

  • Regulatory liability: if an AI recommendation causes harm and there is no evidence of independent human judgment, the human approval offers little defence, because the organisation cannot demonstrate meaningful human oversight
  • Silent quality degradation: automation bias means human approval rate does not reflect decision quality; errors compound at scale
  • Model improvement opportunity missed: without outcome tracking, the most valuable feedback signal (was the AI right?) is never captured
  • Organisational complacency: "AI checks it" becomes cultural cover for reduced human diligence

3. Context

When to Apply

  • High-stakes decisions where human accountability is legally or ethically required
  • Decisions with non-trivial error rates that benefit from human judgment on specific cases
  • Regulated decisions: credit, underwriting, clinical, hiring, sentencing, benefits assessment
  • Decisions where explainability is required for compliance or customer communication
  • Environments building a ground-truth dataset for model improvement from decision outcomes

When NOT to Apply

  • High-volume, low-stakes, fully reversible decisions where human review cannot scale (content moderation at millions per day)
  • Decisions where human judgment adds no value (well-defined algorithmic domains with clear ground truth)
  • Time-critical decisions where human latency is architecturally incompatible (fraud detection at transaction time — use post-hoc override instead)

Prerequisites

  • AI system produces a structured recommendation with confidence score and reasoning
  • Human decision-makers available and trained on the review interface
  • Outcome data exists or can be collected to evaluate decision accuracy over time
  • Override tracking and audit infrastructure in place or buildable

Industry Applicability

Industry | Decision Type | AI Contribution | Human Contribution
Financial Services | Credit application assessment | Risk score + payment behaviour analysis + policy compliance check | Final credit decision + limit setting + exception authority
Insurance | Underwriting | Risk classification + actuarial scoring + fraud signals | Coverage terms + exception handling + risk acceptance
Healthcare | Treatment recommendation | Diagnosis probability + clinical guideline matching + drug interaction check | Clinical judgment + patient context + ethical decision
Legal | Contract risk assessment | Clause classification + risk flagging + precedent matching | Legal advice + negotiation authority + professional responsibility
Government | Benefits eligibility | Policy rule application + documentation completeness + fraud scoring | Eligibility determination + hardship consideration + appeal handling
Human Resources | Candidate assessment | Resume analysis + skills matching + behavioural signal processing | Hiring decision + team fit + culture judgment

4. Architecture Overview

The Collaborative AI Decision pattern operates as a structured review workflow with six integrated capabilities.

Capability 1 — Recommendation Presentation. The AI generates a structured recommendation containing: a primary recommendation (approve / decline / refer / flag for review, depending on the domain); a calibrated confidence score expressed as a percentage with a plain-language interpretation ("This recommendation is based on high-confidence pattern matching; the model is correct in approximately 92% of similar cases"); the top three supporting reasons for the recommendation, each with a source citation (data field, policy rule, or retrieved document); two or three alternative outcomes the AI considered and why they were ranked lower; and a risk flag if the case has characteristics associated with higher error rates in historical data. Critically, the interface presents these elements in a designed sequence that encourages independent review before displaying the primary recommendation — the human should formulate their initial assessment from the reasoning before seeing the AI's conclusion. This reduces anchoring bias without hiding the AI recommendation.
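The recommendation structure and review sequence above can be sketched as a small Python component. This is a minimal sketch under stated assumptions: `Reason`, `Recommendation`, and `presentation_stages` are hypothetical names, the confidence value is assumed to be already calibrated, and this version gates the primary recommendation on a recorded initial assessment (a softer interface could simply order the views).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reason:
    text: str
    source: str  # data field, policy rule, or retrieved document


@dataclass(frozen=True)
class Recommendation:
    primary: str                     # e.g. "approve", "decline", "refer"
    confidence: float                # calibrated probability in [0, 1] (assumed)
    reasons: tuple[Reason, ...]      # ranked, strongest first
    alternatives: tuple[str, ...] = ()
    risk_flags: tuple[str, ...] = ()

    def confidence_text(self) -> str:
        # Plain-language interpretation; only meaningful if the score is calibrated.
        pct = round(self.confidence * 100)
        return f"The model is correct in approximately {pct}% of similar cases."


def presentation_stages(rec: Recommendation, initial_assessment_recorded: bool) -> list[dict]:
    """Return the views the interface may show, in order.

    Reasoning comes first; the primary recommendation is released only after
    the reviewer has recorded an initial assessment (anchoring control).
    """
    stages = [{
        "stage": "reasoning",
        "top_reasons": [r.text for r in rec.reasons[:3]],
        "alternatives": list(rec.alternatives[:3]),
        "risk_flags": list(rec.risk_flags),
    }]
    if initial_assessment_recorded:
        stages.append({
            "stage": "recommendation",
            "primary": rec.primary,
            "confidence_text": rec.confidence_text(),
        })
    return stages
```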

Capability 2 — Override Tracking. Every human decision is captured as AGREE (human accepts the AI recommendation without modification), AGREE_WITH_MODIFICATION (human accepts the recommendation but changes a parameter, e.g. a reduced credit limit), or OVERRIDE (human rejects the AI recommendation entirely). For any outcome other than AGREE, the human must select a reason from a structured taxonomy and may add free text. The taxonomy covers: wrong_facts (AI used incorrect data), wrong_reasoning (AI reasoning was flawed but the facts were correct), policy_violation (AI recommendation contravenes a policy not captured in the model), inappropriate_context (AI recommendation is technically correct but wrong for this specific case), user_preference (decision-maker has specific knowledge not available to the AI), and other. Reasons are mandatory: the submit button stays disabled until a reason is selected. The reason step is designed to take no more than 30 seconds.
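Because the mandatory reason must hold even if the UI is bypassed, the same rule belongs on the server. A minimal sketch, assuming a Python API layer; the enum values mirror the codes above, while `validate_decision` and `ValidationError` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class DecisionCode(str, Enum):
    AGREE = "AGREE"
    AGREE_WITH_MODIFICATION = "AGREE_WITH_MODIFICATION"
    OVERRIDE = "OVERRIDE"


class OverrideReason(str, Enum):
    WRONG_FACTS = "wrong_facts"
    WRONG_REASONING = "wrong_reasoning"
    POLICY_VIOLATION = "policy_violation"
    INAPPROPRIATE_CONTEXT = "inappropriate_context"
    USER_PREFERENCE = "user_preference"
    OTHER = "other"


class ValidationError(ValueError):
    """Raised when a submission breaks the override-reason rule."""


def validate_decision(code: str, reason: Optional[str], free_text: str = "") -> tuple:
    """Server-side mirror of the disabled submit button in the review UI."""
    decision = DecisionCode(code)  # unknown codes raise ValueError
    if decision is DecisionCode.AGREE:
        return decision, None, free_text
    if not reason:
        raise ValidationError("override reason is mandatory for non-AGREE decisions")
    return decision, OverrideReason(reason), free_text  # unknown reasons raise ValueError
```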

Capability 3 — Outcome Feedback. For each decision (whether AI-accepted or overridden), the system tracks the downstream outcome when it becomes available. For a credit decision, the outcome is whether the customer defaulted; for an insurance decision, whether a claim was filed; for a clinical recommendation, whether the treatment was effective. This outcome data is stored against the original decision record, enabling a retrospective accuracy comparison: what was the adverse-outcome rate across all AI-accepted decisions, and what was it across all human-override decisions? This comparison is typically the most valuable model improvement signal available.
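The retrospective comparison can be sketched as follows, assuming each decision record carries a boolean adverse-outcome label (for example, default or claim filed) that stays `None` while the outcome is pending. Splitting AGREE_WITH_MODIFICATION into its own cohort is an assumption; the pattern text only names AI-accepted and human-override decisions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical mapping from decision code to comparison cohort.
COHORTS = {
    "AGREE": "ai_accepted",
    "AGREE_WITH_MODIFICATION": "modified",
    "OVERRIDE": "overridden",
}


@dataclass(frozen=True)
class DecidedCase:
    decision_code: str               # AGREE, AGREE_WITH_MODIFICATION or OVERRIDE
    adverse_outcome: Optional[bool]  # None while the outcome is pending


def adverse_rates(cases: Iterable[DecidedCase]) -> dict:
    """Adverse-outcome rate per cohort; outcome-pending cases are skipped."""
    adverse = {c: 0 for c in COHORTS.values()}
    total = {c: 0 for c in COHORTS.values()}
    for case in cases:
        if case.adverse_outcome is None:
            continue  # included once the outcome arrives
        cohort = COHORTS[case.decision_code]
        total[cohort] += 1
        adverse[cohort] += int(case.adverse_outcome)
    return {c: (adverse[c] / total[c] if total[c] else None) for c in total}
```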

Capability 4 — Over-Reliance Detection. The system monitors the human override rate at both the individual decision-maker level and the aggregate level. If an individual's override rate drops below 5% of the AI recommendations they review over a rolling 30-day window, the system flags a potential automation bias concern and notifies their supervisor. If the aggregate override rate drops below 3%, a formal Model Risk review is triggered to assess whether the collaborative model has de facto become fully automated. Both thresholds are configurable by domain. Note: a low override rate may also indicate that the model is genuinely excellent; outcome data is used to distinguish automation bias from appropriate acceptance.
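A minimal sketch of the rolling-window check, assuming decisions arrive as `(decision_maker_id, decided_at, decision_code)` tuples. The 5%, 3%, and 30-day values come from the text; counting every non-AGREE code as an override and the `min_decisions` floor are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

INDIVIDUAL_MIN_RATE = 0.05  # per decision-maker, rolling window
AGGREGATE_MIN_RATE = 0.03   # across all decision-makers
WINDOW = timedelta(days=30)


def override_alerts(decisions, now: datetime, min_decisions: int = 20):
    """Compute rolling override rates and the alerts described above.

    decisions: iterable of (decision_maker_id, decided_at, decision_code).
    Any code other than AGREE counts as an override (assumption).
    Decision-makers with fewer than `min_decisions` in the window are not
    flagged, to avoid alerts on tiny samples (hypothetical safeguard).
    Returns (flagged_ids, aggregate_rate, aggregate_breach).
    """
    totals = defaultdict(int)
    overrides = defaultdict(int)
    for maker, decided_at, code in decisions:
        if now - decided_at > WINDOW:
            continue  # outside the rolling 30-day window
        totals[maker] += 1
        overrides[maker] += int(code != "AGREE")
    flagged = sorted(
        m for m, n in totals.items()
        if n >= min_decisions and overrides[m] / n < INDIVIDUAL_MIN_RATE
    )
    n_all = sum(totals.values())
    aggregate = sum(overrides.values()) / n_all if n_all else None
    breach = aggregate is not None and aggregate < AGGREGATE_MIN_RATE
    return flagged, aggregate, breach
```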

Capability 5 — Audit Trail. Every decision generates an immutable audit record containing: the input data snapshot at decision time (not a reference to the current data state — a snapshot, to prevent post-hoc data modification); the AI recommendation with full reasoning; the human decision with override code if applicable; the free-text reasoning; the identity of the decision-maker; and the timestamp. This audit trail is the primary artefact for regulatory examination and for resolving customer disputes.
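The snapshot requirement can be sketched in Python as a frozen record that stores a serialised copy of the inputs plus a SHA-256 digest of that copy. Field and function names are hypothetical; a frozen dataclass only prevents in-process mutation, so durable immutability still has to come from the append-only store.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # attribute reassignment raises FrozenInstanceError
class AuditRecord:
    case_id: str
    input_snapshot_json: str  # serialised copy taken at decision time
    snapshot_sha256: str
    ai_recommendation: str
    ai_reasoning: tuple
    decision_code: str
    override_code: Optional[str]
    free_text: str
    decision_maker_id: str
    decided_at: str           # ISO-8601, UTC


def build_audit_record(case_id: str, live_input: dict, ai_recommendation: str,
                       ai_reasoning, decision_code: str, override_code: Optional[str],
                       free_text: str, decision_maker_id: str) -> AuditRecord:
    # Canonical JSON: the same data always serialises, and hashes, identically.
    snapshot = json.dumps(live_input, sort_keys=True, separators=(",", ":"), default=str)
    return AuditRecord(
        case_id=case_id,
        input_snapshot_json=snapshot,
        snapshot_sha256=hashlib.sha256(snapshot.encode("utf-8")).hexdigest(),
        ai_recommendation=ai_recommendation,
        ai_reasoning=tuple(ai_reasoning),
        decision_code=decision_code,
        override_code=override_code,
        free_text=free_text,
        decision_maker_id=decision_maker_id,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )


def snapshot_intact(record: AuditRecord) -> bool:
    """Recompute the hash to detect modification of the stored snapshot."""
    digest = hashlib.sha256(record.input_snapshot_json.encode("utf-8")).hexdigest()
    return digest == record.snapshot_sha256
```

Because the snapshot is a serialised string rather than a reference, later edits to the live case data do not alter what the record shows the reviewer saw.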

Capability 6 — Cognitive Load Design. The review interface presents information in layers: the primary view carries the minimum necessary information, and deeper data is available through drill-down. Subject to the review sequence described under Capability 1, the primary view shows the AI recommendation, confidence, top three reasons, and the decision controls; full supporting evidence is one click away. Decision-makers can annotate the AI reasoning ("I agree with reason 1, not reason 3"), and these annotations are captured and fed back into model improvement. Time-on-review can optionally be tracked to identify cases where decision-makers spend very little time (potential rubber-stamping) or a very long time (a potential interface complexity issue).
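One way to turn time-on-review into the two signals above is to compare each decision with the median review time. This is a hedged sketch: `review_time_flags` is a hypothetical name, the 0.25× and 4× ratios are hypothetical defaults, and a real deployment would likely compare within the same case type.

```python
from statistics import median


def review_time_flags(times_ms: dict, fast_ratio: float = 0.25, slow_ratio: float = 4.0) -> dict:
    """Flag decisions whose review time is far from the median.

    times_ms maps decision_id -> time_on_review_ms.
    """
    if not times_ms:
        return {}
    med = median(times_ms.values())
    flags = {}
    for decision_id, t in times_ms.items():
        if t < fast_ratio * med:
            flags[decision_id] = "possible_rubber_stamp"
        elif t > slow_ratio * med:
            flags[decision_id] = "possible_interface_complexity"
    return flags
```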


5. Architecture Diagram

```mermaid
flowchart TD
    subgraph Presentation["AI Recommendation"]
        A[Case Submitted]
        B[AI Recommendation + Reasoning]
        C[Review Interface]
    end
    subgraph Decision["Human Decision"]
        D{Agree, Modify, or Override}
        E[(Immutable Audit Record)]
    end
    subgraph Feedback["Outcome Feedback"]
        F[Decision Executed]
        G[Outcome Monitor]
        H[Override Rate Monitor]
    end
    A --> B
    B --> C
    C --> D
    D -->|any outcome| E
    E --> F
    F --> G
    G --> H
    H -->|override rate low| E
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#f0fdf4,stroke:#22c55e
    style D fill:#f3e8ff,stroke:#a855f7
    style E fill:#fef9c3,stroke:#eab308
    style F fill:#d1fae5,stroke:#10b981
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#fee2e2,stroke:#ef4444
```

6. Components

Component | Type | Responsibility | Technology Options | Criticality
AI Inference Engine | ML Serving | Generate recommendation with confidence and structured reasoning | SageMaker, Vertex AI, Azure ML, OpenAI API with structured output | Critical
Recommendation Formatter | Application Service | Structure AI output into presentation format (top-3 reasons, alternatives, confidence interpretation) | Python microservice; prompt engineering for LLM-based reasoning extraction | High
Review Interface | Web Application | Present layered recommendation to decision-maker; capture decision + reason | Custom React application; Salesforce Lightning; ServiceNow custom form | Critical
Input Data Snapshot Service | Data Service | Capture point-in-time snapshot of all input data used for recommendation | PostgreSQL JSON snapshot column; event sourcing store | Critical
Decision Store | Data Store | Persist all decisions with full audit record | PostgreSQL with append-only audit table | Critical
Outcome Monitor | Background Service | Track downstream outcomes for decided cases; link back to decision record | Batch job on scheduled cadence; event-driven if outcome is a system event | High
Override Rate Monitor | Analytics Service | Compute individual and aggregate override rates; fire alerts | Python analytics job; BI tool (Tableau, Looker) for visualisation | High
Outcome Accuracy Comparator | Analytics Service | Compare accuracy rates for AI-accepted vs human-override decisions | Python analytics job with statistical significance testing | High
Model Improvement Pipeline | ML Pipeline | Ingest outcome-labelled decisions as training data | SageMaker Pipelines, Vertex AI Pipelines, Kubeflow | Medium

7. Data Flow

Primary Flow

Step | Actor | Action | Output
1 | Case Management System | Submits case for collaborative review | case_id, input_data{}, requestor_id
2 | AI Inference Engine | Generates recommendation and reasoning | recommendation, confidence, reasons[], alternatives[], risk_flags[]
3 | Recommendation Formatter | Structures output into layered presentation format | formatted_recommendation{primary, confidence_text, top3_reasons, alternatives, drill_down_url}
4 | Input Data Snapshot Service | Captures point-in-time snapshot of all input fields | snapshot_id, input_data_snapshot{} with timestamp
5 | Review Interface | Presents to decision-maker; enforces review sequence | decision_maker_id, time_entered_interface
6 | Decision-Maker | Reviews reasoning; makes decision; selects override reason if applicable | decision_code, override_reason_code, override_text, time_on_review_ms
7 | Decision Store | Writes immutable audit record | audit_record_id with full snapshot, decision, identity, timestamp
8 | Downstream Systems | Execute the decision | Decision actioned (credit approved, claim referred, etc.)
9 | Outcome Monitor | Detects downstream outcome event; links it to the original decision | outcome_type, outcome_timestamp, outcome_value
10 | Override Rate Monitor | Recomputes individual and aggregate override rates | override_rate_report; alert if threshold breached
11 | Outcome Accuracy Comparator | Compares AI-accepted vs override decision accuracy | accuracy_comparison_report; statistical significance test result
12 | Model Improvement Pipeline | Ingests outcome-labelled decisions | Updated training dataset; triggers retraining if volume threshold met

Error Flow

Error Condition | Detected By | Recovery Action | Notification
AI inference fails (recommendation unavailable) | Review Interface | Present case without AI recommendation; human decides independently; flag case as AI-unavailable | Decision-maker notified; logged for AI operations review
Override reason not selected (UI bypass attempt) | API validation layer | Reject decision submission; return validation error | Decision-maker redirected to provide reason
Outcome data unavailable for outcome monitoring | Outcome Monitor | Hold decision in outcome_pending state; flag for investigation if still unlinked after 90 days; exclude from accuracy comparison until the outcome is received | ML Ops notified to investigate outcome data source
Input data snapshot service unavailable | Snapshot Service health check | Block decision submission until the snapshot service is restored; do not allow a decision without a snapshot (audit integrity) | Operations on-call paged
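The recovery rules in the error flow can be sketched as a guard in the decision submission API. Assumptions: `capture_snapshot` is a hypothetical callable wrapping the Input Data Snapshot Service, `audit_log` stands in for the Decision Store, and a case with no `ai_recommendation` is treated as decided in human-only mode, so no override reason is required.

```python
from typing import Callable, Optional


class SnapshotUnavailable(RuntimeError):
    """Raised when a decision cannot be accepted because no snapshot exists."""


class MissingOverrideReason(ValueError):
    """Raised when a non-AGREE decision arrives without a reason code."""


def submit_decision(case: dict, decision_code: str, override_reason: Optional[str],
                    capture_snapshot: Callable[[dict], str], audit_log: list) -> dict:
    ai_unavailable = case.get("ai_recommendation") is None
    # API-level check: a UI bypass must not skip the mandatory reason.
    if not ai_unavailable and decision_code != "AGREE" and not override_reason:
        raise MissingOverrideReason("override reason is mandatory")
    try:
        snapshot_id = capture_snapshot(case)
    except Exception as exc:
        # Audit integrity: no decision is accepted without a snapshot.
        raise SnapshotUnavailable("decision blocked until snapshot service is restored") from exc
    record = {
        "case_id": case["case_id"],
        "snapshot_id": snapshot_id,
        "decision_code": decision_code,
        "override_reason": override_reason,
        "ai_unavailable": ai_unavailable,  # human-only decisions are flagged, not hidden
    }
    audit_log.append(record)
    return record
```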

8. Security Considerations

Authentication and Authorisation

  • Review interface requires SSO + MFA; sessions expire after 60 minutes of inactivity
  • RBAC: decision authority tiers determine which case types each decision-maker may handle
  • Audit records are read-only for decision-makers; write access is restricted to the decision submission API
  • Model improvement pipeline reads decision and outcome data under a service account with read-only access to audit store

Secrets Management

  • AI inference API credentials stored in secrets manager; rotated every 90 days
  • Decision store connection credentials stored in secrets manager; never in application configuration files

Data Classification

  • Input data snapshots inherit the sensitivity of the highest-classification field in the case
  • Audit records containing sensitive case data stored in encrypted, access-controlled store
  • Override reason text may contain PII (decision-makers may describe customer-specific context); treat as sensitive

Encryption

  • All audit records encrypted at rest (AES-256)
  • All data in transit encrypted (TLS 1.3)
  • Audit record store uses database-level encryption with key managed by enterprise key management service

Auditability

  • Audit records are append-only; no update or delete capability on production record
  • All access to audit records logged with accessor identity and timestamp
  • Archive audit records to WORM storage after 90 days; retain for 7 years minimum in regulated industries
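The append-only rule is usually enforced in the database itself rather than in application code. The pattern specifies PostgreSQL; to keep the sketch self-contained it uses Python's built-in `sqlite3` with triggers that abort any UPDATE or DELETE. Table, trigger, and function names are hypothetical, and a database owner can still drop triggers, so write privileges must be restricted as well.

```python
import sqlite3

SCHEMA = """
CREATE TABLE audit_record (
    audit_record_id   INTEGER PRIMARY KEY,
    case_id           TEXT NOT NULL,
    decision_code     TEXT NOT NULL,
    decision_maker_id TEXT NOT NULL,
    recorded_at       TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
);
-- Reject any attempt to change or remove a committed record.
CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_record
BEGIN SELECT RAISE(ABORT, 'audit records are append-only'); END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_record
BEGIN SELECT RAISE(ABORT, 'audit records are append-only'); END;
"""


def open_audit_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def append_decision(conn: sqlite3.Connection, case_id: str, decision_code: str,
                    decision_maker_id: str) -> None:
    # INSERT is the only write path the application uses.
    conn.execute(
        "INSERT INTO audit_record (case_id, decision_code, decision_maker_id) VALUES (?, ?, ?)",
        (case_id, decision_code, decision_maker_id),
    )
    conn.commit()
```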

OWASP LLM Top 10 Considerations

OWASP LLM Risk | Applicability | Mitigation
LLM01: Prompt Injection | Medium: case input data shown to the decision-maker may contain adversarial text | Sanitise display rendering; if AI reasoning generation includes input data in the prompt, sanitise and clearly delimit that data before inserting it
LLM02: Insecure Output Handling | Medium: AI reasoning text is displayed in the review interface | Sanitise AI reasoning output before HTML rendering; escape user-controlled content
LLM03: Training Data Poisoning | Medium: outcome-labelled decisions feed model training | Validate outcome data provenance; run anomaly detection on the outcome label distribution
LLM04: Model Denial of Service | Low | Standard rate limiting on the inference endpoint
LLM05: Supply Chain Vulnerabilities | Medium: third-party LLM used for reasoning generation | Model provenance tracking; approved model provider list; output quality monitoring
LLM06: Sensitive Information Disclosure | High: AI reasoning may leak sensitive data from other cases if the model memorised training data | Monitor AI reasoning outputs for PII patterns; test the base model for memorisation of training data
LLM07: Insecure Plugin Design | Low: not applicable to this pattern | N/A
LLM08: Excessive Agency | Low: the AI makes no autonomous decisions in this pattern | Mitigated by design
LLM09: Overreliance | Critical: automation bias is the primary risk of this pattern | Override rate monitoring, supervisor alerts, and Model Risk review are the core mitigations
LLM10: Model Theft | Medium: AI reasoning reveals the model's decision logic | Access controls on the review interface; rate limiting; no bulk export of AI reasoning without authorisation

9. Governance Considerations

Responsible AI

  • Override rate monitored by protected group characteristics: if decision-makers override AI at significantly different rates for different demographic groups, investigate for discriminatory patterns
  • Outcome accuracy monitored by protected group: if AI recommendations have higher error rates for specific groups, trigger bias investigation and remediation
  • Automation bias intervention (override rate monitoring) is a named governance control in the AI Governance Framework
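A first-pass screen for the protected-group checks above might compare each group's override rate with the overall rate. This is a sketch, not a fairness methodology: `group_override_disparity` is a hypothetical name, the 0.8 band is a hypothetical screening threshold, flagged groups still need a statistical test and a human investigation, and the same approach can be applied to outcome error rates.

```python
from collections import Counter


def group_override_disparity(decisions, band: float = 0.8):
    """Screen override rates by protected group.

    decisions: iterable of (group_label, decision_code); non-AGREE counts as
    an override. A group is flagged when its rate divided by the overall rate
    falls outside [band, 1 / band]. Returns (overall_rate, {group: ratio}).
    """
    totals, overrides = Counter(), Counter()
    for group, code in decisions:
        totals[group] += 1
        overrides[group] += int(code != "AGREE")
    if not totals:
        return None, {}
    overall = sum(overrides.values()) / sum(totals.values())
    if overall == 0:
        return 0.0, {}  # no overrides at all: the ratio is undefined
    flagged = {}
    for group, n in totals.items():
        ratio = (overrides[group] / n) / overall
        if not band <= ratio <= 1 / band:
            flagged[group] = round(ratio, 2)
    return overall, flagged
```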

Model Risk Management

  • Outcome accuracy comparison report is reviewed quarterly by Model Risk Officer
  • If human overrides are consistently more accurate than AI acceptances, this indicates systematic AI error and triggers mandatory model review
  • If human overrides are consistently less accurate than AI acceptances, this indicates overriding is not adding value: review decision-maker training and interface design, not model performance
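The Outcome Accuracy Comparator's significance test could be a two-proportion z-test on adverse-outcome rates for AI-accepted versus overridden decisions. A minimal standard-library sketch with a hypothetical function name; the normal approximation assumes reasonably large, independent samples, and because overridden cases differ in case mix from accepted ones, a significant difference is a prompt for Model Risk review rather than proof of which party is more accurate.

```python
import math


def two_proportion_z_test(adverse_a: int, n_a: int, adverse_b: int, n_b: int):
    """Two-sided z-test for a difference between two adverse-outcome rates.

    Cohort a: e.g. AI-accepted decisions; cohort b: e.g. overridden decisions.
    Returns (rate_a, rate_b, z, p_value).
    """
    if n_a == 0 or n_b == 0:
        raise ValueError("both cohorts need at least one resolved outcome")
    rate_a, rate_b = adverse_a / n_a, adverse_b / n_b
    pooled = (adverse_a + adverse_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return rate_a, rate_b, 0.0, 1.0  # both rates 0 or both 1: no difference
    z = (rate_a - rate_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return rate_a, rate_b, z, p_value
```

Requiring significance in several consecutive quarterly reports is one way to operationalise "consistently" in the two rules above.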

Human Approval Gates

  • Changes to confidence thresholds or recommendation presentation require Model Risk sign-off
  • Addition of new case types to the collaborative decision scope requires Legal and Compliance review

Policy Compliance

  • Decision-maker authority levels must be defined in policy and reflected in the RBAC configuration
  • Audit records must be available for regulatory examination within 5 business days of request

Traceability

  • Every decision is traceable from: case input → AI recommendation + reasoning → human decision + reason → downstream action → outcome
  • Full trace available for regulatory inspection, customer dispute resolution, and model improvement

Governance Artefacts

Artefact | Owner | Frequency | Purpose
Override Rate Report | Model Risk | Monthly | Track individual and aggregate override rates; detect automation bias
Outcome Accuracy Report | Model Risk | Quarterly | Compare AI vs human accuracy using outcome data
Automation Bias Investigation Records | Model Risk Officer | As triggered | Document investigation and resolution for each automation bias flag
Decision Audit Log | Compliance | Continuous, reviewed annually | Immutable record for regulatory examination
Fairness Assessment | Model Risk / Ethics Board | Quarterly | Protected group analysis of recommendation accuracy and override rates

10. Operational Considerations

Monitoring

Metric | SLO | Alert Threshold | Owner
Review interface latency (time to display recommendation) | < 3 seconds | > 8 seconds | Engineering
Decision submission success rate | > 99.9% | < 99.5% for any 1-hour window | Engineering
AI recommendation availability | > 99.5% | < 99% | ML Ops
Individual override rate | > 5% per rolling 30 days | < 5% for any decision-maker | Supervisor / Model Risk
Aggregate override rate | > 3% | < 3% (aggregate) | Model Risk
Outcome data ingestion lag | < 24 hours from outcome event | > 72 hours | ML Ops
Audit record write success rate | 100% | Any failure | Operations on-call

Logging

  • Structured JSON logs for all decision events keyed by case_id, decision_maker_id, timestamp
  • Audit records stored in append-only table; separate from application logs
  • Time-on-review metric logged per decision to support automation bias investigation

Incident Response

  • Automation bias flag: supervisor notified within 24 hours; investigation completed within 5 business days; outcome documented
  • AI inference outage: decision-makers notified; fall back to human-only mode with flag on audit record; restore AI within SLO
  • Audit store unavailability: halt decision submissions until audit store restored (audit integrity is non-negotiable)

Disaster Recovery

Component | RTO | RPO | Strategy
AI Inference Engine | 15 min | 0 (stateless) | Multi-AZ; auto-scaling
Decision Store (Audit) | 30 min | 5 min | PostgreSQL synchronous standby; WAL archiving; WORM archive after 90 days
Review Interface | 30 min | N/A (stateless) | Multi-AZ deployment; CDN for static assets
Outcome Monitor | 4 hours | 1 hour | Batch job; re-runnable; idempotent

Capacity Planning

  • Review interface must support peak concurrent decision-maker sessions: scale the interface tier horizontally
  • Decision store grows permanently (records never deleted); plan storage growth at 2–5 KB per record × daily volume × 7 years retention
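The storage formula above is simple arithmetic; a small helper (hypothetical name) makes the units explicit, assuming decimal units (1 GB = 1,000,000 KB) and 365-day years.

```python
def audit_storage_gb(daily_decisions: int, kb_per_record: float, years: int = 7) -> float:
    """Storage for never-deleted audit records, in decimal GB."""
    days = 365 * years  # ignores leap days
    total_kb = daily_decisions * kb_per_record * days
    return total_kb / 1_000_000  # 1 GB = 1,000,000 KB
```

For example, 5,000 decisions per day at 5 KB per record over 7 years gives 5,000 × 5 × 365 × 7 = 63,875,000 KB, about 64 GB, excluding indexes and replicas.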

11. Cost Considerations

Cost Drivers

Driver | Description | Relative Weight
Decision-Maker Labour | Human review time × volume; dominant cost | Very High
AI Inference | Per-call cost × volume; LLM-based reasoning generation is more expensive | Medium-High
Audit Storage | Grows permanently; managed by partitioning and tiered storage | Medium
Review Interface Development | Custom development if not using commercial platform | Medium (one-time)
Outcome Monitoring Infrastructure | Batch jobs for outcome collection and accuracy comparison | Low

Scaling Risks

  • Decision-maker labour scales linearly with case volume; AI accuracy improvement reduces labour cost per unit
  • LLM-based reasoning generation at high volume can become significant cost; optimise by caching reasoning for identical input patterns
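Caching reasoning for identical input patterns needs a key that is stable across equivalent inputs and invalidated when the model changes. A sketch under assumptions: `ReasoningCache` is a hypothetical class, `generate` stands in for a hypothetical LLM call, and including `model_version` in the key is a design choice, not a requirement stated above. The audit record should still store the reasoning text actually shown, whether or not it came from the cache.

```python
import hashlib
import json
from typing import Callable


class ReasoningCache:
    """Reuse generated reasoning when the model inputs are identical."""

    def __init__(self, generate: Callable[[dict], str], model_version: str):
        self._generate = generate
        self._model_version = model_version  # new model version, new cache keys
        self._store: dict[str, str] = {}
        self.misses = 0

    def _key(self, inputs: dict) -> str:
        # Canonical JSON, so key order in the input dict does not matter.
        canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(f"{self._model_version}|{canonical}".encode("utf-8")).hexdigest()

    def get(self, inputs: dict) -> str:
        key = self._key(inputs)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._generate(inputs)
        return self._store[key]
```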

Optimisations

  • Batch low-priority decisions: not all collaborative decisions need real-time review; batch low-priority (P3) cases for efficient human processing
  • Calibrate confidence thresholds so that only genuinely uncertain cases go to full collaborative review; route high-confidence routine cases to an abbreviated review mode with a reduced review burden
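The routing rule above can be sketched as a function of calibrated confidence. The function name, the 0.95 threshold, and the `regulated` and `routine` flags are hypothetical inputs; the rule that regulated decisions never receive abbreviated review follows the presentation trade-off analysis.

```python
def route_case(confidence: float, regulated: bool, routine: bool,
               abbreviated_threshold: float = 0.95) -> str:
    """Choose the review mode for a case (hypothetical thresholds)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a calibrated probability in [0, 1]")
    if regulated:
        return "full_review"  # abbreviated review is never used for regulated decisions
    if routine and confidence >= abbreviated_threshold:
        return "abbreviated_review"
    return "full_review"  # genuinely uncertain or non-routine cases
```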
  • Use outcome data to identify case patterns where AI is consistently correct: these may be candidates for increased automation with reduced human review

Indicative Cost Range

Scale | Daily Decisions | Decision-Maker Labour | AI Inference | Total Monthly (labour + inference)
Small (500/day) | 500 | $25,000–$60,000/month | $500–$2,000/month | $25,500–$62,000/month
Medium (5K/day) | 5,000 | $150,000–$400,000/month | $5,000–$20,000/month | $155,000–$420,000/month
Large (50K/day) | 50,000 | $800,000–$2M/month | $30,000–$100,000/month | $830,000–$2.1M/month

12. Trade-Off Analysis

Presentation Strategy Options

Strategy | Automation Bias Risk | Human Judgment Quality | Decision Latency | Recommended
Show recommendation first, then reasoning | High: anchoring effect; humans tend to agree with the first conclusion they see | Low: reasoning is reviewed to justify a pre-formed conclusion | Low | Not recommended for high-stakes decisions
Show reasoning first, then recommendation | Low: the human forms an independent assessment before seeing the AI conclusion | High: reasoning informs independent judgment | Medium: slight increase | Recommended for regulated, high-stakes decisions
Show reasoning only (hide recommendation until the human records an initial assessment) | Very Low | Very High | High: most time-intensive | Use for the highest-stakes decisions: credit appeals, clinical escalation, benefits review
Abbreviate reasoning for experienced decision-makers | Medium | Medium | Very Low | Use for low-stakes collaborative decisions; never for regulated decisions

Architectural Tensions

Tension | Option A | Option B | Resolution Guidance
Override reason granularity vs compliance burden | Fine-grained taxonomy (10+ codes): richer data for model improvement | Simple taxonomy (3 codes): faster for decision-makers | Use 5–7 codes: enough granularity for actionable model improvement, few enough not to burden reviewers
Outcome tracking completeness vs data availability | Track outcomes for all decisions | Track outcomes only where data is readily available | Track all decisions but allow an outcome_pending state; a delayed outcome defers a decision's inclusion in accuracy analysis but never excludes it permanently
Automation bias intervention vs false positives | Strict threshold (override rate < 10% triggers alert) | Lenient threshold (override rate < 2% triggers alert) | Calibrate per domain: new decision-makers will have higher override rates, and experienced decision-makers working with excellent models may legitimately have lower rates; check outcome accuracy before intervening
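The resolution guidance (check outcome accuracy before intervening) can be sketched as a triage step that runs after a low-override-rate flag. All names are hypothetical, and `benchmark_adverse_rate` is an assumed reference, for example the model's validated adverse-outcome rate for the same case mix.

```python
from typing import Optional


def triage_low_override_flag(override_rate: float, threshold: float,
                             accepted_adverse_rate: Optional[float],
                             benchmark_adverse_rate: float,
                             tolerance: float = 0.0) -> str:
    """Decide what to do with an override-rate flag (hypothetical triage)."""
    if override_rate >= threshold:
        return "no_flag"
    if accepted_adverse_rate is None:
        # Outcomes still pending: the leading indicator alone justifies a review.
        return "flag_pending_outcomes"
    if accepted_adverse_rate <= benchmark_adverse_rate + tolerance:
        # Low override rate, but accepted decisions perform as expected.
        return "likely_appropriate_acceptance"
    return "investigate_automation_bias"
```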

13. Failure Modes

Failure | Likelihood | Impact | Detection | Recovery
Automation bias becomes pervasive (override rate collapses) | High without monitoring | Critical: legal liability; model errors go unchecked | Override rate monitoring | Supervisor intervention; interface redesign; decision-maker retraining
AI reasoning quality degrades (reasons become generic) | Medium | High: decision-makers cannot make informed judgments | Time-on-review drops; override free-text quality drops | Inference quality review; prompt revision or model retraining
Outcome data unavailable for accuracy comparison | Medium | High: no signal for model improvement or bias detection | Outcome monitor flags decisions still unlinked after 90 days | Investigate the outcome data source; collect outcomes manually for a sample
Decision store write failure (audit gap) | Low | Critical: regulatory exposure; decisions not auditable | Audit write success rate monitoring | Block further decisions until the store is restored; reconstruct missing records retroactively from application logs
Override reason taxonomy becomes inadequate | Low | Medium: decision-makers use "other" at a high rate, reducing signal quality | "other" code usage rate > 20% | Taxonomy review and expansion; analyse "other" free text to identify new categories
Decision-maker collusion (gaming override reason codes) | Very Low | High: audit trail becomes unreliable | Statistical anomaly detection on override patterns | Forensic audit; access control review; outcome accuracy investigation

Cascading Failure Scenario

  • AI reasoning quality degrades → decision-maker time-on-review drops → override rate falls below 3% → if automation bias detection relies on outcome data alone, the problem goes unnoticed because outcomes lag by up to 90 days → AI errors propagate at scale for 3 months
  • Mitigation: Override rate monitoring operates independently of outcome data (catches the signal early); outcome accuracy monitoring provides trailing confirmation

14. Regulatory Considerations

Regulation | Specific Clause | Requirement | Implementation
EU AI Act | Article 14(4): Human oversight | Decision-makers must be able to understand AI system outputs, remain aware of automation bias, and override them | Review interface provides full reasoning; override is always available and is the explicit design mechanism
EU AI Act | Article 13: Transparency and provision of information to deployers | AI system must provide sufficient transparency to enable effective human oversight | Top-3 reasons + alternatives + confidence interpretation constitute transparency evidence
EU AI Act | Article 9: Risk management system | Risks, including automation bias, must be identified and managed | Override rate monitoring is the automation bias risk management control
APRA CPS 230 | §50: Board oversight of operational risk | AI-assisted decisions are operational risk events; governance must be demonstrable | Override rate report and outcome accuracy report provide board governance evidence
Privacy Act 1988 (Australia) | APP 1: Open and transparent management of personal information | Individuals must be able to understand how decisions about them are made | AI recommendation reasoning and human override reason are both accessible on request
ISO/IEC 42001:2023 | §8.4: AI system accountability | Accountability for AI-influenced decisions must be assigned to named humans | Audit trail names the decision-maker for every decision; accountability is unambiguous
NIST AI RMF | MAP 3.5: Processes for human oversight | Human judgment must be meaningfully involved, not performative | Over-reliance monitoring + override tracking demonstrate genuine human involvement
NIST AI RMF | MEASURE 2.11: Fairness and bias evaluation | AI-assisted decisions must be monitored for differential impacts | Protected group analysis of override rates and outcome accuracy
US Federal Reserve / OCC (financial services) | SR 11-7: Supervisory Guidance on Model Risk Management | Models used in credit decisions require effective challenge, ongoing monitoring, and outcomes analysis | Override tracking, outcome accuracy comparison, and the audit trail support SR 11-7 expectations

15. Reference Implementations

AWS

  • AI Inference: SageMaker Real-time Endpoints with LLM reasoning generation via Bedrock (Claude 3.5 Sonnet)
  • Review Interface: Custom React app on Amplify with Cognito authentication
  • Decision Store: Amazon RDS PostgreSQL with append-only audit table; records archived after 90 days to Amazon S3 with Object Lock (WORM), e.g. via RDS snapshot export to S3
  • Override Rate Monitor: Lambda function scheduled via EventBridge; CloudWatch metrics dashboard
  • Outcome Monitor: Step Functions workflow triggered by downstream system events
  • Model Improvement Pipeline: SageMaker Pipelines ingesting from RDS read replica

Azure

  • AI Inference: Azure Machine Learning Endpoints + Azure OpenAI for reasoning generation
  • Review Interface: Power Apps or custom React on Azure Static Web Apps with Azure AD authentication
  • Decision Store: Azure SQL Database with row-level security; Azure Immutable Blob Storage for archive
  • Override Rate Monitor: Azure Functions + Azure Monitor alerts
  • Outcome Monitor: Azure Logic Apps triggered by Dynamics 365 events

GCP

  • AI Inference: Vertex AI Online Prediction + Vertex AI Gemini for reasoning generation
  • Review Interface: Custom app on Cloud Run with Firebase Authentication
  • Decision Store: Cloud SQL PostgreSQL; BigQuery for analytics workloads
  • Override Rate Monitor: Cloud Scheduler + Cloud Functions; Looker Studio dashboard
  • Outcome Monitor: Cloud Dataflow pipeline reading from Pub/Sub outcome events

On-Premises / Private Cloud

  • AI Inference: TorchServe or vLLM on Kubernetes
  • Review Interface: React app on Kubernetes with LDAP/AD authentication
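  • Decision Store: PostgreSQL with an append-only audit table (INSERT-only privileges plus triggers that reject UPDATE and DELETE); pgaudit extension for database access logging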
  • Override Rate Monitor: Python analytics job on Airflow; Grafana dashboard
  • Outcome Monitor: Airflow DAG pulling from operational data warehouse

16. Related Patterns

Pattern | ID | Relationship | Notes
Human Escalation Pattern | EAAPL-HIL003 | Complementary: escalation is when the AI hands off entirely; collaborative is when AI and human share the decision | Use escalation for cases where a human must be the primary decision-maker; use collaborative decision where AI augments the human
Human Override Pattern | EAAPL-HIL006 | Specialisation: override is the mechanics of human rejection; collaborative is the full architecture including override + feedback | Override pattern is embedded in the collaborative decision architecture
Active Learning Loop | EAAPL-HIL002 | Complementary: outcome data from collaborative decisions is premium training signal | Outcome-labelled decisions feed the active learning training store
AI Confidence Threshold Routing | EAAPL-HIL005 | Dependency: threshold routing determines which cases enter collaborative review | High-confidence cases may bypass collaborative review; the threshold pattern governs the boundary
Annotation and Feedback Loop | EAAPL-HIL007 | Overlapping: collaborative decision generates human judgments that can be treated as annotations | Override decisions with reasons are annotation-quality training data
Human-in-the-Loop Agent | EAAPL-MAG003 | Complementary: agent checkpoints trigger collaborative review for agent recommendations | Collaborative decision pattern is instantiated at each high-stakes agent checkpoint

17. Maturity Assessment

Overall Maturity Level: Proven

Dimension | Score (1–5) | Rationale
Technical Maturity | 4 | Core components (ML inference, audit DB, web interface) are mature; structured reasoning presentation and automation bias monitoring are less standardised
Operational Maturity | 4 | Decision review workflows are well understood; outcome tracking integration requires domain-specific engineering
Governance Maturity | 5 | EU AI Act Article 14, APRA CPS 230, and SR 11-7 set expectations that the capabilities of this pattern directly address
Tooling Ecosystem | 3 | No purpose-built "collaborative AI decision" platforms; most implementations are custom; Salesforce Einstein and similar tools provide partial capability
Enterprise Adoption | 4 | Widely used in financial services (credit, underwriting); growing in healthcare and legal; less mature in government
Risk Profile | Medium | Primary risk is automation bias becoming pervasive; well controlled with override rate monitoring

18. Revision History

Version | Date | Author | Changes
1.0 | 2026-06-12 | EAAPL Working Group | Initial publication covering recommendation presentation, override tracking, outcome feedback, over-reliance detection, audit trail, and cognitive load design