
Collaborative AI Decision

Pattern ID: EAAPL-HIL004
Status: Proven
Tags: human-oversight, accountability, traceability, medium-complexity
Version: 1.0
Last Updated: 2026-06-12

1. Executive Summary

The Collaborative AI Decision pattern defines an architecture in which humans and AI jointly make decisions — the AI provides a recommendation with structured reasoning, and a human exercises independent judgment to accept, modify, or reject it. This is architecturally distinct from escalation (AI hands off entirely) and from automation (AI decides alone). The collaborative model is appropriate for decisions that require AI-scale information processing combined with human accountability, contextual judgment, and ethical responsibility.

The pattern covers: recommendation presentation designed to inform without anchoring; mandatory override tracking with reason taxonomy; long-run outcome feedback to validate whether AI or human judgment proved more accurate; over-reliance detection to prevent automation bias from becoming a silent quality failure; and a complete audit trail showing human versus AI contribution to every decision. CIOs and CTOs implementing this pattern can satisfy regulatory requirements for human accountability in AI-assisted decisions, create an empirical dataset for continuous model improvement, and demonstrate that AI augments rather than replaces human judgment — a critical positioning for regulated industries, model risk governance, and board-level AI governance frameworks.

2. Problem Statement

Business Problem

Organisations deploying AI recommendations in high-stakes decisions (credit, insurance underwriting, clinical care, legal review) face two symmetric risks: not using AI at all (leaving accuracy gains on the table) and using AI without accountability (creating liability and regulatory exposure). The decision-maker needs AI leverage without ceding human accountability. Most implementations fail in one of two ways. Either they display a "yes/no" with no reasoning, which is insufficient to support human judgment, or they bury the human in an AI-generated report they cannot meaningfully engage with, and the resulting cognitive overload leads to rubber-stamping.

Technical Problem

A well-intentioned AI recommendation system can silently transition into automation if human override rates approach zero. Without override tracking, automation bias is undetectable. Without outcome feedback, it is impossible to know whether human overrides or AI recommendations are more accurate over time. Without structured reasoning presentation, the human cannot make an informed judgment — they can only agree or disagree with a number.

Symptoms

Decision-makers report feeling they "always agree with the AI" without being able to articulate why
Override rate has never been measured or is consistently below 3%
When AI recommendations prove incorrect, post-mortem reveals the human approved the recommendation without meaningful review
Audit logs show AI recommendation and human decision but not the reasoning behind either
Outcome data (was the AI recommendation correct?) is not tracked, so model improvement is based on inputs not outcomes

Cost of Inaction

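Regulatory liability: if an AI recommendation causes harm and there is no evidence of independent human judgment, the human approval is unlikely to stand as evidence of meaningful oversight, leaving the organisation exposed to legal and regulatory challenge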
Silent quality degradation: automation bias means human approval rate does not reflect decision quality; errors compound at scale
Model improvement opportunity missed: without outcome tracking, the most valuable feedback signal (was the AI right?) is never captured
Organisational complacency: "AI checks it" becomes cultural cover for reduced human diligence

3. Context

When to Apply

High-stakes decisions where human accountability is legally or ethically required
Decisions with non-trivial error rates that benefit from human judgment on specific cases
Regulated decisions: credit, underwriting, clinical, hiring, sentencing, benefits assessment
Decisions where explainability is required for compliance or customer communication
Environments building a ground-truth dataset for model improvement from decision outcomes

When NOT to Apply

High-volume, low-stakes, fully reversible decisions where human review cannot scale (content moderation at millions per day)
Decisions where human judgment adds no value (well-defined algorithmic domains with clear ground truth)
Time-critical decisions where human latency is architecturally incompatible (fraud detection at transaction time — use post-hoc override instead)

Prerequisites

AI system produces a structured recommendation with confidence score and reasoning
Human decision-makers available and trained on the review interface
Outcome data exists or can be collected to evaluate decision accuracy over time
Override tracking and audit infrastructure in place or buildable

Industry Applicability

Industry	Decision Type	AI Contribution	Human Contribution
Financial Services	Credit application assessment	Risk score + payment behaviour analysis + policy compliance check	Final credit decision + limit setting + exception authority
Insurance	Underwriting	Risk classification + actuarial scoring + fraud signals	Coverage terms + exception handling + risk acceptance
Healthcare	Treatment recommendation	Diagnosis probability + clinical guideline matching + drug interaction check	Clinical judgment + patient context + ethical decision
Legal	Contract risk assessment	Clause classification + risk flagging + precedent matching	Legal advice + negotiation authority + professional responsibility
Government	Benefits eligibility	Policy rule application + documentation completeness + fraud scoring	Eligibility determination + hardship consideration + appeal handling
Human Resources	Candidate assessment	Resume analysis + skills matching + behavioural signal processing	Hiring decision + team fit + culture judgment

4. Architecture Overview

The Collaborative AI Decision pattern operates as a structured review workflow with six integrated capabilities.

Capability 1 — Recommendation Presentation. The AI generates a structured recommendation containing: a primary recommendation (approve / decline / refer / flag for review, depending on the domain); a calibrated confidence score expressed as a percentage with a plain-language interpretation ("This recommendation is based on high-confidence pattern matching; the model is correct in approximately 92% of similar cases"); the top three supporting reasons for the recommendation, each with a source citation (data field, policy rule, or retrieved document); two or three alternative outcomes the AI considered and why they were ranked lower; and a risk flag if the case has characteristics associated with higher error rates in historical data. Critically, the interface presents these elements in a designed sequence that encourages independent review before displaying the primary recommendation — the human should formulate their initial assessment from the reasoning before seeing the AI's conclusion. This reduces anchoring bias without hiding the AI recommendation.
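The reasoning-first sequence can be sketched as a small two-stage session. This is a minimal sketch that assumes a Python service sits behind the review interface. The class and method names (`Recommendation`, `ReviewSession`, `reasoning_view`, `recommendation_view`) are hypothetical. The sketch implements the stricter variant, in which the conclusion is withheld until the reviewer records an initial assessment. The plain-language confidence text is only truthful if the score is calibrated.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Reason:
    text: str
    source: str  # data field, policy rule, or retrieved document


@dataclass(frozen=True)
class Recommendation:
    primary: str                                # e.g. "approve", "decline", "refer"
    confidence: float                           # calibrated probability in [0, 1]
    reasons: Tuple[Reason, ...]                 # top three supporting reasons
    alternatives: Tuple[Tuple[str, str], ...]   # (outcome, why it ranked lower)
    risk_flag: bool = False


class ReviewSession:
    """Hypothetical session that withholds the AI conclusion until the
    reviewer has recorded an initial assessment from the reasoning."""

    def __init__(self, rec: Recommendation) -> None:
        self._rec = rec
        self.initial_assessment: Optional[str] = None

    def reasoning_view(self) -> dict:
        # Stage 1: evidence, alternatives, and risk flag only; no conclusion.
        return {
            "reasons": [(r.text, r.source) for r in self._rec.reasons],
            "alternatives": list(self._rec.alternatives),
            "risk_flag": self._rec.risk_flag,
        }

    def record_initial_assessment(self, assessment: str) -> None:
        self.initial_assessment = assessment

    def recommendation_view(self) -> dict:
        # Stage 2: reveal the conclusion only after the reviewer has committed a view.
        if self.initial_assessment is None:
            raise PermissionError("record an initial assessment first")
        pct = round(self._rec.confidence * 100)
        return {
            "primary": self._rec.primary,
            "confidence_text": f"The model is correct in approximately {pct}% of similar cases",
            # Worth persisting with the decision for later anchoring analysis.
            "matched_initial_assessment": self.initial_assessment == self._rec.primary,
        }
```

Recording whether the reviewer's initial view matched the AI gives a direct measure of independent judgment. This complements the override rate, which only shows the final decision.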

Capability 2 — Override Tracking. Every human decision is captured as AGREE (the human accepts the AI recommendation without modification), AGREE_WITH_MODIFICATION (the human accepts the recommendation but changes a parameter, e.g. a reduced credit limit), or OVERRIDE (the human rejects the AI recommendation entirely). For any outcome other than AGREE, the human must select an override reason from a structured taxonomy and may optionally add free text. The override taxonomy covers: wrong_facts (AI used incorrect data), wrong_reasoning (AI reasoning was flawed but the facts were correct), policy_violation (AI recommendation contravenes a policy not captured in the model), inappropriate_context (AI recommendation is technically correct but wrong for this specific case), user_preference (the decision-maker has specific knowledge not available to the AI), and other. Override reasons are mandatory: the submit button remains disabled until a reason is selected, and the taxonomy is designed so that selecting a reason takes no more than 30 seconds.
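A disabled submit button is only a UI control, so the same rule should also be enforced server-side. The following is a minimal sketch of that check. It assumes the decision codes and taxonomy values listed above. The enum, exception, and function names are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple


class DecisionCode(Enum):
    AGREE = "AGREE"
    AGREE_WITH_MODIFICATION = "AGREE_WITH_MODIFICATION"
    OVERRIDE = "OVERRIDE"


class OverrideReason(Enum):
    WRONG_FACTS = "wrong_facts"
    WRONG_REASONING = "wrong_reasoning"
    POLICY_VIOLATION = "policy_violation"
    INAPPROPRIATE_CONTEXT = "inappropriate_context"
    USER_PREFERENCE = "user_preference"
    OTHER = "other"


class DecisionValidationError(ValueError):
    """Raised when a submission would bypass mandatory override tracking."""


def validate_decision(code: str, reason: Optional[str]) -> Tuple[DecisionCode, Optional[OverrideReason]]:
    try:
        decision = DecisionCode(code)
    except ValueError:
        raise DecisionValidationError(f"unknown decision code: {code!r}") from None
    if decision is DecisionCode.AGREE:
        return decision, None  # no reason needed when the recommendation is accepted as-is
    if reason is None:
        raise DecisionValidationError("a reason is required for any decision other than AGREE")
    try:
        return decision, OverrideReason(reason)
    except ValueError:
        raise DecisionValidationError(f"reason {reason!r} is not in the taxonomy") from None
```

Parsing into enums rather than storing raw strings keeps the taxonomy closed. Any new reason category then requires a deliberate code and taxonomy change.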

Capability 3 — Outcome Feedback. For each decision (whether AI-accepted or overridden), the system tracks the downstream outcome when it becomes available. For a credit decision, the outcome is whether the customer defaulted. For an insurance decision, whether a claim was filed. For a clinical recommendation, whether the treatment was effective. This outcome data is stored against the original decision record, enabling a retrospective accuracy comparison: across all AI-accepted decisions, what was the outcome rate? Across all human-override decisions, what was the outcome rate? This is the most powerful model improvement signal available.
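The retrospective comparison reduces to two outcome rates plus a significance test. This minimal sketch assumes a binary adverse outcome, such as a default. It uses a two-proportion z-test (normal approximation), which is one reasonable choice rather than a method the pattern prescribes. `DecidedCase` and `compare_outcomes` are hypothetical names. The caller decides whether AGREE_WITH_MODIFICATION cases count as AI-accepted. Outcome-pending cases are skipped.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class DecidedCase:
    overridden: bool                  # True if the human overrode the AI recommendation
    adverse_outcome: Optional[bool]   # e.g. customer defaulted; None while outcome is pending


def compare_outcomes(cases: Iterable[DecidedCase]) -> dict:
    # group -> [adverse outcomes, cases with a known outcome]
    counts = {False: [0, 0], True: [0, 0]}
    for case in cases:
        if case.adverse_outcome is None:
            continue  # outcome-pending: not yet comparable
        counts[case.overridden][0] += int(case.adverse_outcome)
        counts[case.overridden][1] += 1
    (a1, n1), (a2, n2) = counts[False], counts[True]
    result = {"n_ai_accepted": n1, "n_overridden": n2,
              "adverse_rate_ai_accepted": a1 / n1 if n1 else None,
              "adverse_rate_overridden": a2 / n2 if n2 else None,
              "z": None, "p_value": None}
    if n1 == 0 or n2 == 0:
        return result
    pooled = (a1 + a2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return result  # both groups all-adverse or all-benign; test undefined
    z = (a1 / n1 - a2 / n2) / se
    result["z"] = z
    result["p_value"] = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal approximation
    return result
```

The normal approximation is only reasonable with adequate counts in both groups. For small override volumes, an exact test would be the safer choice.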

Capability 4 — Over-Reliance Detection. The system monitors the human override rate at the individual decision-maker level and at the aggregate level. If any individual's override rate drops below 5% of AI recommendations over a rolling 30-day window, the system flags this as a potential automation bias concern and notifies their supervisor. If the aggregate override rate drops below 3%, this triggers a formal Model Risk review to assess whether the collaborative model has de facto become fully automated. Both thresholds are configurable by domain. Note: a low override rate may also indicate that the model is genuinely excellent; outcome data is used to distinguish automation bias from appropriate acceptance.
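The two thresholds above can be computed from decision events in one pass. This is a minimal sketch under stated assumptions. Only OVERRIDE counts towards the rate; whether AGREE_WITH_MODIFICATION should also count is a domain choice. The `min_decisions` guard, which stops reviewers with very few cases from being flagged on noise, is a hypothetical addition. All names are illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple

INDIVIDUAL_THRESHOLD = 0.05      # per decision-maker, rolling window
AGGREGATE_THRESHOLD = 0.03       # across all decision-makers
WINDOW = timedelta(days=30)


class DecisionEvent(NamedTuple):
    decision_maker_id: str
    decided_at: datetime
    overridden: bool  # assumption: only OVERRIDE counts; AGREE_WITH_MODIFICATION does not


def override_rate_alerts(events: Iterable[DecisionEvent], now: datetime, min_decisions: int = 20) -> dict:
    since = now - WINDOW
    tally = defaultdict(lambda: [0, 0])  # decision_maker_id -> [overrides, decisions]
    for e in events:
        if since <= e.decided_at <= now:
            tally[e.decision_maker_id][0] += int(e.overridden)
            tally[e.decision_maker_id][1] += 1
    # Skip reviewers with too few decisions for the rate to be meaningful.
    flagged = sorted(dm for dm, (o, n) in tally.items()
                     if n >= min_decisions and o / n < INDIVIDUAL_THRESHOLD)
    total_o = sum(o for o, _ in tally.values())
    total_n = sum(n for _, n in tally.values())
    aggregate = total_o / total_n if total_n else None
    return {
        "notify_supervisor_for": flagged,
        "aggregate_override_rate": aggregate,
        "trigger_model_risk_review": aggregate is not None and aggregate < AGGREGATE_THRESHOLD,
    }
```

Consistent with the note above, a flag from this check is a prompt to examine outcome data for that reviewer. It does not by itself establish automation bias.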

Capability 5 — Audit Trail. Every decision generates an immutable audit record containing: the input data snapshot at decision time (not a reference to the current data state — a snapshot, to prevent post-hoc data modification); the AI recommendation with full reasoning; the human decision with override code if applicable; the free-text reasoning; the identity of the decision-maker; and the timestamp. This audit trail is the primary artefact for regulatory examination and for resolving customer disputes.
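Here is a minimal sketch of building such a record. It assumes JSON-serialisable case data. The snapshot is deep-copied, so later edits to live data cannot alter it. Each record also carries a SHA-256 hash that chains to the previous record. The hash chain is a tamper-evidence technique added here for illustration; the pattern itself relies on an append-only table. Function and field names beyond those listed above (`prev_hash`, `record_hash`) are hypothetical.

```python
import copy
import hashlib
import json
from datetime import datetime, timezone
from typing import List, Optional


def _digest(payload: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so identical content always hashes identically.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_audit_record(input_snapshot: dict, recommendation: dict, decision_code: str,
                       override_code: Optional[str], free_text: str, decision_maker_id: str,
                       prev_hash: Optional[str], decided_at: Optional[datetime] = None) -> dict:
    body = {
        # Deep copy: later edits to live case data must not alter the recorded snapshot.
        "input_snapshot": copy.deepcopy(input_snapshot),
        "recommendation": copy.deepcopy(recommendation),
        "decision_code": decision_code,
        "override_code": override_code,
        "free_text": free_text,
        "decision_maker_id": decision_maker_id,
        "timestamp": (decided_at or datetime.now(timezone.utc)).isoformat(),
        "prev_hash": prev_hash,  # hash of the previous record in the append-only log
    }
    return {**body, "record_hash": _digest(body)}


def verify_chain(records: List[dict]) -> bool:
    prev = None
    for record in records:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if body["prev_hash"] != prev or _digest(body) != record["record_hash"]:
            return False  # edited, reordered, or missing intermediate record
        prev = record["record_hash"]
    return True
```

A chain like this cannot detect deletion of the most recent record on its own. Periodically anchoring the latest hash in WORM storage would close that gap.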

Capability 6 — Cognitive Load Design. The review interface presents information in a layered structure: the minimum necessary information is on the primary view, and deeper data is accessible through drill-down. The primary view shows the AI recommendation, confidence, top three reasons, and the decision controls. Full supporting evidence is one click away. Decision-makers can annotate the AI reasoning ("I agree with reason 1, not reason 3"); these annotations are captured and fed back to model improvement. Time-on-review can optionally be tracked to identify cases where decision-makers spend very little time (potential rubber-stamping) or an unusually long time (a potential interface complexity issue).
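Both time-on-review signals can be screened relative to a typical review time. This is a minimal sketch, assuming the input holds `time_on_review_ms` values for one case type. It compares each decision with the median of that group. The cut-offs (a quarter of the median, four times the median) are hypothetical defaults to be tuned per domain.

```python
from statistics import median
from typing import Dict, List


def review_time_flags(time_on_review_ms: Dict[str, int], fast_fraction: float = 0.25,
                      slow_multiple: float = 4.0) -> Dict[str, List[str]]:
    """Flag decisions whose review time is far from the cohort median."""
    if not time_on_review_ms:
        return {"possible_rubber_stamp": [], "possible_interface_friction": []}
    typical = median(time_on_review_ms.values())
    return {
        # Much faster than typical: the reasoning may not have been read.
        "possible_rubber_stamp": sorted(
            d for d, t in time_on_review_ms.items() if t < fast_fraction * typical),
        # Much slower than typical: the interface or evidence may be hard to navigate.
        "possible_interface_friction": sorted(
            d for d, t in time_on_review_ms.items() if t > slow_multiple * typical),
    }
```

A median is used rather than a mean so that a few very long reviews do not drag the baseline upwards and hide short ones.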

5. Architecture Diagram


flowchart TD
    subgraph Presentation["AI Recommendation"]
        A[Case Submitted]
        B[AI Recommendation + Reasoning]
        C[Review Interface]
    end
    subgraph Decision["Human Decision"]
        D{Agree, Modify, or Override}
        E[(Immutable Audit Record)]
    end
    subgraph Feedback["Outcome Feedback"]
        F[Decision Executed]
        G[Outcome Monitor]
        H[Override Rate Monitor]
    end
    A --> B
    B --> C
    C --> D
    D -->|any outcome| E
    E --> F
    F --> G
    G --> H
    H -->|override rate low| E
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#f0fdf4,stroke:#22c55e
    style D fill:#f3e8ff,stroke:#a855f7
    style E fill:#fef9c3,stroke:#eab308
    style F fill:#d1fae5,stroke:#10b981
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#fee2e2,stroke:#ef4444

6. Components

Component	Type	Responsibility	Technology Options	Criticality
AI Inference Engine	ML Serving	Generate recommendation with confidence and structured reasoning	SageMaker, Vertex AI, Azure ML, OpenAI API with structured output	Critical
Recommendation Formatter	Application Service	Structure AI output into presentation format (top-3 reasons, alternatives, confidence interpretation)	Python microservice; prompt engineering for LLM-based reasoning extraction	High
Review Interface	Web Application	Present layered recommendation to decision-maker; capture decision + reason	Custom React application; Salesforce Lightning; ServiceNow custom form	Critical
Input Data Snapshot Service	Data Service	Capture point-in-time snapshot of all input data used for recommendation	PostgreSQL JSON snapshot column; event sourcing store	Critical
Decision Store	Data Store	Persist all decisions with full audit record	PostgreSQL with append-only audit table	Critical
Outcome Monitor	Background Service	Track downstream outcomes for decided cases; link back to decision record	Batch job on scheduled cadence; event-driven if outcome is a system event	High
Override Rate Monitor	Analytics Service	Compute individual and aggregate override rates; fire alerts	Python analytics job; BI tool (Tableau, Looker) for visualisation	High
Outcome Accuracy Comparator	Analytics Service	Compare accuracy rates for AI-accepted vs human-override decisions	Python analytics job with statistical significance testing	High
Model Improvement Pipeline	ML Pipeline	Ingest outcome-labelled decisions as training data	SageMaker Pipelines, Vertex AI Pipelines, Kubeflow	Medium

7. Data Flow

Primary Flow

Step	Actor	Action	Output
1	Case Management System	Submits case for collaborative review	case_id, input_data{}, requestor_id
2	AI Inference Engine	Generates recommendation and reasoning	recommendation, confidence, reasons[], alternatives[], risk_flags[]
3	Recommendation Formatter	Structures output into layered presentation format	formatted_recommendation{primary, confidence_text, top3_reasons, alternatives, drill_down_url}
4	Input Data Snapshot Service	Captures point-in-time snapshot of all input fields	snapshot_id, input_data_snapshot{} with timestamp
5	Review Interface	Presents to decision-maker; enforces review sequence	decision_maker_id, time_entered_interface
6	Decision-Maker	Reviews reasoning; makes decision; selects override reason if applicable	decision_code, override_reason_code, override_text, time_on_review_ms
7	Decision Store	Writes immutable audit record	audit_record_id with full snapshot, decision, identity, timestamp
8	Downstream Systems	Execute decision outcome	Decision actioned (credit approved, claim referred, etc.)
9	Outcome Monitor	Detects downstream outcome event; links to original decision	outcome_type, outcome_timestamp, outcome_value
10	Override Rate Monitor	Recomputes individual and aggregate override rates	override_rate_report; alert if threshold breached
11	Outcome Accuracy Comparator	Compares AI-accepted vs override decision accuracy	accuracy_comparison_report; statistical significance test result
12	Model Improvement Pipeline	Ingests outcome-labelled decisions	Updated training dataset; triggers retraining if volume threshold met

Error Flow

Error Condition	Detected By	Recovery Action	Notification
AI inference fails (recommendation unavailable)	Review Interface	Present case without AI recommendation; human decides independently; flag case as AI-unavailable	Decision-maker notified; logged for AI operations review
Override reason not selected (UI bypass attempt)	API validation layer	Reject decision submission; return validation error	Decision-maker redirected to provide reason
Outcome data unavailable for outcome monitoring	Outcome Monitor	Flag decision as outcome-pending after 90 days; exclude from accuracy comparison until outcome received	ML Ops notified to investigate outcome data source
Input data snapshot service unavailable	Decision submission API (snapshot health check)	Block decision submission until snapshot service restored; do not allow decision without snapshot (audit integrity)	Operations on-call paged
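The order in which these error conditions are checked matters. An infrastructure block must win over a validation error, and an AI outage must not force a reviewer to "override" a recommendation that does not exist. This minimal sketch of a submission gate assumes the checks run in the decision submission API. The exception names and the `HUMAN_ONLY` decision code are hypothetical.

```python
from typing import Optional


class SubmissionBlocked(RuntimeError):
    """Infrastructure condition: no decision may be recorded (audit integrity)."""


class MissingOverrideReason(ValueError):
    """Validation failure: returned to the reviewer to supply a reason."""


def gate_submission(decision_code: str, override_reason: Optional[str],
                    ai_available: bool, snapshot_service_up: bool) -> dict:
    # 1. No snapshot, no decision: the audit record would be incomplete.
    if not snapshot_service_up:
        raise SubmissionBlocked("input snapshot service unavailable")
    # 2. AI unavailable: the human decides alone and the record is flagged.
    #    "HUMAN_ONLY" is a hypothetical code; there is nothing to agree with or override.
    if not ai_available:
        return {"decision_code": "HUMAN_ONLY", "ai_unavailable": True}
    # 3. Normal path: anything other than AGREE needs a taxonomy reason.
    if decision_code != "AGREE" and not override_reason:
        raise MissingOverrideReason("select an override reason before submitting")
    return {"decision_code": decision_code, "ai_unavailable": False}
```

Keeping the two failure types as separate exception classes lets the API map them to different responses: a service-unavailable error for the first, a validation error returned to the reviewer for the second.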

8. Security Considerations

Authentication and Authorisation

Review interface requires SSO + MFA; sessions expire after 60 minutes of inactivity
RBAC: decision authority tiers determine which case types each decision-maker may handle
Audit records are read-only for decision-makers; write access is restricted to the decision submission API
Model improvement pipeline reads decision and outcome data under a service account with read-only access to audit store

Secrets Management

AI inference API credentials stored in secrets manager; rotated every 90 days
Decision store connection credentials stored in secrets manager; never in application configuration files

Data Classification

Input data snapshots inherit the sensitivity of the highest-classification field in the case
Audit records containing sensitive case data stored in encrypted, access-controlled store
Override reason text may contain PII (decision-makers may describe customer-specific context); treat as sensitive

Encryption

All audit records encrypted at rest (AES-256)
All data in transit encrypted (TLS 1.3)
Audit record store uses database-level encryption with key managed by enterprise key management service

Auditability

Audit records are append-only; no update or delete capability on production record
All access to audit records logged with accessor identity and timestamp
Archive audit records to WORM storage after 90 days; retain for 7 years minimum in regulated industries

OWASP Top 10 for LLM Applications (v1.1) Considerations

OWASP LLM Risk	Applicability	Mitigation
LLM01: Prompt Injection	Medium — case input data shown to decision-maker may contain adversarial text	Sanitise display rendering; if AI reasoning generation includes input data in the prompt, sanitise and clearly delimit that input before inserting it into the prompt
LLM02: Insecure Output Handling	Medium — AI reasoning text is displayed in review interface	Sanitise AI reasoning output before HTML rendering; escape user-controlled content
LLM03: Training Data Poisoning	Medium — outcome-labelled decisions feed model training	Validate outcome data provenance; anomaly detection on outcome label distribution
LLM04: Model Denial of Service	Low	Standard rate limiting on inference endpoint
LLM05: Supply Chain Vulnerabilities	Medium — third-party LLM for reasoning generation	Model provenance tracking; approved model provider list; output quality monitoring
LLM06: Sensitive Information Disclosure	High — AI reasoning may leak sensitive data from other cases if model memorised training data	Monitor for PII patterns in AI reasoning outputs; test base model for memorisation of training data
LLM07: Insecure Plugin Design	Low — not applicable to this pattern	N/A
LLM08: Excessive Agency	Low — AI makes no autonomous decisions in this pattern	By design
LLM09: Overreliance	Critical — the primary risk of this pattern is automation bias	Override rate monitoring; supervisor alerts; Model Risk review are the core mitigations
LLM10: Model Theft	Medium — AI reasoning reveals model's decision logic	Access controls on review interface; rate limiting; no bulk export of AI reasoning without authorisation
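For the LLM02 mitigation, AI reasoning text and case-derived citations should always be escaped before they reach the page. This minimal sketch uses Python's standard `html.escape`. The function name and the `ai-reasons` CSS class are hypothetical. In practice, a templating engine with auto-escaping enabled achieves the same result.

```python
import html
from typing import Iterable, Tuple


def render_reasons_html(reasons: Iterable[Tuple[str, str]]) -> str:
    # Escape every model-generated or case-derived string; never insert it as raw HTML.
    items = "".join(
        f"<li>{html.escape(text)} <cite>{html.escape(source)}</cite></li>"
        for text, source in reasons
    )
    return f'<ol class="ai-reasons">{items}</ol>'
```

Escaping at the rendering boundary, rather than trying to strip dangerous content from model output, means that adversarial text injected through case data is displayed inertly rather than executed.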

9. Governance Considerations

Responsible AI

Override rate monitored by protected group characteristics: if decision-makers override AI at significantly different rates for different demographic groups, investigate for discriminatory patterns
Outcome accuracy monitored by protected group: if AI recommendations have higher error rates for specific groups, trigger bias investigation and remediation
Automation bias intervention (override rate monitoring) is a named governance control in the AI Governance Framework
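The first check above can begin as a simple screening calculation. This is a minimal sketch, assuming each decision is available as a (protected group, overridden) pair. The `min_n` sample floor and the 1.25 maximum rate ratio are hypothetical defaults, not values set by this pattern. The flag marks a case for investigation and is not a finding of discrimination.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def override_rate_disparity(decisions: Iterable[Tuple[str, bool]], min_n: int = 50,
                            max_ratio: float = 1.25) -> Tuple[Dict[str, float], bool]:
    """decisions: (protected_group, overridden) pairs. Returns per-group override
    rates and a screening flag that prompts investigation."""
    totals, overrides = Counter(), Counter()
    for group, overridden in decisions:
        totals[group] += 1
        overrides[group] += int(overridden)
    rates = {g: overrides[g] / totals[g] for g in totals if totals[g] >= min_n}
    if len(rates) < 2:
        return rates, False  # not enough groups with an adequate sample size
    low, high = min(rates.values()), max(rates.values())
    if low == 0:
        return rates, high > 0
    return rates, high / low > max_ratio
```

The same function can be applied unchanged to adverse-outcome flags instead of override flags, which covers the second check on outcome accuracy by protected group.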

Model Risk Management

Outcome accuracy comparison report is reviewed quarterly by Model Risk Officer
If human overrides are consistently more accurate than AI acceptances, this indicates systematic AI error and triggers mandatory model review
If human overrides are consistently less accurate than AI acceptances, this indicates overriding is not adding value — review decision-maker training and interface design, not model performance

Human Approval Gates

Changes to confidence thresholds or recommendation presentation require Model Risk sign-off
Addition of new case types to the collaborative decision scope requires Legal and Compliance review

Policy Compliance

Decision-maker authority levels must be defined in policy and reflected in the RBAC configuration
Audit records must be available for regulatory examination within 5 business days of request

Traceability

Every decision is traceable from: case input → AI recommendation + reasoning → human decision + reason → downstream action → outcome
Full trace available for regulatory inspection, customer dispute resolution, and model improvement

Governance Artefacts

Artefact	Owner	Frequency	Purpose
Override Rate Report	Model Risk	Monthly	Track individual and aggregate override rates; detect automation bias
Outcome Accuracy Report	Model Risk	Quarterly	Compare AI vs human accuracy using outcome data
Automation Bias Investigation Records	Model Risk Officer	As triggered	Document investigation and resolution for each automation bias flag
Decision Audit Log	Compliance	Continuous, reviewed annually	Immutable record for regulatory examination
Fairness Assessment	Model Risk / Ethics Board	Quarterly	Protected group analysis of recommendation accuracy and override rates

10. Operational Considerations

Monitoring

Metric	SLO	Alert Threshold	Owner
Review interface latency (time to display recommendation)	< 3 seconds	> 8 seconds	Engineering
Decision submission success rate	> 99.9%	< 99.5% for any 1-hour window	Engineering
AI recommendation availability	> 99.5%	< 99%	ML Ops
Individual override rate	> 5% per rolling 30 days	< 5% for any decision-maker	Supervisor / Model Risk
Aggregate override rate	> 3%	< 3% for aggregate	Model Risk
Outcome data ingestion lag	< 24 hours from outcome event	> 72 hours	ML Ops
Audit record write success rate	100%	Any failure	Operations on-call

Logging

Structured JSON logs for all decision events keyed by case_id, decision_maker_id, timestamp
Audit records stored in append-only table; separate from application logs
Time-on-review metric logged per decision to support automation bias investigation

Incident Response

Automation bias flag: supervisor notified within 24 hours; investigation completed within 5 business days; outcome documented
AI inference outage: decision-makers notified; fall back to human-only mode with flag on audit record; restore AI within SLO
Audit store unavailability: halt decision submissions until audit store restored (audit integrity is non-negotiable)

Disaster Recovery

Component	RTO	RPO	Strategy
AI Inference Engine	15 min	0 (stateless)	Multi-AZ; auto-scaling
Decision Store (Audit)	30 min	5 min	PostgreSQL synchronous standby; WAL archiving; WORM archive after 90d
Review Interface	30 min	N/A (stateless)	Multi-AZ deployment; CDN for static assets
Outcome Monitor	4 hours	1 hour	Batch job; re-runnable; idempotent

Capacity Planning

Review interface must support peak concurrent decision-maker sessions: size horizontally
Decision store grows permanently (records never deleted); plan storage growth at 2–5 KB per record × daily volume × 7 years retention
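The storage rule of thumb above translates directly into arithmetic. For example, 5,000 decisions/day × 5 KB × 365 days × 7 years comes to about 63.9 GB of raw records. The helper below is a minimal sketch that applies the same formula to the three cost-table scales. It uses decimal units and counts raw record volume only, excluding indexes, replicas, and WORM archive copies.

```python
def audit_storage_gb(daily_decisions: int, kb_per_record: float,
                     retention_years: int = 7, days_per_year: int = 365) -> float:
    """Raw record volume only (decimal GB); excludes indexes, replicas, and archive copies."""
    total_kb = daily_decisions * kb_per_record * days_per_year * retention_years
    return total_kb / 1_000_000


for scale, daily in [("Small", 500), ("Medium", 5_000), ("Large", 50_000)]:
    low, high = audit_storage_gb(daily, 2), audit_storage_gb(daily, 5)
    print(f"{scale}: {low:,.1f} to {high:,.1f} GB over 7 years")
```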

11. Cost Considerations

Cost Drivers

Driver	Description	Relative Weight
Decision-Maker Labour	Human review time × volume; dominant cost	Very High
AI Inference	Per-call cost × volume; LLM-based reasoning generation is more expensive	Medium-High
Audit Storage	Grows permanently; managed by partitioning and tiered storage	Medium
Review Interface Development	Custom development if not using commercial platform	Medium (one-time)
Outcome Monitoring Infrastructure	Batch jobs for outcome collection and accuracy comparison	Low

Scaling Risks

Decision-maker labour scales linearly with case volume; AI accuracy improvement reduces labour cost per unit
LLM-based reasoning generation at high volume can become significant cost; optimise by caching reasoning for identical input patterns
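Caching reasoning for identical input patterns needs a stable key. This minimal sketch hashes canonical JSON of the input features, together with the model version, so that a retrained model never reuses old text. It assumes the features passed in are the non-identifying pattern only. Including customer identifiers would defeat the cache and risk one customer's reasoning appearing on another's case. `ReasoningCache` and the `generate` callable are hypothetical stand-ins for the LLM reasoning step.

```python
import hashlib
import json
from typing import Callable, Dict


class ReasoningCache:
    """Reuse generated reasoning when the model sees an identical input pattern."""

    def __init__(self, generate: Callable[[dict], str], model_version: str) -> None:
        self._generate = generate            # hypothetical call into the LLM reasoning step
        self._model_version = model_version  # part of the key: a new model must not reuse old text
        self._store: Dict[str, str] = {}
        self.misses = 0

    def _key(self, features: dict) -> str:
        # sort_keys makes the key independent of dict insertion order.
        canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(f"{self._model_version}|{canonical}".encode("utf-8")).hexdigest()

    def get(self, features: dict) -> str:
        key = self._key(features)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._generate(features)
        return self._store[key]
```

The per-case input snapshot is still captured for every decision. The cache only saves the cost of regenerating text, not the audit record.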

Optimisations

Batch low-priority decisions: not all collaborative decisions need real-time review; batch the lowest-priority (P3) cases for efficient human processing
Calibrate confidence thresholds to route only genuinely uncertain cases to collaborative review; route high-confidence routine cases to approval with reduced review burden (abbreviated review mode)
Use outcome data to identify case patterns where AI is consistently correct: these may be candidates for increased automation with reduced human review
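The confidence-threshold optimisation can be expressed as a small routing rule. In this minimal sketch, the 0.85 cut-off is a hypothetical value that would need Model Risk sign-off per domain. The rule follows the trade-off guidance later in this document: abbreviated review is never used for regulated decisions. Every route still ends in a human decision.

```python
def route_case(confidence: float, risk_flag: bool, regulated: bool,
               full_review_below: float = 0.85) -> str:
    """Return the review mode for a case entering collaborative review."""
    if regulated:
        return "full_review"        # abbreviated mode is never used for regulated decisions
    if risk_flag or confidence < full_review_below:
        return "full_review"        # genuinely uncertain or historically error-prone cases
    return "abbreviated_review"     # high-confidence routine case; still a human decision
```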

Indicative Cost Range

Scale	Daily Decisions	Decision-Maker Labour	AI Inference	Total Monthly (labour + inference only)
Small (500/day)	500	$25,000–$60,000/month	$500–$2,000/month	$25,500–$62,000/month
Medium (5K/day)	5,000	$150,000–$400,000/month	$5,000–$20,000/month	$155,000–$420,000/month
Large (50K/day)	50,000	$800,000–$2M/month	$30,000–$100,000/month	$830,000–$2.1M/month

12. Trade-Off Analysis

Presentation Strategy Options

Strategy	Automation Bias Risk	Human Judgment Quality	Decision Latency	Recommended
Show recommendation first, then reasoning	High — anchoring effect; humans agree with the first number they see	Low — reasoning reviewed to justify pre-formed conclusion	Low	Not recommended for high-stakes decisions
Show reasoning first, then recommendation	Low — human forms independent assessment before seeing AI conclusion	High — reasoning informs independent judgment	Medium — slight increase	Recommended for regulated, high-stakes decisions
Show reasoning only (hide recommendation until human makes initial assessment)	Very Low	Very High	High — most time-intensive	Use for highest-stakes decisions: credit appeals, clinical escalation, benefits review
Abbreviate reasoning for experienced decision-makers	Medium	Medium	Very Low	Use for low-stakes collaborative decisions; never for regulated decisions

Architectural Tensions

Tension	Option A	Option B	Resolution Guidance
Override reason granularity vs compliance burden	Fine-grained taxonomy (10+ codes): richer data for model improvement	Simple taxonomy (3 codes): faster for decision-makers	Use 5–7 codes: enough granularity for actionable model improvement; few enough to not burden reviewers
Outcome tracking completeness vs data availability	Track outcomes for all decisions	Track outcomes only where data is readily available	Track all decisions, using an outcome_pending state for delayed outcomes; include them in accuracy analysis once the outcome arrives rather than dropping them from scope because the outcome is slow to materialise
Automation bias intervention vs false positives	Strict threshold (override rate < 10% triggers alert)	Lenient threshold (< 2% triggers alert)	Calibrate per domain: new decision-makers will have higher override rates; experienced decision-makers with excellent models may legitimately have lower rates; supplement with outcome accuracy check before intervention

13. Failure Modes

Failure	Likelihood	Impact	Detection	Recovery
Automation bias becomes pervasive (override rate collapses)	High without monitoring	Critical — legal liability; model errors unchecked	Override rate monitoring	Supervisor intervention; interface redesign; decision-maker retraining
AI reasoning quality degrades (reasons become generic)	Medium	High — decision-makers cannot make informed judgments	Time-on-review drops; override text quality drops	Inference quality review; prompt revision or model retraining
Outcome data unavailable for accuracy comparison	Medium	High — no signal for model improvement or bias detection	Outcome monitor flags unlinked decisions > 90 days	Investigation of outcome data source; manual outcome collection for sample
Decision store write failure (audit gap)	Low	Critical — regulatory exposure; decisions not auditable	Audit write success rate monitoring	Block further decisions until store restored; retroactive reconstruction from application logs
Override reason taxonomy becomes inadequate	Low	Medium — decision-makers use "other" at high rate, reducing signal quality	Other code usage rate > 20%	Taxonomy review and expansion; "other" text analysis to identify new categories
Decision-maker collusion (gaming override reason codes)	Very Low	High — audit trail becomes unreliable	Statistical anomaly detection on override patterns	Forensic audit; access control review; outcome accuracy investigation

Cascading Failure Scenario

AI reasoning quality degrades → decision-maker time-on-review drops → override rate falls below 3% → automation bias not detected because outcome monitor is lagged by 90 days → AI errors propagate at scale for 3 months
Mitigation: Override rate monitoring operates independently of outcome data (catches the signal early); outcome accuracy monitoring provides trailing confirmation

14. Regulatory Considerations

Regulation	Specific Clause	Requirement	Implementation
EU AI Act	Article 14(4) — Human oversight requirements	Decision-makers must be able to understand AI system outputs, remain aware of the tendency to over-rely on them (automation bias, Article 14(4)(b)), and override them	Review interface provides full reasoning; override is always available; override is the explicit design mechanism
EU AI Act	Article 13 — Transparency and provision of information	AI system must provide sufficient transparency to enable effective human oversight	Top-3 reasons + alternatives + confidence interpretation constitute transparency evidence
EU AI Act	Article 9 — Risk management system	Risks including automation bias must be identified and managed	Override rate monitoring is the automation bias risk management control
APRA CPS 230	§50 — Board oversight of operational risk	AI-assisted decisions are a source of operational risk; governance must be demonstrable	Override rate report and outcome accuracy report provide board governance evidence
Privacy Act 1988 (Australia)	APP 1 — Open and transparent management	Individuals must be able to understand how decisions about them are made	AI recommendation reasoning + human override reason are both accessible on request
ISO 42001:2023	§5.3 — Roles, responsibilities and authorities	Accountability for AI-influenced decisions must be assigned to named humans	Audit trail names decision-maker identity for every decision; accountability is unambiguous
NIST AI RMF	MAP 3.5 — Human oversight processes	Processes for human oversight must be defined, assessed, and documented so that human judgment is meaningfully involved, not performative	Over-reliance monitoring + override tracking demonstrate genuine human involvement
NIST AI RMF	MEASURE 2.11 — Fairness and bias evaluation	AI-assisted decisions must be evaluated for fairness and bias, including differential impacts	Protected group analysis of override rates and outcome accuracy
US Federal Reserve / OCC (financial services)	SR 11-7 — Supervisory Guidance on Model Risk Management	Models used in credit decisions require effective challenge, governance, and ongoing monitoring, including analysis of user overrides	Collaborative decision pattern with override tracking, outcome comparison, and audit trail supports SR 11-7 ongoing-monitoring and override-analysis expectations

15. Reference Implementations

AWS

AI Inference: SageMaker Real-time Endpoints with LLM reasoning generation via Bedrock (Claude 3.5 Sonnet)
Review Interface: Custom React app on Amplify with Cognito authentication
Decision Store: Amazon RDS PostgreSQL with append-only audit table; archive to Amazon S3 with Object Lock (WORM) via RDS snapshot export after 90 days
Override Rate Monitor: Lambda function scheduled via EventBridge; CloudWatch metrics dashboard
Outcome Monitor: Step Functions workflow triggered by downstream system events
Model Improvement Pipeline: SageMaker Pipelines ingesting from RDS read replica

Azure

AI Inference: Azure Machine Learning Endpoints + Azure OpenAI for reasoning generation
Review Interface: Power Apps or custom React on Azure Static Web Apps with Azure AD authentication
Decision Store: Azure SQL Database with row-level security; Azure Immutable Blob Storage for archive
Override Rate Monitor: Azure Functions + Azure Monitor alerts
Outcome Monitor: Azure Logic Apps triggered by Dynamics 365 events

GCP

AI Inference: Vertex AI Online Prediction + Vertex AI Gemini for reasoning generation
Review Interface: Custom app on Cloud Run with Firebase Authentication
Decision Store: Cloud SQL PostgreSQL; BigQuery for analytics workloads
Override Rate Monitor: Cloud Scheduler + Cloud Functions; Looker Studio dashboard
Outcome Monitor: Cloud Dataflow pipeline reading from Pub/Sub outcome events

On-Premises / Private Cloud

AI Inference: TorchServe or vLLM on Kubernetes
Review Interface: React app on Kubernetes with LDAP/AD authentication
Decision Store: PostgreSQL with append-only audit table (UPDATE and DELETE revoked); pgaudit extension for database access audit logging
Override Rate Monitor: Python analytics job on Airflow; Grafana dashboard
Outcome Monitor: Airflow DAG pulling from operational data warehouse

16. Related Patterns

Pattern	ID	Relationship	Notes
Human Escalation Pattern	EAAPL-HIL003	Complementary — escalation is when AI hands off entirely; collaborative is when AI and human share the decision	Use escalation for cases requiring human primary; collaborative for cases where AI augments human
Human Override Pattern	EAAPL-HIL006	Specialisation — override is the mechanics of human rejection; collaborative is the full architecture including override + feedback	Override pattern is embedded in the collaborative decision architecture
Active Learning Loop	EAAPL-HIL002	Complementary — outcome data from collaborative decisions is premium training signal	Outcome-labelled decisions feed active learning training store
AI Confidence Threshold Routing	EAAPL-HIL005	Dependency — threshold routing determines which cases enter collaborative review	High-confidence cases may bypass collaborative review; threshold pattern governs the boundary
Annotation and Feedback Loop	EAAPL-HIL007	Overlapping — collaborative decision generates human judgments that can be treated as annotations	Override decisions with reasons are annotation-quality training data
Human-in-the-Loop Agent	EAAPL-MAG003	Complementary — agent checkpoints trigger collaborative review for agent recommendations	Collaborative decision pattern is instantiated at each high-stakes agent checkpoint

17. Maturity Assessment

Overall Maturity Level: Proven

Dimension	Score (1–5)	Rationale
Technical Maturity	4	Core components (ML inference, audit DB, web interface) are mature; structured reasoning presentation and automation bias monitoring are less standardised
Operational Maturity	4	Decision review workflows are well-understood; outcome tracking integration requires domain-specific engineering
Governance Maturity	5	EU AI Act Article 14, APRA model risk, and SR 11-7 directly require the capabilities this pattern delivers
Tooling Ecosystem	3	No purpose-built "collaborative AI decision" platforms; most implementations are custom; Salesforce Einstein and similar tools provide partial capability
Enterprise Adoption	4	Widely used in financial services (credit, underwriting); growing in healthcare and legal; less mature in government
Risk Profile	Medium	Primary risk is automation bias becoming pervasive; well-controlled with override rate monitoring

18. Revision History

Version	Date	Author	Changes
1.0	2026-06-12	EAAPL Working Group	Initial publication covering recommendation presentation, override tracking, outcome feedback, over-reliance detection, audit trail, and cognitive load design
