EAAPL: Enterprise AI Architecture Pattern Library

Hybrid Intelligence Pattern

Pattern ID: EAAPL-HIL008
Status: Proven
Tags: human-oversight, explainability, accountability, high-complexity
Version: 1.0
Last Updated: 2026-06-12


1. Executive Summary

The Hybrid Intelligence Pattern defines an architecture for systematically allocating each component of a complex task to the agent — human or AI — that can perform it best. Rather than treating AI as an optional add-on to human workflows or humans as a fallback for AI failures, it decomposes the task into sub-components, classifies each by suitability criteria, and designs explicit handoff protocols that transfer context between agents with minimal friction and maximal fidelity.

This pattern addresses the most common failure mode in enterprise AI deployments: assigning every component of a task to a single mode of work (usually AI alone, or a human working with AI suggestions in a sidebar). This ignores that AI excels at high-volume, pattern-matched processing while humans excel at novel, ethical, and ambiguous reasoning. The pattern covers task decomposition methodology; interface design for human-AI collaboration; handoff protocols; cognitive load management; trust calibration to prevent both automation bias and underuse; and performance measurement that compares hybrid intelligence against human-only and AI-only baselines. CIOs and CTOs gain a framework for continuously optimising human-AI task allocation, aimed at measurable quality and efficiency gains that neither AI nor humans achieve independently.


2. Problem Statement

Business Problem

Enterprises deploying AI into complex knowledge work face a paradox: AI is too often given the entire task (producing poor quality on the hard parts) or too little of the task (producing limited efficiency gains). The optimal allocation — AI takes what it does well, humans take the rest — is rarely designed deliberately. It emerges ad hoc, inconsistently across teams, and without any mechanism for measurement or improvement.

Technical Problem

Task decomposition requires identifying sub-task types and classifying them by AI suitability criteria. Without this classification, engineers default to giving AI the whole input and presenting the whole output to humans for review — which does not actually leverage AI's strengths or protect against its weaknesses. The handoff between AI and human sub-tasks is an engineering problem that is frequently underspecified: what context transfers, in what format, at what latency, and with what guarantees of completeness?

Symptoms

  • AI is used for whole-task processing but humans override a high percentage of AI outputs
  • Human reviewers cannot articulate which parts of the AI output they are checking vs rubber-stamping
  • Task completion time with AI assistance is not significantly faster than without it, despite AI being deployed
  • No measurement exists of whether hybrid performance exceeds human-only or AI-only performance
  • Human trust in AI varies wildly across team members: some always accept AI outputs; others always override

Cost of Inaction

  • Sub-optimal allocation leaves efficiency gains unrealised and quality gains uncaptured
  • Automation bias (over-trusting AI) and underuse bias (overriding AI outputs by default) coexist in the same team, producing inconsistent quality
  • Without performance measurement, it is impossible to demonstrate AI ROI or to improve the allocation over time

3. Context

When to Apply

  • Complex knowledge work tasks with multiple identifiable sub-components
  • Domains where some sub-tasks are AI-suitable (high volume, pattern-matched) and others are human-suitable (novel, ethical, ambiguous)
  • Regulated environments where accountability for specific decision components must be attributable to a named human
  • Teams mature enough to run performance measurement and iterate on human-AI allocation

When NOT to Apply

  • Simple uniform tasks where decomposition does not reveal meaningfully different sub-components
  • Latency-critical tasks where the handoff protocol overhead is architecturally prohibitive
  • Organisations that lack the operational maturity to manage the handoff protocol and performance measurement

Prerequisites

  • Complex task with identifiable, separable sub-components
  • AI system capable of processing at least some sub-components reliably
  • Human expert workforce available for the sub-components classified as human-suitable
  • Performance measurement infrastructure (quality metrics, timing, outcome tracking)

Industry Applicability

| Industry | Complex Task | AI-Suitable Sub-Tasks | Human-Suitable Sub-Tasks |
|---|---|---|---|
| Legal | Contract review | Clause identification, standard risk flagging, precedent matching | Novel risk assessment, negotiation recommendation, client advice |
| Healthcare | Clinical documentation | ICD coding of standard diagnoses, medication reconciliation | Complex comorbidity assessment, patient communication, ethical decisions |
| Financial Services | Credit assessment | Document data extraction, fraud signal scoring, policy compliance check | Final credit judgment, exception handling, relationship context |
| Insurance | Claims processing | Document classification, damage estimation from photos, fraud scoring | Coverage interpretation disputes, large loss adjustment, litigation management |
| HR | Candidate assessment | Resume parsing, skills matching, compliance screening | Interview quality assessment, culture fit judgment, hiring decision |
| Research | Literature review | Paper retrieval, citation extraction, structured data extraction | Synthesis, hypothesis generation, expert interpretation |

4. Architecture Overview

The Hybrid Intelligence Pattern requires implementing six capabilities in integrated sequence.

Capability 1 — Task Decomposition Framework. The first step is to decompose the target task into sub-components and classify each by suitability using four criteria: volume and repetition (AI-suitable if the sub-task type recurs at high volume with consistent patterns); variance and novelty (human-suitable if novel cases are common and require contextual judgment); ethical and accountability requirements (always human if the sub-task produces a judgment requiring personal accountability — e.g. "should we approve this patient's request for surgery?"); and error tolerance (human-required if errors have high consequence and are hard to reverse). Each sub-task is classified on a 2×2 matrix: AI confidence (can the AI perform this reliably?) versus consequence of error (how bad is an undetected AI error?). Sub-tasks in the high-confidence, low-consequence quadrant are AI-automated; sub-tasks in the low-confidence or high-consequence quadrants involve human judgment.
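The quadrant rule above can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not a prescribed implementation. The cut-off values, the 0–1 scales, and the names `SubTaskProfile` and `classify` are hypothetical. The volume and novelty criteria are assumed to be reflected already in the measured `ai_confidence` for each sub-task type.

```python
from dataclasses import dataclass

# Hypothetical quadrant cut-offs; in practice these would be set from audit data.
CONFIDENCE_CUTOFF = 0.90   # minimum measured AI reliability for "high confidence"
CONSEQUENCE_CUTOFF = 0.50  # severity at or above which an error is "high consequence"


@dataclass(frozen=True)
class SubTaskProfile:
    name: str
    ai_confidence: float      # measured AI reliability on this sub-task type (0-1)
    error_consequence: float  # severity of an undetected AI error (0-1)
    requires_personal_accountability: bool = False


def classify(profile: SubTaskProfile) -> str:
    """Return "ai" for AI-automated sub-tasks, "human" where human judgment is needed."""
    # Accountability criterion: a judgment needing personal accountability is always human.
    if profile.requires_personal_accountability:
        return "human"
    high_confidence = profile.ai_confidence >= CONFIDENCE_CUTOFF
    low_consequence = profile.error_consequence < CONSEQUENCE_CUTOFF
    # Only the high-confidence, low-consequence quadrant is automated.
    return "ai" if high_confidence and low_consequence else "human"


if __name__ == "__main__":
    for p in [
        SubTaskProfile("clause_identification", 0.97, 0.2),
        SubTaskProfile("novel_risk_assessment", 0.60, 0.8),
        SubTaskProfile("surgery_approval", 0.99, 0.1, requires_personal_accountability=True),
    ]:
        print(p.name, classify(p))
```

The accountability check runs before the matrix. As a result, no confidence score, however high, can move an accountability-bearing judgment to the AI.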

Capability 2 — Interface Design for Human-AI Collaboration. The collaboration interface is not a blank canvas with AI suggestions floating in a sidebar — that design has been repeatedly shown to produce the worst outcomes (anchoring bias plus cognitive overload). The correct design is a structured presentation where AI takes the first pass and produces structured output in a defined schema; the human reviews the structured output, edits specific fields, and approves. The interface explicitly marks which sub-tasks were AI-completed (with confidence indicator) and which require human action. AI-completed fields are pre-filled and editable; human-required fields are empty and mandatory. This design communicates clearly what has been done, what needs review, and what the human must decide — without requiring the human to re-read the full source input from scratch.
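The structured review form described above can be modelled as sketched below, assuming a Python back end for the collaboration interface. `FormField` and `ReviewForm` are hypothetical names. The sketch encodes two rules:

- AI-completed fields arrive pre-filled with a confidence indicator and remain editable.
- Human-required fields start empty and block approval until they are completed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FormField:
    sub_task_id: str
    ai_completed: bool                     # True: pre-filled by AI; False: human-required
    value: Optional[str] = None            # AI pre-fill, or None for human-required fields
    ai_confidence: Optional[float] = None  # rendered as a confidence indicator in the UI
    human_modified: bool = False           # set when the reviewer changes or supplies a value

    def edit(self, new_value: str) -> None:
        """Record a reviewer edit; resubmitting the same value is not counted as an edit."""
        if new_value != self.value:
            self.value = new_value
            self.human_modified = True


@dataclass
class ReviewForm:
    fields: List[FormField] = field(default_factory=list)

    def missing_human_fields(self) -> List[str]:
        """Human-required fields that are still empty (these are mandatory)."""
        return [f.sub_task_id for f in self.fields if not f.ai_completed and not f.value]

    def can_approve(self) -> bool:
        return not self.missing_human_fields()
```

The `human_modified` flag on each field maps directly to the `was_ai_modified` value the reviewer step records in the data flow.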

Capability 3 — Handoff Protocol. When a task component transitions from AI to human (or from one AI component to another), the handoff must be specified: what context transfers (previous sub-task outputs, source documents, AI confidence for each field, retrieved evidence, constraints that apply to this sub-task); what format (structured JSON schema shared by AI output and human input form); what latency is expected (synchronous for human-waiting workflows; async for batch); and what happens if the handoff is incomplete or invalid (validation at the receiving side; rejection with retry request). The handoff message schema is the contract between AI and human components of the workflow. It must be versioned: changes to the schema require migration of in-flight tasks.
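A minimal receiving-side check for the handoff contract might look like the following. It uses only the standard library to stay self-contained, whereas the Components table lists JSON Schema validation as the production option. The field names follow the handoff package in the Primary Flow table. The `schema_version` field, the supported-version set, and the function name are assumptions added to illustrate versioning.

```python
from typing import Any, Dict, List

# Hypothetical set of schema versions this receiver accepts.
SUPPORTED_SCHEMA_VERSIONS = {"1.0"}

# Top-level fields of the handoff package and their expected types.
REQUIRED_FIELDS = {
    "schema_version": str,
    "task_id": str,
    "completed_sub_tasks": list,
    "pending_human_sub_tasks": list,
    "context_docs": list,
    "constraints": list,
}

SUB_TASK_FIELDS = ("sub_task_id", "result", "confidence", "evidence")


def validate_handoff(package: Dict[str, Any]) -> List[str]:
    """Return validation errors; an empty list means the package is accepted."""
    errors: List[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in package:
            errors.append(f"missing field: {name}")
        elif not isinstance(package[name], expected):
            errors.append(f"{name}: expected {expected.__name__}")
    if errors:
        return errors  # structural problems: reject before inspecting contents

    if package["schema_version"] not in SUPPORTED_SCHEMA_VERSIONS:
        errors.append(f"unsupported schema_version: {package['schema_version']}")

    for i, sub in enumerate(package["completed_sub_tasks"]):
        if not isinstance(sub, dict):
            errors.append(f"completed_sub_tasks[{i}]: expected object")
            continue
        for name in SUB_TASK_FIELDS:
            if name not in sub:
                errors.append(f"completed_sub_tasks[{i}]: missing {name}")
        conf = sub.get("confidence")
        if conf is not None and not (isinstance(conf, (int, float)) and 0.0 <= conf <= 1.0):
            errors.append(f"completed_sub_tasks[{i}]: confidence out of range")
    return errors
```

A non-empty result would trigger the rejection-with-retry path. The reviewer would not be shown a partial package.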

Capability 4 — Cognitive Load Management. The human reviewer's time is the bottleneck in any hybrid intelligence system. Cognitive load must be actively managed: present only the minimum necessary information on the primary view; allow drill-down for supporting evidence; pre-process and structure AI output to eliminate information the human does not need to review (e.g. do not show a human reviewer the full 100-page document if the AI has already extracted the 5 relevant clauses they need to judge); use progressive disclosure (show the highest-priority item first; additional items accessible on demand). Time-on-review is tracked per sub-task type; unexpectedly short review times (potential rubber-stamping) and unexpectedly long review times (interface complexity issue) are both flagged for investigation.
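Time-on-review flagging can be sketched as a comparison against the median review time for each sub-task type. The 0.25× and 3× multipliers are illustrative assumptions, not values from this pattern. A real deployment would compute medians over a rolling historical window rather than over the batch being checked.

```python
from statistics import median
from typing import Dict, List, Tuple

SHORT_FACTOR = 0.25  # assumed: under a quarter of the median suggests rubber-stamping
LONG_FACTOR = 3.0    # assumed: over three times the median suggests interface friction


def flag_review_times(reviews: List[Tuple[str, str, int]]) -> List[Tuple[str, str]]:
    """reviews: (review_id, sub_task_type, time_spent_ms). Returns (review_id, flag) pairs."""
    times_by_type: Dict[str, List[int]] = {}
    for _, sub_task_type, ms in reviews:
        times_by_type.setdefault(sub_task_type, []).append(ms)
    medians = {t: median(ms_list) for t, ms_list in times_by_type.items()}

    flags: List[Tuple[str, str]] = []
    for review_id, sub_task_type, ms in reviews:
        typical = medians[sub_task_type]
        if ms < SHORT_FACTOR * typical:
            flags.append((review_id, "possible_rubber_stamp"))
        elif ms > LONG_FACTOR * typical:
            flags.append((review_id, "possible_interface_complexity"))
    return flags
```

A median is used rather than a mean so that a few very long reviews do not shift the threshold for everyone else.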

Capability 5 — Trust Calibration. Human trust in AI varies by individual and over time. Both extremes are harmful: over-trust (automation bias) produces rubber-stamping; under-trust (automation resistance) eliminates AI efficiency gains. Trust calibration involves: tracking each human's agreement rate with AI on each sub-task type; comparing their agreement rate to the AI's actual accuracy rate on that sub-task type; if agreement rate significantly exceeds accuracy rate, flag automation bias and notify supervisor; if agreement rate significantly falls below accuracy rate, investigate whether AI performance on this sub-task type is genuinely poor (justified under-trust) or whether individual bias is driving overriding. Trust calibration data feeds the task allocation framework: if a human systematically re-does AI sub-tasks from scratch, the AI is not adding value for that component with that human and the allocation should be reconsidered.
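The agreement-versus-accuracy comparison reduces to a few lines. This sketch borrows the ±15% SLO and ±25% alert bands from the Monitoring table in Section 10 and interprets them as percentage points, which is an assumption. `calibrate` and its status labels are hypothetical. A production version would also require a minimum number of reviews, or a statistical test, before treating a gap as significant.

```python
from dataclasses import dataclass

SLO_BAND = 0.15    # assumed: within ±15 percentage points counts as calibrated
ALERT_BAND = 0.25  # assumed: beyond ±25 percentage points raises an alert


@dataclass(frozen=True)
class Calibration:
    reviewer_id: str
    sub_task_type: str
    agreement_rate: float
    ai_accuracy: float
    status: str


def calibrate(reviewer_id: str, sub_task_type: str, agreed: int, total: int,
              ai_accuracy: float) -> Calibration:
    """Compare a reviewer's agreement rate with measured AI accuracy for one sub-task type."""
    if total <= 0:
        raise ValueError("total must be positive")
    agreement = agreed / total
    gap = agreement - ai_accuracy
    if gap > ALERT_BAND:
        status = "automation_bias_alert"    # accepts AI more often than AI is right
    elif gap < -ALERT_BAND:
        status = "under_trust_investigate"  # disagrees with AI more often than AI is wrong
    elif abs(gap) > SLO_BAND:
        status = "outside_slo_watch"
    else:
        status = "calibrated"
    return Calibration(reviewer_id, sub_task_type, agreement, ai_accuracy, status)
```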

Capability 6 — Performance Measurement. The hybrid intelligence system must be benchmarked against human-only and AI-only baselines to demonstrate value and identify optimisation opportunities. Metrics measured: task completion time (hybrid vs human-only vs AI-only); output quality (human expert evaluation of random samples from each condition); error rate on downstream outcomes (regulatory findings, claims outcomes, customer complaints); human cognitive load (reported workload rating, time on task); and cost per completed task. The measurement must be run on sufficiently large samples to achieve statistical power. Results are reviewed quarterly; significant deviations from expected performance trigger an allocation review.
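To make "sufficiently large samples" concrete, the standard normal-approximation formula for comparing two proportions gives the number of tasks needed per condition. An example would be downstream error rates under human-only and hybrid conditions. The choice of this formula, the 5% significance level, and the 80% power default are conventional assumptions rather than requirements of this pattern.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p1: float, p2: float,
                                alpha: float = 0.05, power: float = 0.80) -> int:
    """Tasks needed per condition to detect p1 != p2 with a two-sided test.

    Normal approximation without continuity correction:
    n = (z_{1-a/2} * sqrt(2 p(1-p)) + z_{1-b} * sqrt(p1(1-p1) + p2(1-p2)))^2 / (p1 - p2)^2,
    where p is the mean of p1 and p2.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)
```

Take hypothetical error rates of 10% (human-only) and 5% (hybrid). For these, `sample_size_two_proportions(0.10, 0.05)` returns 435 tasks per condition. Halving the detectable difference roughly quadruples the required sample.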


5. Architecture Diagram

flowchart TD
    subgraph Decomposition["Task Decomposition"]
        A[Complex Task Input]
        B[Task Decomposer]
        C{Sub-task Classifier}
    end
    subgraph Processing["Processing Layer"]
        D[AI Processing Engine]
        E[Human Action Queue]
        F[Handoff Package Builder]
    end
    subgraph Assembly["Assembly and Learning"]
        G[Collaboration Interface]
        H[Output Assembler]
        I[Trust Calibration Monitor]
    end
    A --> B
    B --> C
    C -->|AI-suitable| D
    C -->|human-suitable| E
    D --> F
    E --> G
    F --> G
    G --> H
    H --> I
    I -->|bias alert| G
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#f3e8ff,stroke:#a855f7
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#f0fdf4,stroke:#22c55e
    style F fill:#f0fdf4,stroke:#22c55e
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#d1fae5,stroke:#10b981
    style I fill:#fee2e2,stroke:#ef4444

6. Components

| Component | Type | Responsibility | Technology Options | Criticality |
|---|---|---|---|---|
| Task Decomposer | Application Service | Parse task input; identify sub-tasks; route to AI or human queue | Rules-based router; LLM-based decomposer; domain-specific parser | Critical |
| AI Processing Engine | ML Serving | Execute AI-suitable sub-tasks; return structured output with confidence | LLM (Claude, GPT-4), fine-tuned classifier, extraction model | Critical |
| Handoff Package Builder | Application Service | Assemble context package for human review; validate against schema | Python microservice; JSON Schema validation | High |
| Collaboration Interface | Web Application | Present structured AI output; highlight human-required tasks; capture edits | Custom React app; task-specific form design | Critical |
| Human Action Queue | Durable Queue | Hold human-required sub-tasks; manage assignment and SLA | PostgreSQL queue; Temporal workflow | High |
| Output Assembler | Application Service | Merge AI and human sub-task outputs into final task output | Python microservice | High |
| Trust Calibration Monitor | Analytics Service | Track individual agreement rates vs AI accuracy; detect bias | Python analytics job; BI dashboard | High |
| Performance Measurement Service | Analytics Service | Compare hybrid performance to baselines; compute quality, speed, cost metrics | Python analytics; Jupyter notebooks; BI tool | Medium |
| Allocation Optimiser | Analytics + Advisory | Recommend sub-task reallocation based on performance data | Python analysis; human decision required for changes | Medium |

7. Data Flow

Primary Flow

| Step | Actor | Action | Output |
|---|---|---|---|
| 1 | Source System | Submits complex task | task_id, task_type, input_payload |
| 2 | Task Decomposer | Identifies sub-tasks; classifies each | sub_tasks[]: {sub_task_id, type, classification, input_slice} |
| 3 | AI Processing Engine | Processes AI-suitable sub-tasks in parallel | ai_outputs[]: {sub_task_id, result, confidence, evidence[], processing_time_ms} |
| 4 | Handoff Package Builder | Assembles AI outputs, human-required sub-tasks, and context into a collaboration package | handoff_package: {task_id, completed_sub_tasks[], pending_human_sub_tasks[], context_docs[], constraints[]} |
| 5 | Schema Validator | Validates handoff package completeness | valid: true/false; validation_errors[] |
| 6 | Collaboration Interface | Presents structured package to reviewer | UI rendered; review_started_at timestamp |
| 7 | Human Reviewer | Edits AI sub-task outputs; completes human-required sub-tasks | reviewed_sub_tasks[]: {sub_task_id, final_value, was_ai_modified, modification_reason, time_spent_ms} |
| 8 | Output Assembler | Merges AI and human contributions; produces final task output | final_output: {task_id, sub_task_outputs[], human_contribution_map{}, ai_contribution_map{}} |
| 9 | Trust Calibration Monitor | Updates agreement rate for reviewer across sub-task types | trust_metrics per reviewer per sub_task_type |
| 10 | Performance Measurement | Records quality, time, and cost metrics | performance_record linked to task_id |
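Step 8 can be sketched as follows. The sketch refuses to assemble a final output while any AI sub-task lacks a human review record, reflecting the mandatory human confirmation described under LLM08 in Section 8. Input shapes follow the Primary Flow table. `reviewer_id` and `model_version` are assumed extra fields, added because Section 9 requires reviewer identity and model versions for traceability.

```python
from typing import Any, Dict, List


def assemble_output(task_id: str,
                    ai_outputs: Dict[str, Dict[str, Any]],
                    reviewed_sub_tasks: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge reviewed sub-tasks into the final output with contribution maps."""
    reviewed_ids = {r["sub_task_id"] for r in reviewed_sub_tasks}
    unconfirmed = sorted(set(ai_outputs) - reviewed_ids)
    if unconfirmed:
        # No AI sub-task reaches the final output without human confirmation.
        raise ValueError(f"AI sub-tasks without human review: {unconfirmed}")

    sub_task_outputs: List[Dict[str, Any]] = []
    human_map: Dict[str, Dict[str, Any]] = {}
    ai_map: Dict[str, Dict[str, Any]] = {}
    for r in reviewed_sub_tasks:
        sid = r["sub_task_id"]
        sub_task_outputs.append({"sub_task_id": sid, "value": r["final_value"]})
        if sid in ai_outputs:
            ai_map[sid] = {
                "model_version": ai_outputs[sid]["model_version"],
                "confidence": ai_outputs[sid]["confidence"],
            }
            role = "modified" if r["was_ai_modified"] else "confirmed"
        else:
            role = "authored"  # human-required sub-task completed by the reviewer
        human_map[sid] = {"reviewer_id": r["reviewer_id"], "role": role}

    return {
        "task_id": task_id,
        "sub_task_outputs": sub_task_outputs,
        "human_contribution_map": human_map,
        "ai_contribution_map": ai_map,
    }
```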

Error Flow

| Error Condition | Detected By | Recovery Action | Notification |
|---|---|---|---|
| AI sub-task fails (error or timeout) | AI Processing Engine | Mark sub-task as human-required; add to human action queue | Collaboration interface shows "AI unavailable for this item" |
| Handoff package schema validation failure | Handoff Package Builder | Retry AI sub-task with explicit structured output instruction; escalate to human if second attempt fails | ML Ops alert; human takes over affected sub-task |
| Human reviewer times out on task | SLA Manager | Re-assign or escalate to supervisor | Operations manager notification |
| Trust calibration detects automation bias | Trust Calibration Monitor | Supervisor notification; mandatory re-training of the reviewer on task guidelines where the supervisor judges it warranted | Supervisor; HR if persistent |
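The first two recovery rows can be combined into one routing function. An AI error or timeout goes straight to the human queue. A schema failure earns one retry with a stricter structured-output instruction before escalating. `call_model` and `validate` are hypothetical callables supplied by the caller. The `strict_schema` keyword is an assumed way to request the explicit instruction on retry.

```python
from typing import Any, Callable, List, Tuple


def run_ai_sub_task(sub_task: Any,
                    call_model: Callable[..., Any],
                    validate: Callable[[Any], List[str]],
                    max_attempts: int = 2) -> Tuple[str, Any]:
    """Return ("ai", output) on success or ("human", reason) when the sub-task is rerouted."""
    reason = "no attempts made"
    for attempt in range(max_attempts):
        strict = attempt > 0  # retries add the explicit structured-output instruction
        try:
            output = call_model(sub_task, strict_schema=strict)
        except Exception as exc:  # in this sketch, any client error or timeout = AI unavailable
            return ("human", f"ai_unavailable: {exc}")
        errors = validate(output)
        if not errors:
            return ("ai", output)
        reason = f"schema_invalid: {errors}"
    return ("human", reason)
```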

8. Security Considerations

Authentication and Authorisation

  • Collaboration interface requires SSO + MFA
  • Sub-task authority levels enforced: certain high-consequence sub-tasks (final approval, regulatory referral) may only be completed by senior-level reviewers
  • AI processing service accounts have read access to input data and write access only to AI output fields; cannot write human-required fields
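The field-level write rules above can be expressed as a single authorisation check. The role names, numeric authority levels, and the `can_write` function are hypothetical. An actual deployment would source roles from the SSO identity provider.

```python
from dataclasses import dataclass

# Hypothetical authority levels; each organisation defines its own.
AUTHORITY_LEVELS = {"reviewer": 1, "senior_reviewer": 2}


@dataclass(frozen=True)
class FieldPolicy:
    human_required: bool
    min_authority: int = 1  # e.g. 2 for final approval or regulatory referral


@dataclass(frozen=True)
class Principal:
    principal_id: str
    kind: str       # "ai_service" or "human"
    role: str = ""  # for humans, a key into AUTHORITY_LEVELS


def can_write(principal: Principal, policy: FieldPolicy) -> bool:
    if principal.kind == "ai_service":
        # AI service accounts may write AI output fields only, never human-required ones.
        return not policy.human_required
    if principal.kind == "human":
        return AUTHORITY_LEVELS.get(principal.role, 0) >= policy.min_authority
    return False  # unknown principal types are denied
```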

Secrets Management

  • AI model API keys stored in secrets manager; rotated quarterly
  • Source system integration credentials stored in secrets manager

Data Classification

  • Full task input may contain sensitive data (PII, financial, health); collaboration interface presents only the minimum data slice required for each sub-task
  • AI processing engine should not receive sub-task inputs containing data irrelevant to that sub-task (data minimisation at decomposition layer)
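Data minimisation at the decomposition layer amounts to building each sub-task's input slice from an explicit allowlist. The sub-task types and field names below are invented for illustration. The sketch fails closed: a sub-task type without a defined allowlist raises an error rather than passing the full input through.

```python
from typing import Any, Dict, Set

# Hypothetical allowlists: the only input fields each sub-task type may receive.
FIELD_ALLOWLIST: Dict[str, Set[str]] = {
    "clause_identification": {"contract_text"},
    "fraud_signal_scoring": {"claim_amount", "transaction_history"},
}


def input_slice(sub_task_type: str, task_input: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the fields of task_input that this sub-task type is allowed to see."""
    allowed = FIELD_ALLOWLIST.get(sub_task_type)
    if allowed is None:
        # Fail closed: never fall back to sending the full task input.
        raise KeyError(f"no allowlist defined for sub-task type {sub_task_type!r}")
    return {k: v for k, v in task_input.items() if k in allowed}
```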

Encryption

  • All task data encrypted at rest and in transit
  • Handoff packages containing sensitive data encrypted with envelope encryption; human reviewer decrypts with their session key

Auditability

  • Every sub-task assignment, AI output, human action, and final output logged with full provenance
  • Human-AI contribution map is part of the permanent task record

OWASP LLM Top 10 Considerations

| OWASP LLM Risk | Applicability | Mitigation |
|---|---|---|
| LLM01: Prompt Injection | High: task input data is passed to the LLM for sub-task processing | Sanitise task input before inclusion in LLM prompts; use structured output schemas to limit the injection surface |
| LLM02: Insecure Output Handling | High: AI outputs are pre-filled into human review forms | Validate and sanitise AI output against the sub-task schema before rendering it in the interface |
| LLM03: Training Data Poisoning | Medium: human edits may feed training | Validate training data provenance; filter edits used as training data by reviewer authority level |
| LLM04: Model Denial of Service | Low | Rate limiting on the AI processing engine |
| LLM05: Supply Chain Vulnerabilities | Medium: third-party LLM providers | Approved provider list; output validation |
| LLM06: Sensitive Information Disclosure | High: the LLM may leak training data in sub-task outputs | Structured output schemas limiting the output surface; PII detection on AI outputs before display |
| LLM07: Insecure Plugin Design | Medium: applies if AI sub-tasks use tool calls | Apply tool call security controls; minimum-permission tool access |
| LLM08: Excessive Agency | High: AI takes the first pass on multiple sub-tasks | Human final review of all AI outputs is mandatory; no AI sub-task output is applied to the final output without human confirmation |
| LLM09: Overreliance | Critical: the hybrid design can lead humans to rubber-stamp AI outputs | Trust calibration monitoring; review time tracking; automation bias alerts |
| LLM10: Model Theft | Medium | Access controls on AI output logs, which reveal model capabilities |

9. Governance Considerations

Responsible AI

  • Task decomposition must be reviewed for bias: is the AI systematically assigned sub-tasks where errors would disproportionately affect protected groups without adequate human oversight?
  • Performance measurement must include fairness analysis: does hybrid performance degrade for cases involving protected group attributes compared to the overall baseline?

Model Risk Management

  • AI sub-task components are models subject to model risk management; each must be registered, validated, and subject to ongoing monitoring
  • Changes to sub-task allocation (moving a sub-task from human to AI) are model risk events requiring sign-off

Human Approval Gates

  • Allocation changes (reclassifying a sub-task from human-suitable to AI-suitable) require performance evidence and Model Risk approval
  • Quarterly performance review must confirm hybrid performance exceeds human-only baseline; if not, allocation is reviewed

Policy Compliance

  • Accountability map must identify which human is accountable for which sub-tasks in the final output
  • For regulated tasks, the accountable human must have the qualifications required by regulation to perform that sub-task

Traceability

  • Final task output must be traceable to: AI sub-task outputs (with model versions and confidence); human sub-task inputs (with reviewer identity); and any modifications made during human review

Governance Artefacts

| Artefact | Owner | Frequency | Purpose |
|---|---|---|---|
| Sub-task Allocation Decision Record | ML Ops + Domain Lead | Per allocation change | Document the decision to assign a sub-task to AI vs human, with supporting performance evidence |
| Hybrid Performance Report | ML Ops | Quarterly | Compare hybrid vs baselines on quality, speed, cost; include fairness analysis |
| Trust Calibration Report | Operations Manager | Monthly | Individual and aggregate agreement rates; automation bias flags and resolutions |
| Human-AI Contribution Map | Compliance | Per task type, annual review | Document which sub-tasks are AI-completed vs human-completed for regulatory reporting |

10. Operational Considerations

Monitoring

| Metric | SLO | Alert Threshold | Owner |
|---|---|---|---|
| Task end-to-end completion time | ≤ 1.2 × hybrid baseline (the hybrid baseline should itself be faster than human-only) | > 1.5 × hybrid baseline | Operations |
| AI sub-task accuracy (sampled audit) | > sub-task accuracy SLA | > 5% relative drop | ML Ops |
| Human review completion time per sub-task | < defined SLA per sub-task type | > 150% of SLA | Operations Manager |
| Trust calibration (individual agreement rate) | Within ±15% of AI accuracy rate | Outside ±25% | Supervisor |
| Handoff package validation success rate | > 99% | < 98% | ML Ops |
| Output quality score (expert sample rating) | > defined quality bar | > 5% drop from baseline | Quality lead |

Logging

  • Full task processing log with AI and human contribution details
  • Time-on-review per sub-task per reviewer
  • All trust calibration metrics stored with rolling history

Incident Response

  • AI processing failure on critical sub-task: immediately assign to human; log AI failure for model review
  • Trust calibration automation bias alert: supervisor investigation within 48 hours
  • Performance report shows hybrid below human-only baseline: emergency allocation review within 2 weeks

Disaster Recovery

| Component | RTO | RPO | Strategy |
|---|---|---|---|
| AI Processing Engine | 15 min | 0 (stateless) | Multi-AZ; all sub-tasks fall back to human queue |
| Collaboration Interface | 30 min | N/A | Multi-AZ |
| Task and Sub-task Store | 30 min | 5 min | PostgreSQL synchronous standby |

Capacity Planning

  • Human reviewer capacity must handle peak volume on all human-required sub-tasks plus AI failure rate spillover
  • AI processing must scale to handle task volume within the handoff latency SLO

11. Cost Considerations

Cost Drivers

| Driver | Description | Relative Weight |
|---|---|---|
| Human Reviewer Labour | Labour for human-required sub-tasks; lower than the human-only baseline because AI handles the high-volume sub-tasks | High (but lower than human-only) |
| AI Processing | Per-sub-task token cost × volume; LLM-based sub-tasks are the most expensive | Medium |
| Interface Development | Custom collaboration interface development; the most significant one-time cost | High (one-time) |
| Trust Calibration and Performance Measurement | Analytics infrastructure; low ongoing cost | Low |

Scaling Risks

  • If AI accuracy on any sub-task type falls below its SLA, that sub-task reverts to human handling, increasing labour cost
  • LLM token costs scale with task complexity; complex sub-task prompts at high volume can become significant

Optimisations

  • Fine-tune AI on domain-specific data to improve accuracy and reduce token usage per sub-task
  • Cache AI outputs for identical or near-identical sub-task inputs (reduces cost at high volume)
  • Progressive automation: start with AI as a pre-filler; as confidence in AI accuracy grows, graduate sub-tasks to AI-automated
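A simple form of the caching optimisation keys results on a hash of the sub-task type, the model version, and a normalised input. This sketch treats "near-identical" narrowly: inputs that differ only in whitespace or letter case. Detecting semantically similar inputs would need embeddings and is out of scope. Lower-casing is an assumption that suits only case-insensitive fields. `SubTaskCache` is a hypothetical in-memory class.

```python
import hashlib
import json
import re
from typing import Any, Callable, Dict


def _normalise(value: Any) -> Any:
    # Collapse whitespace and lower-case strings; leave other JSON values unchanged.
    if isinstance(value, str):
        return re.sub(r"\s+", " ", value).strip().lower()
    return value


def cache_key(sub_task_type: str, model_version: str, sub_task_input: Dict[str, Any]) -> str:
    normalised = {k: _normalise(v) for k, v in sub_task_input.items()}
    payload = json.dumps(
        {"type": sub_task_type, "model": model_version, "input": normalised},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SubTaskCache:
    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get_or_compute(self, sub_task_type: str, model_version: str,
                       sub_task_input: Dict[str, Any],
                       compute: Callable[[Dict[str, Any]], Any]) -> Any:
        key = cache_key(sub_task_type, model_version, sub_task_input)
        if key not in self._store:
            self._store[key] = compute(sub_task_input)  # only call the model on a miss
        return self._store[key]
```

Including the model version in the key means a model upgrade invalidates earlier results without a separate purge step.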

Indicative Cost Range

| Scale | Human-Only Monthly Cost | AI-Only Monthly Cost | Hybrid Monthly Cost | Hybrid Saving vs Human-Only |
|---|---|---|---|---|
| Small (1K tasks/month) | $50,000 | $5,000 | $30,000 | 40% |
| Medium (10K tasks/month) | $400,000 | $40,000 | $200,000 | 50% |
| Large (100K tasks/month) | $3M | $300,000 | $1.2M | 60% |

12. Trade-Off Analysis

Decomposition Granularity Options

| Granularity | Human Oversight Precision | Handoff Complexity | Cognitive Load | Recommended Use |
|---|---|---|---|---|
| Coarse (2–3 large sub-tasks) | Low: large AI blocks with limited human checkpoints | Low | Low | Mature, well-calibrated domains where AI reliability is high |
| Medium (5–10 sub-tasks) | High: humans review at multiple precise checkpoints | Medium | Medium | Default recommendation; balances oversight and complexity |
| Fine (>10 sub-tasks) | Very high: humans check AI at every step | High | High (cognitive overload risk) | Only for highest-stakes regulated tasks; requires strong interface design to manage load |

Architectural Tensions

| Tension | Option A | Option B | Resolution Guidance |
|---|---|---|---|
| AI-first vs human-first for ambiguous sub-tasks | AI takes first pass; human edits | Human takes first pass; AI validates | For efficiency, AI-first is 30–50% faster; for quality on novel tasks, human-first avoids anchoring. Default to AI-first; switch to human-first when anchoring is measured as a problem |
| Collaboration interface richness vs simplicity | Full evidence display for every sub-task | Minimal display with drill-down | Always default to minimal display, with drill-down available for every field. Decision-makers should never need to read 40 pages to approve a sub-task |
| Strict allocation vs adaptive allocation | Fixed allocation: every task follows the same human/AI split | Adaptive: confidence-based routing adjusts allocation per instance | Adaptive delivers better efficiency at the cost of higher complexity. Start with fixed allocation; add confidence-based routing for mature deployments |
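The adaptive option in the last row can be sketched as per-instance routing layered on top of the fixed allocation, under two rules:

- A sub-task classified as human-suitable is never promoted to AI.
- An AI-suitable instance goes to a human whenever the model's confidence for that instance falls below a per-type threshold.

Thresholds and names here are hypothetical. The AI Confidence Threshold Routing pattern (EAAPL-HIL005) covers threshold routing in more depth.

```python
from typing import Dict


def route_instance(sub_task_type: str, static_allocation: str, ai_confidence: float,
                   thresholds: Dict[str, float], adaptive: bool = True) -> str:
    """Return "ai" or "human" for one sub-task instance."""
    if static_allocation == "human":
        return "human"  # per-instance confidence never overrides a human classification
    if not adaptive:
        return "ai"     # fixed allocation: every instance follows the static split
    threshold = thresholds.get(sub_task_type)
    if threshold is None:
        return "human"  # no calibrated threshold yet: route conservatively
    return "ai" if ai_confidence >= threshold else "human"
```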

13. Failure Modes

| Failure | Likelihood | Impact | Detection | Recovery |
|---|---|---|---|---|
| AI anchoring bias in collaboration interface | High | High: human does not exercise independent judgment | Time-on-review monitoring; override rate monitoring | Interface redesign; present AI output after the human's initial assessment |
| Sub-task classification error (human-required task allocated to AI) | Medium | Critical for high-stakes tasks | Quality audit of AI-completed sub-tasks; outcome monitoring | Immediate reallocation; retroactive review of affected task outputs |
| Handoff package incomplete (missing context) | Medium | Medium: human makes a sub-optimal decision without full context | Schema validation failure rate; anomalously high human review time | Improve handoff package builder; add missing context sources |
| Performance measurement baseline contamination | Low | High: incorrect performance comparison leads to wrong allocation decisions | Performance measurement methodology review | Re-run baseline measurement with a clean experimental design |
| Trust calibration data lag (slow calibration updates) | Medium | Medium: automation bias persists undetected for weeks | Trust calibration update frequency monitoring | Increase calibration update frequency; near-real-time for high-volume deployments |

Cascading Failure Scenario

  • AI accuracy on a key sub-task degrades silently → human agreement rate on that sub-task remains high (automation bias) → performance audit reveals outcome quality decline → source traced to AI sub-task degradation not detected because humans were not providing independent oversight
  • Mitigation: AI sub-task accuracy monitored independently of human agreement rate (sampled expert audit of AI outputs); trust calibration uses AI accuracy data not just human agreement rate
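The key property of this mitigation is that audit selection never looks at whether a reviewer agreed with the AI. A stratified random sample per sub-task type, as sketched below, satisfies that. The per-type sample size, the seed parameter, and the function name are assumptions. Expert verdicts on the sampled items then give the AI accuracy estimate that trust calibration compares against.

```python
import random
from typing import Dict, Iterable, List, Optional, Tuple


def audit_sample(ai_outputs: Iterable[Tuple[str, str]], per_type: int,
                 seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Pick up to `per_type` AI-completed sub-task IDs per sub-task type, uniformly at random.

    ai_outputs: (sub_task_id, sub_task_type) pairs. Reviewer agreement is deliberately
    not an input, so the sample is independent of human behaviour.
    """
    rng = random.Random(seed)
    by_type: Dict[str, List[str]] = {}
    for sub_task_id, sub_task_type in ai_outputs:
        by_type.setdefault(sub_task_type, []).append(sub_task_id)
    return {
        t: rng.sample(ids, min(per_type, len(ids)))
        for t, ids in sorted(by_type.items())
    }
```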

14. Regulatory Considerations

| Regulation | Specific Clause | Requirement | Implementation |
|---|---|---|---|
| EU AI Act | Article 14 (Human oversight) | High-risk AI systems require meaningful human oversight at key decision points | Hybrid design explicitly maps human oversight to each high-stakes sub-task; the human-AI contribution map documents this |
| EU AI Act | Article 13 (Transparency) | AI system must enable humans to understand outputs | AI confidence and evidence for each sub-task are presented in the collaboration interface |
| EU AI Act | Article 9 (Risk management) | AI system risk includes sub-task misallocation | Sub-task classification review; authority-level controls for high-stakes sub-tasks |
| APRA CPS 230 | §50 (Material operational risk) | Complex AI-assisted workflows are operational risk | Performance measurement demonstrates hybrid reliability vs baseline |
| Privacy Act 1988 (Australia) | APP 3 (Collection of solicited personal information) | Collect only what is reasonably necessary; task decomposition allows each sub-task to receive only the data slice it needs | Sub-task-level data minimisation is a key design principle of this pattern |
| ISO/IEC 42001:2023 | Clause 8 (Operation) | Operational controls must maintain performance | Performance measurement and allocation optimisation are the operational controls |
| NIST AI RMF | MAP 3 (AI capabilities, targeted usage, goals, and expected benefits and costs) | AI is deployed only for tasks where it is suitable | Task decomposition framework is the formal task suitability assessment |
| GDPR | Article 22 (Automated individual decision-making) | Solely automated decisions with significant effects require human involvement | Hybrid design ensures human involvement at all high-consequence decision sub-tasks |

15. Reference Implementations

AWS

  • AI Processing: Amazon Bedrock (Claude 3.5 Sonnet for reasoning sub-tasks; Nova Lite for extraction)
  • Task Orchestration: AWS Step Functions for sub-task routing and handoff state machine
  • Human Action Queue: Amazon SQS FIFO + Amazon Connect Tasks for assignment
  • Collaboration Interface: Custom React on Amplify; Amazon Lex for simple conversational sub-tasks
  • Performance Measurement: Amazon QuickSight dashboard reading from Redshift analytics store

Azure

  • AI Processing: Azure OpenAI (GPT-4o for reasoning; GPT-4o-mini for extraction)
  • Task Orchestration: Azure Durable Functions for sub-task state machine
  • Human Action Queue: Azure Service Bus + Microsoft Teams Adaptive Cards for lightweight review
  • Collaboration Interface: Power Apps or custom React on Static Web Apps
  • Performance Measurement: Azure Synapse Analytics + Power BI

GCP

  • AI Processing: Vertex AI Gemini (Pro for reasoning; Flash for extraction)
  • Task Orchestration: Workflows or Cloud Composer (Airflow) for sub-task routing
  • Human Action Queue: Cloud Tasks + Cloud Run for human action API
  • Collaboration Interface: Custom React on Firebase Hosting
  • Performance Measurement: BigQuery + Looker Studio

On-Premises / Private Cloud

  • AI Processing: vLLM serving Llama 3 or Mistral on Kubernetes; fine-tuned sub-task models
  • Task Orchestration: Temporal for durable sub-task state machine
  • Human Action Queue: PostgreSQL-backed queue with priority ordering
  • Collaboration Interface: Custom React on Kubernetes
  • Performance Measurement: Airflow + dbt + Grafana

16. Related Patterns

| Pattern | ID | Relationship | Notes |
|---|---|---|---|
| Collaborative AI Decision | EAAPL-HIL004 | Specialisation: collaborative decision is a two-sub-task hybrid (AI recommendation + human judgment) | Hybrid intelligence is the generalisation of collaborative decision to N sub-tasks |
| Human Escalation Pattern | EAAPL-HIL003 | Complementary: escalation handles cases where AI sub-task confidence is below threshold | Confidence-based sub-task escalation is compatible with hybrid architecture |
| AI Confidence Threshold Routing | EAAPL-HIL005 | Dependency: sub-task allocation can be confidence-adaptive | Threshold routing applies at sub-task level in adaptive hybrid deployments |
| Annotation and Feedback Loop | EAAPL-HIL007 | Complementary: human sub-task completions are annotation data for AI sub-task models | Human inputs on hybrid tasks feed the annotation store for sub-task model improvement |
| Supervisor Agent | EAAPL-MAG002 | Complementary: a supervisor agent can orchestrate the hybrid intelligence workflow | The agent supervisor can be the orchestration layer for AI and human sub-tasks |
| Human Override Pattern | EAAPL-HIL006 | Dependency: human reviewers must be able to override any AI sub-task output | Override is embedded in the collaboration interface for every AI-completed sub-task field |

17. Maturity Assessment

Overall Maturity Level: Proven

| Dimension | Score (1–5) | Rationale |
|---|---|---|
| Technical Maturity | 4 | Task decomposition and handoff protocols are well understood; trust calibration tooling is less mature |
| Operational Maturity | 3 | Managing human-AI task allocation dynamically requires significant operational discipline; most organisations have not formalised this |
| Governance Maturity | 4 | EU AI Act Article 14 and accountability requirements drive adoption; sub-task accountability mapping satisfies governance needs |
| Tooling Ecosystem | 3 | No purpose-built hybrid intelligence platforms; implemented from components (workflow engines, LLM APIs, collaboration tools) |
| Enterprise Adoption | 3 | Widely adopted in concept; formal implementation with performance measurement and trust calibration is less common |
| Risk Profile | Medium-High | Highest risk is automation bias within the hybrid design; mitigated by trust calibration and performance measurement |

18. Revision History

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-06-12 | EAAPL Working Group | Initial publication covering task decomposition framework, collaboration interface design, handoff protocol, cognitive load management, trust calibration, and performance measurement |