Hybrid Intelligence Pattern

Pattern ID: EAAPL-HIL008
Status: Proven
Tags: human-oversight explainability accountability high-complexity
Version: 1.0
Last Updated: 2026-06-12

1. Executive Summary

The Hybrid Intelligence Pattern defines an architecture for systematically allocating each component of a complex task to the agent — human or AI — that can perform it best. Rather than treating AI as an optional add-on to human workflows or humans as a fallback for AI failures, it decomposes the task into sub-components, classifies each by suitability criteria, and designs explicit handoff protocols that transfer context between agents with minimal friction and maximal fidelity.

This pattern addresses the most common failure mode in enterprise AI deployments: asking a single agent (usually AI alone, or a human working from AI suggestions in a sidebar) to perform every component of a task, ignoring that AI excels at pattern-matched, high-volume processing while humans excel at novel, ethical, and ambiguous reasoning. The pattern covers task decomposition methodology; interface design for human-AI collaboration; handoff protocols; cognitive load management; trust calibration to prevent both automation bias and underuse; and performance measurement that compares hybrid intelligence against human-only and AI-only baselines. CIOs and CTOs gain a framework for continuously optimising human-AI task allocation, with the aim of measurable quality and efficiency gains that neither AI nor humans can achieve independently.

2. Problem Statement

Business Problem

Enterprises deploying AI into complex knowledge work face a paradox: AI is too often given the entire task (producing poor quality on the hard parts) or too little of the task (producing limited efficiency gains). The optimal allocation — AI takes what it does well, humans take the rest — is rarely designed deliberately. It emerges ad hoc, inconsistently across teams, and without any mechanism for measurement or improvement.

Technical Problem

Task decomposition requires identifying sub-task types and classifying them by AI suitability criteria. Without this classification, engineers default to giving AI the whole input and presenting the whole output to humans for review — which does not actually leverage AI's strengths or protect against its weaknesses. The handoff between AI and human sub-tasks is an engineering problem that is frequently underspecified: what context transfers, in what format, at what latency, and with what guarantees of completeness?

Symptoms

AI is used for whole-task processing but humans override a high percentage of AI outputs
Human reviewers cannot articulate which parts of the AI output they are checking vs rubber-stamping
Task completion with AI assistance is not significantly faster than without it
No measurement exists of whether hybrid performance exceeds human-only or AI-only performance
Human trust in AI varies wildly across team members: some always accept AI outputs; others always override

Cost of Inaction

Sub-optimal allocation leaves efficiency gains unrealised and quality gains uncaptured
Automation bias (over-trusting AI) and underuse bias (overriding AI) coexist in the same team, producing inconsistent quality
Without performance measurement, it is impossible to demonstrate AI ROI or to improve the allocation over time

3. Context

When to Apply

Complex knowledge work tasks with multiple identifiable sub-components
Domains where some sub-tasks are AI-suitable (high volume, pattern-matched) and others are human-suitable (novel, ethical, ambiguous)
Regulated environments where accountability for specific decision components must be attributable to a named human
Teams mature enough to run performance measurement and iterate on human-AI allocation

When NOT to Apply

Simple uniform tasks where decomposition does not reveal meaningfully different sub-components
Latency-critical tasks where the handoff protocol overhead is architecturally prohibitive
Organisations that lack the operational maturity to manage the handoff protocol and performance measurement

Prerequisites

Complex task with identifiable, separable sub-components
AI system capable of processing at least some sub-components reliably
Human expert workforce available for the human-suitable sub-components
Performance measurement infrastructure (quality metrics, timing, outcome tracking)

Industry Applicability

Industry	Complex Task	AI-Suitable Sub-Tasks	Human-Suitable Sub-Tasks
Legal	Contract review	Clause identification, standard risk flagging, precedent matching	Novel risk assessment, negotiation recommendation, client advice
Healthcare	Clinical documentation	ICD coding of standard diagnoses, medication reconciliation	Complex comorbidity assessment, patient communication, ethical decisions
Financial Services	Credit assessment	Document data extraction, fraud signal scoring, policy compliance check	Final credit judgment, exception handling, relationship context
Insurance	Claims processing	Document classification, damage estimation from photos, fraud scoring	Coverage interpretation disputes, large loss adjustment, litigation management
HR	Candidate assessment	Resume parsing, skills matching, compliance screening	Interview quality assessment, culture fit judgment, hiring decision
Research	Literature review	Paper retrieval, citation extraction, structured data extraction	Synthesis, hypothesis generation, expert interpretation

4. Architecture Overview

The Hybrid Intelligence Pattern requires implementing six capabilities in integrated sequence.

Capability 1 — Task Decomposition Framework. The first step is to decompose the target task into sub-components and classify each by suitability using four criteria: volume and repetition (AI-suitable if the sub-task type recurs at high volume with consistent patterns); variance and novelty (human-suitable if novel cases are common and require contextual judgment); ethical and accountability requirements (always human if the sub-task produces a judgment requiring personal accountability — e.g. "should we approve this patient's request for surgery?"); and error tolerance (human-required if errors have high consequence and are hard to reverse). Each sub-task is classified on a 2×2 matrix: AI confidence (can the AI perform this reliably?) versus consequence of error (how bad is an undetected AI error?). Sub-tasks in the high-confidence, low-consequence quadrant are AI-automated; sub-tasks in the low-confidence or high-consequence quadrants involve human judgment.
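
The 2×2 classification described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `SubTask` fields, the `Allocation` enum, and both cut-off values are hypothetical names and numbers chosen for the example, and a real deployment would derive the cut-offs from measured AI accuracy and an agreed consequence scale.

```python
from dataclasses import dataclass
from enum import Enum


class Allocation(Enum):
    AI_AUTOMATED = "ai_automated"
    HUMAN_JUDGMENT = "human_judgment"


@dataclass(frozen=True)
class SubTask:
    name: str
    ai_confidence: float       # estimated AI reliability on this sub-task type, 0..1
    error_consequence: float   # severity of an undetected AI error, 0..1
    requires_accountability: bool = False  # judgment that needs a named human


# Hypothetical cut-offs (assumptions for this sketch only).
CONFIDENCE_CUTOFF = 0.9
CONSEQUENCE_CUTOFF = 0.3


def classify(sub_task: SubTask) -> Allocation:
    """Place a sub-task on the AI-confidence vs consequence-of-error matrix."""
    # Accountability-bearing judgments always go to a human, whatever the scores.
    if sub_task.requires_accountability:
        return Allocation.HUMAN_JUDGMENT
    high_confidence = sub_task.ai_confidence >= CONFIDENCE_CUTOFF
    low_consequence = sub_task.error_consequence <= CONSEQUENCE_CUTOFF
    # Only the high-confidence, low-consequence quadrant is AI-automated.
    if high_confidence and low_consequence:
        return Allocation.AI_AUTOMATED
    return Allocation.HUMAN_JUDGMENT
```

For example, `classify(SubTask("clause identification", 0.95, 0.1))` lands in the AI-automated quadrant, while the same scores with `requires_accountability=True` are routed to human judgment.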

Capability 2 — Interface Design for Human-AI Collaboration. The collaboration interface is not a blank canvas with AI suggestions floating in a sidebar — that design has been repeatedly shown to produce the worst outcomes (anchoring bias plus cognitive overload). The correct design is a structured presentation where AI takes the first pass and produces structured output in a defined schema; the human reviews the structured output, edits specific fields, and approves. The interface explicitly marks which sub-tasks were AI-completed (with confidence indicator) and which require human action. AI-completed fields are pre-filled and editable; human-required fields are empty and mandatory. This design communicates clearly what has been done, what needs review, and what the human must decide — without requiring the human to re-read the full source input from scratch.

Capability 3 — Handoff Protocol. When a task component transitions from AI to human (or from one AI component to another), the handoff must be specified: what context transfers (previous sub-task outputs, source documents, AI confidence for each field, retrieved evidence, constraints that apply to this sub-task); what format (structured JSON schema shared by AI output and human input form); what latency is expected (synchronous for human-waiting workflows; async for batch); and what happens if the handoff is incomplete or invalid (validation at the receiving side; rejection with retry request). The handoff message schema is the contract between AI and human components of the workflow. It must be versioned: changes to the schema require migration of in-flight tasks.
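
Receiving-side validation of a versioned handoff package might look like the sketch below. The field names follow the `handoff_package` shape used in the Data Flow section; the `schema_version` field, the `"1.0"` label, and the `validate_handoff` function are assumptions added for illustration, and a production system would more likely express the contract in JSON Schema.

```python
from typing import Any

# Hypothetical set of schema versions this receiver understands.
SUPPORTED_SCHEMA_VERSIONS = {"1.0"}

REQUIRED_FIELDS = {
    "schema_version": str,
    "task_id": str,
    "completed_sub_tasks": list,
    "pending_human_sub_tasks": list,
    "context_docs": list,
    "constraints": list,
}


def validate_handoff(package: dict[str, Any]) -> list[str]:
    """Return validation errors; an empty list means the package is accepted."""
    errors: list[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in package:
            errors.append(f"missing field: {name}")
        elif not isinstance(package[name], expected):
            errors.append(f"wrong type for {name}: expected {expected.__name__}")
    version = package.get("schema_version")
    if isinstance(version, str) and version not in SUPPORTED_SCHEMA_VERSIONS:
        errors.append(f"unsupported schema_version: {version}")
    # Every AI-completed sub-task must carry a confidence in [0, 1].
    completed = package.get("completed_sub_tasks")
    if isinstance(completed, list):
        for i, item in enumerate(completed):
            confidence = item.get("confidence") if isinstance(item, dict) else None
            if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
                errors.append(f"completed_sub_tasks[{i}]: confidence must be in [0, 1]")
    return errors
```

A non-empty result corresponds to the "rejection with retry request" branch: the receiver returns the error list to the sender rather than rendering an incomplete package to the reviewer.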

Capability 4 — Cognitive Load Management. The human reviewer's time is the bottleneck in any hybrid intelligence system. Cognitive load must be actively managed: present only the minimum necessary information on the primary view; allow drill-down for supporting evidence; pre-process and structure AI output to eliminate information the human does not need to review (e.g. do not show a human reviewer the full 100-page document if the AI has already extracted the 5 relevant clauses they need to judge); use progressive disclosure (show the highest-priority item first; additional items accessible on demand). Time-on-review is tracked per sub-task type; unexpectedly short review times (potential rubber-stamping) and unexpectedly long review times (interface complexity issue) are both flagged for investigation.
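
One simple way to flag anomalous review times is to compare each review with the median for its sub-task type. The sketch below assumes that approach; the function name and both ratio defaults are illustrative choices, not values prescribed by the pattern.

```python
from statistics import median


def flag_review_times(
    times_ms: dict[str, float],
    low_ratio: float = 0.25,   # assumption: under 25% of median looks like rubber-stamping
    high_ratio: float = 3.0,   # assumption: over 3x median suggests an interface problem
) -> dict[str, str]:
    """Flag reviews far below or above the median time for one sub-task type."""
    if not times_ms:
        return {}
    typical = median(times_ms.values())
    flags: dict[str, str] = {}
    for review_id, elapsed in times_ms.items():
        if elapsed < typical * low_ratio:
            flags[review_id] = "possible rubber-stamping"
        elif elapsed > typical * high_ratio:
            flags[review_id] = "possible interface complexity issue"
    return flags
```

The input is expected to hold reviews of a single sub-task type, since typical review time differs widely between, say, clause identification and novel risk assessment. A flag is a prompt for investigation, not a finding.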

Capability 5 — Trust Calibration. Human trust in AI varies by individual and over time. Both extremes are harmful: over-trust (automation bias) produces rubber-stamping; under-trust (automation resistance) eliminates AI efficiency gains. Trust calibration involves: tracking each human's agreement rate with AI on each sub-task type; comparing their agreement rate to the AI's actual accuracy rate on that sub-task type; if agreement rate significantly exceeds accuracy rate, flag automation bias and notify supervisor; if agreement rate significantly falls below accuracy rate, investigate whether AI performance on this sub-task type is genuinely poor (justified under-trust) or whether individual bias is driving over-riding. Trust calibration data feeds the task allocation framework: if a human systematically re-does AI sub-tasks from scratch, the AI is not adding value for that component with that human and the allocation should be reconsidered.
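
The comparison between a reviewer's agreement rate and the AI's measured accuracy can be sketched as follows. The `calibrate` function and `CalibrationResult` type are hypothetical, and the default tolerance of 0.15 assumes the ±15% band in the Monitoring table is meant in percentage points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibrationResult:
    gap: float    # agreement rate minus AI accuracy
    status: str   # "calibrated", "automation_bias", or "under_trust"


def calibrate(
    agreements: int,
    reviews: int,
    ai_accuracy: float,
    tolerance: float = 0.15,  # assumption: percentage-point band around AI accuracy
) -> CalibrationResult:
    """Compare one reviewer's agreement rate with AI accuracy on one sub-task type."""
    if reviews <= 0:
        raise ValueError("reviews must be positive")
    gap = agreements / reviews - ai_accuracy
    if gap > tolerance:
        status = "automation_bias"   # agrees far more often than the AI is right
    elif gap < -tolerance:
        status = "under_trust"       # overrides far more often than the AI is wrong
    else:
        status = "calibrated"
    return CalibrationResult(gap=gap, status=status)
```

The `ai_accuracy` input should come from an independent audit of AI outputs rather than from reviewer agreement itself; otherwise a team with automation bias would mask a degrading model.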

Capability 6 — Performance Measurement. The hybrid intelligence system must be benchmarked against human-only and AI-only baselines to demonstrate value and identify optimisation opportunities. Metrics measured: task completion time (hybrid vs human-only vs AI-only); output quality (human expert evaluation of random samples from each condition); error rate on downstream outcomes (regulatory findings, claims outcomes, customer complaints); human cognitive load (reported workload rating, time on task); and cost per completed task. The measurement must be run on sufficiently large samples to achieve statistical power. Results are reviewed quarterly; significant deviations from expected performance trigger an allocation review.
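
To judge whether a difference in error rate between two conditions (for example hybrid versus human-only) is more than noise, one common choice is a two-proportion z-test. The sketch below uses the normal approximation with only the standard library; the choice of test and the function name are assumptions for illustration, and the approximation is only reasonable for fairly large samples.

```python
from math import erf, sqrt


def two_proportion_z_test(
    errors_a: int, n_a: int, errors_b: int, n_b: int
) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in error rates a - b."""
    p_a, p_b = errors_a / n_a, errors_b / n_b
    pooled = (errors_a + errors_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        return 0.0, 1.0  # no variation at all: nothing to distinguish
    z = (p_a - p_b) / standard_error
    # Standard normal CDF expressed through the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

A negative `z` with a small p-value, when condition `a` is hybrid and `b` is human-only, would support the claim that hybrid produces fewer errors. Completion time and cost need their own comparisons, and the same samples should also be broken down for the fairness analysis described under Governance.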

5. Architecture Diagram

flowchart TD
    subgraph Decomposition["Task Decomposition"]
        A[Complex Task Input]
        B[Task Decomposer]
        C{Sub-task Classifier}
    end
    subgraph Processing["Processing Layer"]
        D[AI Processing Engine]
        E[Human Action Queue]
        F[Handoff Package Builder]
    end
    subgraph Assembly["Assembly and Learning"]
        G[Collaboration Interface]
        H[Output Assembler]
        I[Trust Calibration Monitor]
    end
    A --> B
    B --> C
    C -->|AI-suitable| D
    C -->|human-suitable| E
    D --> F
    E --> G
    F --> G
    G --> H
    H --> I
    I -->|bias alert| G
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#f3e8ff,stroke:#a855f7
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#f0fdf4,stroke:#22c55e
    style F fill:#f0fdf4,stroke:#22c55e
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#d1fae5,stroke:#10b981
    style I fill:#fee2e2,stroke:#ef4444

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Task Decomposer	Application Service	Parse task input; identify sub-tasks; route to AI or human queue	Rules-based router; LLM-based decomposer; domain-specific parser	Critical
AI Processing Engine	ML Serving	Execute AI-suitable sub-tasks; return structured output with confidence	LLM (Claude, GPT-4), fine-tuned classifier, extraction model	Critical
Handoff Package Builder	Application Service	Assemble context package for human review; validate against schema	Python microservice; JSON Schema validation	High
Collaboration Interface	Web Application	Present structured AI output; highlight human-required tasks; capture edits	Custom React app; task-specific form design	Critical
Human Action Queue	Durable Queue	Hold human-required sub-tasks; manage assignment and SLA	PostgreSQL queue; Temporal workflow	High
Output Assembler	Application Service	Merge AI and human sub-task outputs into final task output	Python microservice	High
Trust Calibration Monitor	Analytics Service	Track individual agreement rates vs AI accuracy; detect bias	Python analytics job; BI dashboard	High
Performance Measurement Service	Analytics Service	Compare hybrid performance to baselines; compute quality, speed, cost metrics	Python analytics; Jupyter notebooks; BI tool	Medium
Allocation Optimiser	Analytics + Advisory	Recommend sub-task reallocation based on performance data	Python analysis; human decision required for changes	Medium

7. Data Flow

Primary Flow

Step	Actor	Action	Output
1	Source System	Submits complex task	task_id, task_type, input_payload
2	Task Decomposer	Identifies sub-tasks; classifies each	sub_tasks[]: {sub_task_id, type, classification, input_slice}
3	AI Processing Engine	Processes AI-suitable sub-tasks in parallel	ai_outputs[]: {sub_task_id, result, confidence, evidence[], processing_time_ms}
4	Handoff Package Builder	Assembles AI outputs + human-required sub-tasks + context into collaboration package	handoff_package: {task_id, completed_sub_tasks[], pending_human_sub_tasks[], context_docs[], constraints[]}
5	Schema Validator	Validates handoff package completeness	valid: true/false; validation_errors[]
6	Collaboration Interface	Presents structured package to reviewer	UI rendered; review_started_at timestamp
7	Human Reviewer	Edits AI sub-task outputs; completes human-required sub-tasks	reviewed_sub_tasks[]: {sub_task_id, final_value, was_ai_modified, modification_reason, time_spent_ms}
8	Output Assembler	Merges AI and human contributions; produces final task output	final_output: {task_id, sub_task_outputs[], human_contribution_map{}, ai_contribution_map{}}
9	Trust Calibration Monitor	Updates agreement rate for reviewer across sub-task types	trust_metrics per reviewer per sub-task_type
10	Performance Measurement	Records quality, time, and cost metrics	performance_record linked to task_id
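
Step 8 of the primary flow, merging AI and human contributions, can be sketched as below. The record shapes follow the `ai_outputs[]`, `reviewed_sub_tasks[]`, and `final_output` fields in the table; the function name and the contents of the two contribution maps are assumptions for illustration. The sketch refuses to assemble if any AI output lacks a human review record, in line with the requirement elsewhere in this pattern that no AI output reaches the final output without human confirmation.

```python
from typing import Any


def assemble_output(
    task_id: str,
    ai_outputs: list[dict[str, Any]],
    reviewed: list[dict[str, Any]],
) -> dict[str, Any]:
    """Merge reviewed sub-tasks into a final output with contribution maps."""
    ai_by_id = {o["sub_task_id"]: o for o in ai_outputs}
    reviewed_ids = {r["sub_task_id"] for r in reviewed}
    unreviewed = sorted(set(ai_by_id) - reviewed_ids)
    if unreviewed:
        raise ValueError(f"AI outputs without human confirmation: {unreviewed}")

    final: dict[str, Any] = {
        "task_id": task_id,
        "sub_task_outputs": [],
        "human_contribution_map": {},
        "ai_contribution_map": {},
    }
    for r in reviewed:
        sid = r["sub_task_id"]
        final["sub_task_outputs"].append({"sub_task_id": sid, "value": r["final_value"]})
        if sid in ai_by_id and not r.get("was_ai_modified", False):
            # AI value accepted unchanged: attribute to AI, confirmed by a human.
            final["ai_contribution_map"][sid] = {
                "confidence": ai_by_id[sid]["confidence"],
                "confirmed_by_human": True,
            }
        else:
            # Human-authored, or AI value edited by the reviewer.
            final["human_contribution_map"][sid] = {
                "modified_ai_output": sid in ai_by_id,
                "reason": r.get("modification_reason"),
            }
    return final
```

Keeping the two maps separate is what lets the permanent task record answer, per sub-task, whether the final value originated with the model or with a named reviewer.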

Error Flow

Error Condition	Detected By	Recovery Action	Notification
AI sub-task fails (error or timeout)	AI Processing Engine	Mark sub-task as human-required; add to human action queue	Collaboration interface shows "AI unavailable for this item"
Handoff package schema validation failure	Handoff Package Builder	Retry AI sub-task with explicit structured output instruction; escalate to human if second attempt fails	ML Ops alert; human takes over affected sub-task
Human reviewer times out on task	SLA Manager	Re-assign or escalate to supervisor	Operations manager notification
Trust calibration detects automation bias	Trust Calibration Monitor	Supervisor notification; re-training of reviewer on task guidelines if the supervisor judges it necessary	Supervisor; HR if persistent
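
The first two rows of the error flow (AI failure, and schema validation failure with a single retry) can be sketched together. Everything here is illustrative: `ai_call`, `is_valid`, and the in-memory `human_queue` list are hypothetical stand-ins for the AI Processing Engine, the schema check, and the Human Action Queue.

```python
from typing import Any, Callable, Optional


def run_ai_sub_task(
    sub_task: dict,
    ai_call: Callable[[dict, bool], Any],   # second argument: request strict structured output
    is_valid: Callable[[Any], bool],
    human_queue: list,
) -> Optional[Any]:
    """Run an AI sub-task; fall back to the human queue on failure."""
    reason = "schema validation failed after retry"
    # First attempt is normal; the second adds the explicit structured-output instruction.
    for strict in (False, True):
        try:
            output = ai_call(sub_task, strict)
        except Exception:
            # Error or timeout: no retry, the sub-task becomes human-required.
            reason = "AI unavailable for this item"
            break
        if is_valid(output):
            return output
    human_queue.append({"sub_task_id": sub_task["sub_task_id"], "reason": reason})
    return None
```

Returning `None` after enqueueing keeps the caller's logic simple: a missing AI output means the sub-task now appears in the reviewer's package as a mandatory human field.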

8. Security Considerations

Authentication and Authorisation

Collaboration interface requires SSO + MFA
Sub-task authority levels enforced: certain high-consequence sub-tasks (final approval, regulatory referral) may only be completed by senior-level reviewers
AI processing service accounts have read access to input data and write access only to AI output fields; cannot write human-required fields

Secrets Management

AI model API keys stored in secrets manager; rotated quarterly
Source system integration credentials stored in secrets manager

Data Classification

Full task input may contain sensitive data (PII, financial, health); collaboration interface presents only the minimum data slice required for each sub-task
AI processing engine should not receive sub-task inputs containing data irrelevant to that sub-task (data minimisation at decomposition layer)

Encryption

All task data encrypted at rest and in transit
Handoff packages containing sensitive data encrypted with envelope encryption; human reviewer decrypts with their session key

Auditability

Every sub-task assignment, AI output, human action, and final output logged with full provenance
Human-AI contribution map is part of the permanent task record

OWASP LLM Top 10 Considerations

OWASP LLM Risk	Applicability	Mitigation
LLM01: Prompt Injection	High — task input data is passed to LLM for sub-task processing	Sanitise task input before inclusion in LLM prompts; use structured output schemas to limit injection surface
LLM02: Insecure Output Handling	High — AI outputs are pre-filled into human review forms	Validate and sanitise AI output against sub-task schema before rendering in interface
LLM03: Training Data Poisoning	Medium — human edits may feed training	Validate training data provenance; authority-level filter on edits used as training
LLM04: Model Denial of Service	Low	Rate limiting on AI processing engine
LLM05: Supply Chain Vulnerabilities	Medium — third-party LLM providers	Approved provider list; output validation
LLM06: Sensitive Information Disclosure	High — LLM may leak training data in sub-task outputs	Structured output schemas limiting output surface; PII detection on AI outputs before display
LLM07: Insecure Plugin Design	Medium — if AI sub-tasks use tool calls	Apply tool call security controls; minimum-permission tool access
LLM08: Excessive Agency	High — AI takes first pass on multiple sub-tasks	Human final review of all AI outputs is mandatory; no AI sub-task auto-applies to the final output without human confirmation
LLM09: Overreliance	Critical — hybrid design can seduce human into rubber-stamping AI outputs	Trust calibration monitoring; review time tracking; automation bias alerts
LLM10: Model Theft	Medium	Access controls on AI output logs which reveal model capabilities

9. Governance Considerations

Responsible AI

Task decomposition must be reviewed for bias: is the AI systematically assigned sub-tasks where errors would disproportionately affect protected groups without adequate human oversight?
Performance measurement must include fairness analysis: does hybrid performance degrade for cases involving protected group attributes compared to the overall baseline?

Model Risk Management

AI sub-task components are models subject to model risk management; each must be registered, validated, and subject to ongoing monitoring
Changes to sub-task allocation (moving a sub-task from human to AI) are model risk events requiring sign-off

Human Approval Gates

Allocation changes (reclassifying a sub-task from human-suitable to AI-suitable) require performance evidence and Model Risk approval
Quarterly performance review must confirm hybrid performance exceeds human-only baseline; if not, allocation is reviewed

Policy Compliance

Accountability map must identify which human is accountable for which sub-tasks in the final output
For regulated tasks, the accountable human must have the qualifications required by regulation to perform that sub-task

Traceability

Final task output must be traceable to: AI sub-task outputs (with model versions and confidence); human sub-task inputs (with reviewer identity); and any modifications made during human review

Governance Artefacts

Artefact	Owner	Frequency	Purpose
Sub-task Allocation Decision Record	ML Ops + Domain Lead	Per allocation change	Document decision to assign sub-task to AI vs human with supporting performance evidence
Hybrid Performance Report	ML Ops	Quarterly	Compare hybrid vs baselines on quality, speed, cost; include fairness analysis
Trust Calibration Report	Operations Manager	Monthly	Individual and aggregate agreement rates; automation bias flags and resolutions
Human-AI Contribution Map	Compliance	Per task type, annual review	Document which sub-tasks are AI-completed vs human-completed for regulatory reporting

10. Operational Considerations

Monitoring

Metric	SLO	Alert Threshold	Owner
Task end-to-end completion time	≤ hybrid baseline × 1.2 (hybrid should remain faster than human-only)	> hybrid baseline × 1.5	Operations
AI sub-task accuracy (sampled audit)	> Sub-task accuracy SLA	> 5% relative drop	ML Ops
Human review completion time per sub-task	< defined SLA per sub-task type	> 150% SLA	Operations Manager
Trust calibration (individual agreement rate)	Within ±15% of AI accuracy rate	Outside ±25%	Supervisor
Handoff package validation success rate	> 99%	< 98%	ML Ops
Output quality score (expert sample rating)	> defined quality bar	> 5% drop from baseline	Quality lead

Logging

Full task processing log with AI and human contribution details
Time-on-review per sub-task per reviewer
All trust calibration metrics stored with rolling history

Incident Response

AI processing failure on critical sub-task: immediately assign to human; log AI failure for model review
Trust calibration automation bias alert: supervisor investigation within 48 hours
Performance report shows hybrid below human-only baseline: emergency allocation review within 2 weeks

Disaster Recovery

Component	RTO	RPO	Strategy
AI Processing Engine	15 min	0 (stateless)	Multi-AZ; all sub-tasks fall back to human queue
Collaboration Interface	30 min	N/A	Multi-AZ
Task and Sub-task Store	30 min	5 min	PostgreSQL synchronous standby

Capacity Planning

Human reviewer capacity must handle peak volume on all human-required sub-tasks plus spillover from AI sub-task failures
AI processing must scale to handle task volume within the handoff latency SLO

11. Cost Considerations

Cost Drivers

Driver	Description	Relative Weight
Human Reviewer Labour	For human-required sub-tasks; reduced compared to human-only baseline by AI handling high-volume sub-tasks	High (but lower than human-only)
AI Processing	Per sub-task token cost × volume; LLM-based sub-tasks most expensive	Medium
Interface Development	Custom collaboration interface development; most significant one-time cost	High (one-time)
Trust Calibration and Performance Measurement	Analytics infrastructure; low ongoing cost	Low

Scaling Risks

If AI accuracy on any sub-task type falls below its SLA, that sub-task reverts to human handling, increasing labour cost
LLM token costs scale with task complexity; complex sub-task prompts at high volume can become a significant cost

Optimisations

Fine-tune AI on domain-specific data to improve accuracy and reduce token usage per sub-task
Cache AI outputs for identical or near-identical sub-task inputs (reduces cost at high volume)
Progressive automation: start with AI as a pre-filler; as confidence in AI accuracy grows, graduate sub-tasks to AI-automated
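
The caching optimisation can be sketched for the exact-match case. The `SubTaskCache` class, the in-memory dictionary, and the decision to include a model version in the cache key are assumptions for this example; near-identical matching would need a similarity measure and is not covered here.

```python
import hashlib
import json
from typing import Any, Callable


class SubTaskCache:
    """Exact-match cache for AI sub-task outputs, keyed by a hash of the input."""

    def __init__(self, ai_call: Callable[[str, dict], Any], model_version: str) -> None:
        self._ai_call = ai_call
        self._model_version = model_version
        self._store: dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, sub_task_type: str, payload: dict) -> str:
        # Canonical JSON so that key order in the payload does not change the hash.
        canonical = json.dumps(
            {"type": sub_task_type, "model": self._model_version, "input": payload},
            sort_keys=True,
            separators=(",", ":"),
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def process(self, sub_task_type: str, payload: dict) -> Any:
        key = self._key(sub_task_type, payload)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._ai_call(sub_task_type, payload)
        self._store[key] = result
        return result
```

Including the model version in the key means a model upgrade naturally invalidates old entries, so cached outputs never outlive the model that produced them. Cached results still pass through human review like any other AI output.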

Indicative Cost Range

Baseline	Human-Only Monthly Cost	AI-Only Monthly Cost	Hybrid Monthly Cost	Hybrid Saving vs Human-Only
Small (1K tasks/month)	$50,000	$5,000	$30,000	40%
Medium (10K tasks/month)	$400,000	$40,000	$200,000	50%
Large (100K tasks/month)	$3M	$300,000	$1.2M	60%
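
The saving column follows directly from the two monthly cost columns. The short sketch below reproduces that arithmetic from the table's own figures; the function names are illustrative.

```python
# (scale, tasks per month, human-only monthly cost, hybrid monthly cost), from the table above
ROWS = [
    ("Small", 1_000, 50_000, 30_000),
    ("Medium", 10_000, 400_000, 200_000),
    ("Large", 100_000, 3_000_000, 1_200_000),
]


def hybrid_saving(human_only_cost: float, hybrid_cost: float) -> float:
    """Fractional saving of hybrid relative to the human-only baseline."""
    return (human_only_cost - hybrid_cost) / human_only_cost


def cost_per_task(monthly_cost: float, tasks_per_month: int) -> float:
    """Average cost of one completed task."""
    return monthly_cost / tasks_per_month


for scale, tasks, human_only, hybrid in ROWS:
    saving = hybrid_saving(human_only, hybrid)
    print(f"{scale}: saving {saving:.0%}, hybrid cost per task ${cost_per_task(hybrid, tasks):.2f}")
```

On these indicative figures the hybrid cost per task falls from $30 at small scale to $12 at large scale, which is where the rising saving percentage comes from.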

12. Trade-Off Analysis

Decomposition Granularity Options

Granularity	Human Oversight Precision	Handoff Complexity	Cognitive Load	Recommended
Coarse (2–3 large sub-tasks)	Low — large AI blocks with limited human check points	Low	Low	Use for mature, well-calibrated domains where AI reliability is high
Medium (5–10 sub-tasks)	High — humans review at multiple precise checkpoints	Medium	Medium	Default recommendation; balances oversight and complexity
Fine (>10 sub-tasks)	Very High — humans check AI at every step	High	High — cognitive overload risk	Use only for highest-stakes regulated tasks; requires strong interface design to manage load

Architectural Tensions

Tension	Option A	Option B	Resolution Guidance
AI-first vs human-first for ambiguous sub-tasks	AI takes first pass; human edits	Human takes first pass; AI validates	For efficiency: AI-first is 30–50% faster. For quality on novel tasks: human-first avoids anchoring. Default to AI-first; switch to human-first when anchoring is measured as a problem
Collaboration interface richness vs simplicity	Full evidence display for every sub-task	Minimal display with drill-down	Always default to minimal display; provide drill-down for every field. Decision-makers should never need to read 40 pages to approve a sub-task
Strict allocation vs adaptive allocation	Fixed allocation: every task follows the same human/AI split	Adaptive: confidence-based routing adjusts allocation per-instance	Adaptive delivers better efficiency but higher complexity. Start with fixed allocation; add confidence-based routing for mature deployments
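
The difference between strict and adaptive allocation can be shown with a single routing function. This is a sketch under stated assumptions: the `route` function, the `"ai"` and `"human"` labels, and the per-type threshold dictionary are hypothetical, and adaptive routing here only ever moves work from AI to human, never the reverse.

```python
from typing import Optional


def route(
    sub_task_type: str,
    confidence: float,
    fixed_allocation: dict[str, str],             # sub-task type -> "ai" or "human"
    thresholds: Optional[dict[str, float]] = None,  # None means strict (fixed) allocation
) -> str:
    """Return "ai" or "human" for one sub-task instance."""
    owner = fixed_allocation[sub_task_type]
    if owner == "ai" and thresholds is not None:
        # Types without a configured threshold default to 1.0, so they are
        # sent to a human unless the AI reports full confidence.
        if confidence < thresholds.get(sub_task_type, 1.0):
            return "human"
    return owner
```

Passing `thresholds=None` gives the fixed allocation recommended as a starting point; supplying thresholds later enables per-instance, confidence-based routing without changing callers.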

13. Failure Modes

Failure	Likelihood	Impact	Detection	Recovery
AI anchoring bias in collaboration interface	High	High — human does not exercise independent judgment	Time-on-review monitoring; override rate monitoring	Interface redesign; present AI output after human initial assessment
Sub-task classification error (human-required task allocated to AI)	Medium	Critical for high-stakes tasks	Quality audit of AI-completed sub-tasks; outcome monitoring	Immediate reallocation; retroactive review of affected task outputs
Handoff package incomplete (missing context)	Medium	Medium — human makes sub-optimal decision without full context	Schema validation failure rate; human review time anomalously high	Improve handoff package builder; add missing context sources
Performance measurement baseline contamination	Low	High — incorrect performance comparison; wrong allocation decisions	Performance measurement methodology review	Re-run baseline measurement with clean experimental design
Trust calibration data lag (calibration update is slow)	Medium	Medium — automation bias persists undetected for weeks	Trust calibration update frequency monitoring	Increase calibration update frequency; near-real-time for high-volume deployments

Cascading Failure Scenario

AI accuracy on a key sub-task degrades silently → human agreement rate on that sub-task remains high (automation bias) → performance audit reveals outcome quality decline → source traced to AI sub-task degradation not detected because humans were not providing independent oversight
Mitigation: AI sub-task accuracy monitored independently of human agreement rate (sampled expert audit of AI outputs); trust calibration uses AI accuracy data not just human agreement rate
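
The mitigation depends on measuring AI accuracy from a sampled expert audit rather than from reviewer agreement. A minimal sketch of both halves follows; the function names and the sampling rate are assumptions, and the 5% relative-drop default mirrors the alert threshold in the Monitoring table.

```python
import random
from typing import Optional


def sample_for_audit(
    sub_task_ids: list[str], rate: float, seed: Optional[int] = None
) -> list[str]:
    """Pick a random subset of AI-completed sub-tasks for independent expert audit."""
    if not sub_task_ids:
        return []
    k = min(len(sub_task_ids), max(1, round(len(sub_task_ids) * rate)))
    return random.Random(seed).sample(sub_task_ids, k)


def accuracy_dropped(
    baseline_accuracy: float,
    audited_correct: int,
    audited_total: int,
    max_relative_drop: float = 0.05,
) -> bool:
    """True if audited accuracy fell more than max_relative_drop below baseline."""
    if audited_total <= 0:
        raise ValueError("audited_total must be positive")
    current = audited_correct / audited_total
    relative_drop = (baseline_accuracy - current) / baseline_accuracy
    return relative_drop > max_relative_drop
```

Because the auditors judge the AI output directly, this signal stays valid even when every reviewer is agreeing with the AI, which is exactly the situation in the cascading scenario above.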

14. Regulatory Considerations

Regulation	Specific Clause	Requirement	Implementation
EU AI Act	Article 14 — Human oversight	High-risk AI systems require meaningful human oversight at key decision points	Hybrid design explicitly maps human oversight to each high-stakes sub-task; human-AI contribution map documents this
EU AI Act	Article 13 — Transparency	AI system must enable humans to understand outputs	AI confidence and evidence per sub-task are presented in collaboration interface
EU AI Act	Article 9 — Risk management	AI system risk includes sub-task misallocation	Sub-task classification review; authority level controls for high-stakes sub-tasks
APRA CPS 230	§50 — Material operational risk	Complex AI-assisted workflows are operational risk	Performance measurement demonstrates hybrid reliability vs baseline
Privacy Act 1988 (Australia)	APP 3 — Minimisation	Task decomposition allows data minimisation: each sub-task receives only the data slice it needs	Sub-task-level data minimisation is a key design principle of this pattern
ISO 42001:2023	§8.4 — AI system operation	Operational controls must maintain performance	Performance measurement and allocation optimisation are the operational controls
NIST AI RMF	MAP 3.5 — Task suitability	AI is deployed only for tasks where it is suitable	Task decomposition framework is the formal task suitability assessment
GDPR Article 22	Automated individual decision-making	Solely automated decisions with significant effects require human involvement	Hybrid design ensures human involvement at all high-consequence decision sub-tasks

15. Reference Implementations

AWS

AI Processing: Amazon Bedrock (Claude 3.5 Sonnet for reasoning sub-tasks; Nova Lite for extraction)
Task Orchestration: AWS Step Functions for sub-task routing and handoff state machine
Human Action Queue: Amazon SQS FIFO + Amazon Connect Tasks for assignment
Collaboration Interface: Custom React on Amplify; Amazon Lex for simple conversational sub-tasks
Performance Measurement: Amazon QuickSight dashboard reading from Redshift analytics store

Azure

AI Processing: Azure OpenAI (GPT-4o for reasoning; GPT-4o-mini for extraction)
Task Orchestration: Azure Durable Functions for sub-task state machine
Human Action Queue: Azure Service Bus + Microsoft Teams Adaptive Cards for lightweight review
Collaboration Interface: Power Apps or custom React on Static Web Apps
Performance Measurement: Azure Synapse Analytics + Power BI

GCP

AI Processing: Vertex AI Gemini (Pro for reasoning; Flash for extraction)
Task Orchestration: Workflows or Cloud Composer (Airflow) for sub-task routing
Human Action Queue: Cloud Tasks + Cloud Run for human action API
Collaboration Interface: Custom React on Firebase Hosting
Performance Measurement: BigQuery + Looker Studio

On-Premises / Private Cloud

AI Processing: vLLM serving Llama 3 or Mistral on Kubernetes; fine-tuned sub-task models
Task Orchestration: Temporal for durable sub-task state machine
Human Action Queue: PostgreSQL-backed queue with priority ordering
Collaboration Interface: Custom React on Kubernetes
Performance Measurement: Airflow + dbt + Grafana

16. Related Patterns

Pattern	ID	Relationship	Notes
Collaborative AI Decision	EAAPL-HIL004	Specialisation — collaborative decision is a two-sub-task hybrid: AI recommendation + human judgment	Hybrid intelligence is the generalisation of collaborative decision to N sub-tasks
Human Escalation Pattern	EAAPL-HIL003	Complementary — escalation handles cases where AI sub-task confidence is below threshold	Confidence-based sub-task escalation is compatible with hybrid architecture
AI Confidence Threshold Routing	EAAPL-HIL005	Dependency — sub-task allocation can be confidence-adaptive	Threshold routing applies at sub-task level in adaptive hybrid deployments
Annotation and Feedback Loop	EAAPL-HIL007	Complementary — human sub-task completions are annotation data for AI sub-task models	Human inputs on hybrid tasks feed annotation store for sub-task model improvement
Supervisor Agent	EAAPL-MAG002	Complementary — supervisor agent can orchestrate hybrid intelligence workflow	Agent supervisor can be the orchestration layer for AI and human sub-tasks
Human Override Pattern	EAAPL-HIL006	Dependency — human reviewers must be able to override any AI sub-task output	Override is embedded in the collaboration interface for every AI-completed sub-task field

17. Maturity Assessment

Overall Maturity Level: Proven

Dimension	Score (1–5)	Rationale
Technical Maturity	4	Task decomposition and handoff protocols are well-understood; trust calibration tooling is less mature
Operational Maturity	3	Managing human-AI task allocation dynamically requires significant operational discipline; most organisations have not formalised this
Governance Maturity	4	EU AI Act Article 14 and accountability requirements drive adoption; sub-task accountability mapping satisfies governance needs
Tooling Ecosystem	3	No purpose-built hybrid intelligence platforms; implemented from components (workflow engines, LLM APIs, collaboration tools)
Enterprise Adoption	3	Widely adopted in concept; formal implementation with performance measurement and trust calibration is less common
Risk Profile	Medium-High	Highest risk is automation bias within the hybrid design; mitigated by trust calibration and performance measurement

18. Revision History

Version	Date	Author	Changes
1.0	2026-06-12	EAAPL Working Group	Initial publication covering task decomposition framework, collaboration interface design, handoff protocol, cognitive load management, trust calibration, and performance measurement
