EAAPL-AGT006Proven

Reflexive Agent

🤖 Agentic AIEU AI ActISO/IEC 42001

[EAAPL-AGT006] Reflexive Agent

Category: Agentic AI Sub-category: Quality Assurance Architecture Version: 1.1 Maturity: Emerging Tags: self-critique, reflection, quality-gate, generate-critique-revise, anti-loop, cost-control, output-quality Regulatory Relevance: EU AI Act (Art. 9, 15), ISO 42001 §8.4, NIST AI RMF (MEASURE 2.5)

1. Executive Summary

The Reflexive Agent Pattern defines an architecture in which an AI agent evaluates the quality of its own outputs through a structured generate-critique-revise cycle before returning results to the calling system. By adding an explicit self-evaluation step to the standard agent loop, organisations achieve measurable improvements in output quality — particularly for high-stakes knowledge work tasks like contract drafting, regulatory analysis, and clinical documentation — without requiring manual human review of every output.

For CIO/CTO audiences: this pattern is the AI equivalent of a professional practice quality review. A lawyer reviews their own memo before sending it; a radiologist performs a double-read on ambiguous scans. The Reflexive Agent embeds that review step into the automated workflow, catching errors and quality gaps before they reach users or downstream systems. The trade-off is cost: reflection requires additional LLM inference calls. This pattern defines the governance around when reflection is worth the cost, how to prevent reflection cycles from running indefinitely, and how to integrate reflection with human oversight. For high-stakes, low-volume tasks, the quality improvement easily justifies the cost. For high-volume, low-stakes tasks, reflection should be applied selectively based on confidence scoring.

2. Problem Statement

Business Problem

AI agents deployed for high-stakes knowledge work (legal drafting, medical documentation, financial analysis) produce outputs that are factually incorrect, structurally incomplete, or inconsistent with organisational standards at rates that are unacceptable for direct use without review. Manual review by human experts is the only existing quality gate, but it is expensive and creates the bottleneck that undermines the productivity value of automation.

Technical Problem

A standard agent loop generates outputs without any internal mechanism to evaluate their quality relative to the task objective. The model produces the most probable next token; it has no objective function that penalises factual errors, logical inconsistencies, or failure to meet specified quality criteria. Adding an external evaluation step after the loop completes catches errors too late — the full generation cost has already been incurred for an output that may require significant revision.

Symptoms of Absence

Agent outputs for high-stakes tasks require expert human review of every output, negating the productivity benefit
Quality is inconsistent and unpredictable — excellent outputs and poor outputs arrive with no distinguishing signal
No feedback loop: the agent does not learn from its quality failures within or across tasks
High escalation rate to human review even when outputs are clearly adequate

Cost of Inaction

Quality Risk: Unreviewed poor-quality outputs from agents performing regulated tasks create compliance and liability exposure
Operational: Expert review bottleneck grows with agent usage volume, offsetting scale benefits
Competitive: Peers who implement reflection achieve demonstrably better output quality and can deploy agents in higher-stakes domains

3. Context

When to Apply

Output quality has material business or compliance consequences (legal, medical, financial, regulatory)
The task type has clear, articulable quality criteria that can be expressed in a critique prompt
Task volume is moderate (the additional LLM cost per task is justified by quality improvement)
The target quality improvement is measurable (a quality benchmark exists or can be created)
Tasks where partial output correction is faster than full regeneration

When NOT to Apply

High-volume, low-stakes tasks where reflection cost exceeds quality improvement value
Tasks with no articulable quality criteria (purely subjective outputs)
Real-time tasks with hard latency constraints incompatible with multi-pass generation
Tasks where the initial output quality is already above the acceptance threshold (waste of compute)

Prerequisites

EAAPL-AGT001 (Single Agent Pattern) baseline
Defined quality rubric for the task type (criteria for the critique prompt)
Quality threshold parameter (minimum acceptable quality score)
Anti-loop detection (max revision iteration limit)
Cost tracking per reflection cycle

Industry Applicability

Industry	Task Type	Quality Criteria	Reflection Value
Legal Services	Contract drafting, clause review	Accuracy, completeness, consistency with precedents	Very High
Healthcare	Clinical summary, discharge letter	Clinical accuracy, completeness, safety	Very High
Financial Services	Analyst reports, regulatory disclosures	Factual accuracy, regulatory compliance, clarity	High
Technology	Code generation, technical documentation	Correctness, security, completeness	High
Consulting	Executive reports, strategy documents	Logical consistency, evidence support, clarity	Medium

4. Architecture Overview

The Reflexive Agent Pattern extends the standard agent loop (EAAPL-AGT001) by inserting a critique-revise sub-loop between the initial output generation and the final result delivery. The sub-loop has its own termination conditions and cost controls independent of the outer loop.

Why separate the critic from the generator? The same model that generates an output has a well-documented tendency to fail to critique its own errors — it is drawn toward confirming its own output rather than challenging it. Two strategies address this. First, the critique is prompted with an explicitly adversarial persona ("You are a strict expert reviewer. Identify all factual errors, logical gaps, and failures to meet the stated criteria"). Second, in higher-investment implementations, a separate model instance (or a different model entirely) performs the critique, reducing the correlation between generator and critic errors.

Generate Phase The initial generation follows the standard agent loop. The generate phase produces a candidate output — a document, analysis, code, or other artifact — and a confidence score (either model-produced or estimated from the output structure and completeness).

Confidence Gating Before entering the reflection sub-loop, a confidence gate evaluates whether reflection is needed. If the initial output's confidence score exceeds the configured "auto-accept threshold," the output is returned without reflection. This is the primary cost optimisation: for the majority of tasks where the initial output is clearly adequate, no additional inference calls are made. The threshold is tuned per task type based on observed quality distributions.

Critique Phase The Critique Engine receives the candidate output and the task objective (original instruction + quality rubric). It executes an LLM inference call with an adversarial reviewer persona. The critique prompt is carefully designed to produce structured output: a list of specific issues (each with a category: factual error / logical gap / missing requirement / style violation / inconsistency) and an overall quality score (0–100). The critique prompt is the most important engineering artefact in this pattern — vague critique prompts produce vague, unhelpful critique that does not guide revision.

Quality Gate The Quality Gate evaluates the critique output. If the quality score meets or exceeds the acceptance threshold and no critical issues are flagged, the output is accepted and returned. If issues are present, the Revision Engine is invoked.

Revision Phase The Revision Engine receives the original output, the original task instruction, and the structured critique. It invokes an LLM to produce a revised output that addresses the specific issues identified in the critique. The revision prompt is targeted: "Revise the following draft to address these specific issues: [critique issues]. Do not change content that was not flagged as an issue." This targeted revision approach is more efficient than full regeneration and preserves the valid portions of the initial output.

Anti-Loop Detection and Cost Control The reflection sub-loop enforces a hard maximum of N critique-revise cycles (default: 3). If the output has not reached the acceptance threshold after N cycles, the best output produced so far (highest quality score across all iterations) is returned with a reflection metadata flag indicating that the quality threshold was not reached. This prevents infinite reflection loops from running up unbounded inference costs. The total cost of all reflection cycles is tracked and reported; a per-task reflection cost ceiling can trigger early termination.

Reflection Memory Critique outputs from completed tasks are written to the agent's episodic memory store (EAAPL-AGT002) with the task type, initial quality score, final quality score, and the specific issues identified. The Memory Consolidation Engine processes these records to update the semantic memory with task-type-specific quality learnings. Over time, the generator's prompting is improved based on accumulated knowledge of the most common quality failures for each task type — reducing the number of reflection cycles needed and improving first-pass quality.

5. Architecture Diagram

ARCHITECTURE DIAGRAM

flowchart TD subgraph Input["Task Input"] A[Task + Quality Rubric] end subgraph Core["Generate-Critique-Revise Loop"] B[Generate Phase] C{Confidence Gate} D[Critique Engine] E{Quality Gate} F[Revision Engine] end subgraph Output["Output Layer"] G[Accepted Output] H[Best Output Warning] I[(Reflection Memory)] end A --> B B --> C C -->|above threshold| G C -->|below threshold| D D --> E E -->|accepted| G E -->|max cycles hit| H E -->|revise| F F --> D G --> I H --> I style A fill:#dbeafe,stroke:#3b82f6 style B fill:#f0fdf4,stroke:#22c55e style C fill:#f3e8ff,stroke:#a855f7 style D fill:#f0fdf4,stroke:#22c55e style E fill:#f3e8ff,stroke:#a855f7 style F fill:#f0fdf4,stroke:#22c55e style G fill:#d1fae5,stroke:#10b981 style H fill:#fee2e2,stroke:#ef4444 style I fill:#fef9c3,stroke:#eab308

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Generate Phase	Agent Loop	Standard agent execution producing candidate output	EAAPL-AGT001 implementation	Critical
Confidence Gate	Quality Control	Evaluates initial output confidence; gates reflection entry	Model logprobs; heuristic scoring; LLM confidence prompt	High
Critique Engine	AI Component	Generates structured critique using adversarial reviewer prompt	Separate LLM instance (same or different model); critique-tuned prompt	Critical
Quality Gate	Logic Component	Evaluates critique quality score vs. acceptance threshold; decides accept/revise/escalate	Custom logic; configurable threshold per task type	Critical
Revision Engine	AI Component	Produces targeted revision addressing specific critique issues	LLM with revision-focused prompt	Critical
Best Output Tracker	State	Tracks the highest-quality output produced across reflection cycles	In-memory; part of loop state	High
Anti-Loop Controller	Safety	Enforces maximum cycle limit; triggers fallback to best output	Counter in loop state; configurable max N	Critical
Reflection Cost Monitor	Governance	Tracks cumulative token cost of critique + revision calls; enforces cost ceiling	Custom; EAAPL-AGT010 integration	High
Reflection Memory Writer	Learning	Writes critique outcomes to episodic memory for future learning	EAAPL-AGT002 memory write API	Medium
Quality Score Time Series	Observability	Tracks quality scores per task type over time; detects drift	Metrics platform; Grafana; custom analytics	Medium

7. Data Flow

Full Reflection Cycle

Step	Actor	Action	Output
1	Task System	Submits task with quality_rubric: list of acceptance criteria, quality_threshold (e.g., 85/100)	Task + quality config
2	Generate Phase	Executes standard agent loop; produces candidate output and confidence score	Candidate: `{output_text, confidence: 0.72}`
3	Confidence Gate	Compares confidence (0.72) to auto-accept threshold (e.g., 0.90): below threshold; enter reflection	Reflection triggered
4	Critique Engine	Sends critique prompt: `[adversarial_persona] Review this draft against [quality_rubric]. Output JSON: {issues: [{category, description, severity}], quality_score: int}`	Structured critique: `{issues: [{factual_error: ...}, {missing_req: ...}], quality_score: 71}`
5	Quality Gate	Quality score 71 < acceptance threshold 85; cycle count 1 < max 3; continue	Revise
6	Revision Engine	Sends revision prompt with original output + critique issues	Revised output
7	Best Output Tracker	Revised output quality estimated; compare to prior best	Updated best candidate
8	Critique Engine (cycle 2)	Critiques revised output	Critique: `{issues: [{minor_style: ...}], quality_score: 89}`
9	Quality Gate	Score 89 ≥ threshold 85; accept	Accept
10	Output	Returns accepted output with metadata: `{output, reflection_cycles: 2, final_quality_score: 89, issues_resolved: 2}`	Final output
11	Reflection Memory Writer	Writes: task_type, initial_score, final_score, issues_resolved, cycle_count	Memory record

Error Flow

Error	Detection	Recovery
Critique engine returns malformed JSON	JSON parse error	Retry critique call with explicit JSON schema instruction; max 2 retries
Revision does not improve quality score	Quality Gate detects same or lower score	Increment cycle counter; if max reached, return best output; log plateau
Reflection cost budget exceeded	Cost Monitor	Immediately return best output with `status: reflection_budget_exceeded`
LLM provider timeout during critique	Timeout exception	Return current best output with `status: critique_timeout`

8. Security Considerations

Prompt Injection in Critique

The critique prompt injects the candidate output as content — if the candidate output contains injected instructions, the critique LLM could be manipulated
Mitigation: the critique prompt wrapper clearly delineates the content being reviewed from the critic's instructions; content is wrapped in explicit delimiters (XML tags or similar); output validation on critique output before Quality Gate evaluation

OWASP LLM Top 10

OWASP LLM Risk	Reflection Applicability	Mitigation
LLM01 Prompt Injection	Candidate output injected into critique context	Content delimiters; output validation on critique JSON
LLM09 Overreliance	Quality score could create false confidence in flawed output	Quality score is advisory metadata; high-stakes outputs always include reflection metadata for human reference; quality score ≠ accuracy guarantee
LLM08 Excessive Agency	Reflection cycles could be exploited to iteratively refine harmful outputs	Quality rubric includes safety criteria; critique is instructed to flag safety violations as terminal issues; safety-flagged outputs are rejected regardless of quality score
LLM04 DoS	Infinite reflection loops exhaust inference budget	Hard cycle limit; cost ceiling enforcement; anti-loop controller

9. Governance Considerations

Quality Rubric Governance

Quality rubrics are owned by domain subject matter experts (legal team owns legal rubrics, clinical leads own clinical rubrics)
Rubrics are versioned and change-managed; changes require impact assessment on existing task benchmarks
Acceptance thresholds are set and reviewed by the domain owner, not by engineering

Model Risk Management

Reflection quality scores are not objective ground truth; they are model judgments subject to model limitations
Quality scores must be validated against human expert assessments on a held-out benchmark before being used as primary quality gatekeepers
For highest-stakes tasks, model reflection quality scores are advisory only; human review remains the final gate

Governance Artefacts

Artefact	Owner	Frequency	Purpose
Quality Rubric Register	Domain SME + AI Platform	Per task type; on change	Documents acceptance criteria per task type and threshold justification
Reflection Quality Benchmark	ML Engineering	Monthly	Compares model quality scores to human assessments; validates rubric effectiveness
Quality Score Distribution Report	Operations	Monthly	Per-task-type quality score distributions; identifies degradation
Reflection Cost Report	FinOps	Monthly	Average reflection cost per task type; ROI analysis vs. quality improvement

10. Operational Considerations

SLOs

SLO	Target	Window	Alert
Reflection cycle p95 latency	≤ 30s per cycle	1-hour rolling	> 60s triggers P2
Auto-accept rate (no reflection needed)	≥ 60% of tasks	24-hour rolling	< 40% indicates prompt quality issue; P3
Quality acceptance rate (within max cycles)	≥ 90%	24-hour rolling	< 80% triggers P2; quality rubric review
Average reflection cycles per accepted output	≤ 1.5	24-hour rolling	> 2.5 indicates poor initial generation

Monitoring

Quality score distribution per task type: trending toward lower initial scores indicates prompt degradation
Reflection cycle count distribution: bimodal (0 cycles or ≥2 cycles) may indicate confidence gate miscalibration
Cost per reflection cycle per task type: anomaly detection for cost spikes

11. Cost Considerations

Cost Drivers

Scenario	Additional Token Cost vs. No Reflection	Quality Benefit
60% auto-accept, 40% need 1 reflection cycle	+40% (approx)	High — issues caught in 40% of cases
60% auto-accept, 30% need 1 cycle, 10% need 2 cycles	+60% (approx)	Very High
20% auto-accept, 80% need 2 cycles	+200% (approx)	Very High but expensive — optimise generation

Optimisations

Use a smaller, faster model for the critique step and the full model only for revision (model routing)
Cache common critique patterns and their resolutions as procedural memories to reduce iteration count
Tune confidence gate threshold upward (be more selective about what triggers reflection) if auto-accept rate is too low

Indicative Cost Range (per 1,000 tasks)

Task Type	Without Reflection	With Reflection (1.5 avg cycles)	Quality Improvement
Contract clause review	$20–50	$35–85	+20–35% quality score
Clinical documentation	$15–40	$28–72	+25–40% quality score
Technical documentation	$10–30	$16–48	+15–25% quality score

12. Trade-Off Analysis

Reflection Implementation Options

Option	Quality Improvement	Cost	Complexity	Best For
A: Same-model adversarial critique (Recommended)	High	Medium	Low	Most production deployments
B: Separate critic model	Very High	High	Medium	Highest-stakes domains (legal, clinical)
C: Constitutional AI-style (self-correction via principles)	Medium–High	Medium	Low	When critique rubric is stable and articulable as principles
D: Multi-agent debate (see EAAPL-MAG005)	Very High	Very High	High	High-stakes decisions where structured debate adds unique value

Architectural Tensions

Tension	Left Pole	Right Pole	Balance
Quality vs. Latency	Maximum reflection cycles for best quality	Single pass for lowest latency	Risk-tiered: async reflection for background tasks; 1-cycle max for interactive
Critique specificity vs. Prompt complexity	Highly detailed rubric; specific critique	Simple rubric; general critique	Start with 5–10 specific criteria; iterate based on quality benchmark results
Auto-accept rate vs. Quality coverage	High threshold: most outputs go through reflection	Low threshold: rarely reflects; risk of poor quality	Tune threshold per task type to balance cost and quality

13. Failure Modes

Failure Mode	Likelihood	Impact	Detection	Recovery
Critique affirms rather than challenges (sycophancy)	High (model tendency)	High — reflection adds cost but not quality	Quality score does not improve across cycles	Strengthen adversarial persona in critique prompt; validate critique calibration on benchmark
Revision makes output worse (regression)	Medium	High — quality degrades	Best output tracker catches if revision score < prior best	Return prior best output; log regression; review revision prompt
Max cycles reached with sub-threshold quality	Medium	Medium — partial quality improvement	Quality warning flag in output metadata	Route to human review queue; log for rubric improvement
Reflection cost exceeds budget on complex task	Low–Medium	Medium — unexpected cost	Cost monitor	Truncate reflection; return best output; alert
Critique hallucinates non-existent issues	Medium	Medium — unnecessary revision	Validate critique against original for false positives	Human audit of critique quality on sample; rubric refinement

14. Regulatory Considerations

EU AI Act

Art. 9 (Risk Management): reflection quality scores provide evidence of quality management for high-risk AI systems; must be preserved in the task audit log
Art. 15 (Accuracy and Robustness): the reflection cycle directly implements the requirement for AI systems to remain accurate and robust; quality benchmark validation satisfies the measurement requirement

ISO 42001

§8.4: The reflection mechanism and quality monitoring are part of the AI system's operational quality management lifecycle

NIST AI RMF

MEASURE 2.5: The quality score time series and benchmark validation implement the AI performance measurement requirement

15. Reference Implementations

AWS

Component	Service
Generate + Critique + Revise	Amazon Bedrock (Claude 3 Sonnet for generation; Claude 3 Haiku for critique)
Confidence Gate	Custom Lambda function evaluating model response metadata
Quality Score Tracking	Amazon CloudWatch custom metrics

Azure

Component	Service
Generate + Critique + Revise	Azure OpenAI Service (GPT-4o for generation; GPT-4o-mini for critique)
Reflection Orchestration	Azure Durable Functions (sub-orchestration for reflection sub-loop)

On-Premises

Component	Technology
Generate + Critique + Revise	vLLM serving Llama 3.1 70B (generation); Llama 3.1 8B (critique)
Reflection Orchestration	LangGraph with custom reflection node

Pattern	ID	Relationship Type	Notes
Single Agent Pattern	EAAPL-AGT001	Extends	Reflection sub-loop extends the Reflect phase of the base agent loop
Stateful Agent Memory	EAAPL-AGT002	Integrates With	Critique outcomes are written to episodic memory for learning
Agent Cost Governance	EAAPL-AGT010	Integrates With	Reflection cost is tracked and controlled under the cost governance pattern
Debate Agent	EAAPL-MAG005	Related	Debate is an alternative quality mechanism using multiple agents; this pattern uses self-critique
Human-in-the-Loop Agent	EAAPL-MAG003	Peer	Outputs that fail reflection after max cycles are escalated to human review

17. Maturity Assessment

Overall Maturity: Emerging

Dimension	Score (1–5)	Evidence
Research Foundation	5	Constitutional AI, Self-Refine, Reflexion papers provide strong academic foundation
Production Deployment	3	Deployed in specialised high-stakes applications; general production tooling still maturing
Quality Measurement	3	Quality benchmark methodology developing; no standard evaluation framework yet
Cost Optimisation	3	Model routing for critique maturing; confidence gate calibration still domain-specific
Framework Support	3	LangGraph supports reflection nodes; general framework support growing

18. Revision History

Version	Date	Author	Changes
1.0	2024-07-01	Architecture Board	Initial publication
1.1	2025-02-15	ML Engineering	Added model routing for critique; anti-loop cost ceiling; quality benchmark methodology

← Back to Library More Agentic AI →