EAAPL-OBS007 · Distributed AI Tracing

Pattern ID: EAAPL-OBS007
Status: Proven
Complexity: High
Tags: observability, traceability, agent, llm, high-complexity
Version: 1.0.0
Last Reviewed: 2026-06-12

1. Executive Summary

AI pipelines are not single API calls. They are chains of components — authentication, rate limiting, prompt assembly, vector retrieval, LLM invocation, output filtering, downstream tool calls, and often recursive agent sub-invocations — that collectively produce a response. When something goes wrong (latency spike, unexpected output, security event), engineers need to answer: which component in the chain caused the problem, how much did each step contribute to total latency, and what was the exact execution path for this specific request?

This pattern defines end-to-end distributed tracing through multi-component AI pipelines, extended to cover agentic architectures where agent-to-agent handoffs and recursive tool invocations create deep, branching execution trees. It covers W3C Trace Context propagation through every pipeline component; span enrichment with AI-specific metadata (model name, version, token counts, cache hit, tool name, cost contribution); agent chain tracing across handoffs and sub-agent invocations; OpenTelemetry GenAI semantic conventions; latency waterfall visualisation showing per-component contribution; and sampling strategies that balance cost with debugging fidelity. The outcome is the ability to open any AI request from the last 30 days and see exactly what happened, when, in what order, at what cost, and why.

Target Audience: CTO, Head of Platform Engineering, AI Engineering Lead, Security Engineering Lead
Time to Implement: 6–10 weeks

2. Problem Statement

Business Problem

Without distributed tracing, AI pipeline debugging is a process of log archaeology. Engineers correlate log timestamps across multiple services, guess at the execution path, and often cannot reconstruct what happened for a specific user request. This extends incident resolution from minutes to hours. For agentic AI systems — where an AI may invoke tools, call sub-agents, and execute multi-step workflows — the complexity of manual reconstruction is prohibitive.

Technical Problem

Standard HTTP tracing tools trace network hops but do not capture the semantic context of AI operations: which model was called, with how many tokens, whether the cache was hit, what tools were invoked, and how token costs accumulated across the pipeline. OpenTelemetry's GenAI semantic conventions (introduced in 2024) provide a standard, but they are not yet natively supported by all AI SDKs and require instrumentation investment.

Symptoms

P1 AI latency incident investigation takes 4+ hours because engineers are correlating logs across 6 services
Agent recursive invocations are invisible in traces — the trace shows one LLM call when there were actually 12
Token cost cannot be attributed to individual pipeline steps; the total cost is known but not its distribution
A security incident (prompt injection) cannot be reconstructed because the trace did not capture the prompt assembly step
Cache effectiveness is unknown; traces do not indicate whether a response was served from cache or generated fresh

Cost of Inaction

Extended MTTR on AI latency and quality incidents due to insufficient trace fidelity
Security incidents without complete execution traces are unresolvable and may generate regulatory findings
Agentic AI systems deployed in production without visibility are a governance liability
Engineers avoid making performance optimisations because they cannot identify the bottleneck with confidence

3. Context

When to Apply

Any AI pipeline with more than two components in the critical path
Agentic AI systems with tool invocations, sub-agent calls, or multi-step reasoning
Systems where latency SLOs are defined and must be debuggable
Systems subject to security incident response obligations requiring complete execution reconstruction
Prerequisite: EAAPL-OBS001 provides the collector and backend infrastructure

When NOT to Apply

Simple single-call LLM integrations with no downstream components (basic telemetry from OBS001 is sufficient)
Proof-of-concept systems where trace infrastructure overhead exceeds debugging value

Prerequisites

Prerequisite	Required	Notes
EAAPL-OBS001 Trace Backend	Required	Jaeger/Tempo/X-Ray backend required
OpenTelemetry SDK in all pipeline components	Required	Trace context propagation requires SDK in each service
W3C Trace Context header support in API gateway	Required	Root span must be created at entry point
Secrets management	Required	API keys and tokens must not appear in span attributes

Industry Applicability

Industry	Applicability	Primary Driver
Financial Services	Critical	Audit reconstruction, security incident response, latency SLOs
Healthcare	Critical	Clinical AI decision reconstruction, safety incident investigation
Technology / SaaS	Critical	Agent pipeline debugging, multi-tenant trace isolation
Government	High	Audit obligations, AI decision reconstruction
Legal Services	High	Professional liability incident reconstruction
Retail / E-Commerce	Medium	Latency optimisation for recommendation pipelines

4. Architecture Overview

The Distributed AI Tracing Architecture implements complete, semantically rich trace context propagation through every component in an AI pipeline, with special extensions for agentic architectures.

Trace Context Propagation

Every request entering the AI system creates a root span at the API gateway. The gateway serialises the root span's context into the W3C Trace Context headers (traceparent and tracestate), which propagate downstream via HTTP headers, gRPC metadata, or message queue message attributes. Every component that receives the request reads the incoming trace context, creates a child span, performs its work, and closes the child span when that work completes, forwarding the context on any downstream call it makes. The result is a complete trace tree with parent-child relationships showing the exact execution path.
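
The read, create-child, forward cycle described above can be sketched with the standard library alone. This is a minimal illustration that assumes version 00 of the traceparent format; the names SpanContext, parse_traceparent, start_child, and to_traceparent are hypothetical, and a production service would normally rely on its OpenTelemetry SDK propagator instead.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# traceparent, version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str   # 32 hex chars, shared by every span in the trace
    span_id: str    # 16 hex chars, unique per span
    sampled: bool   # bit 0 of the trace-flags field


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the caller's span context, or None if the header is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero identifiers are invalid.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def start_child(parent: Optional[SpanContext]) -> SpanContext:
    """Create a child context, or a new root when no valid parent arrived."""
    span_id = secrets.token_hex(8)
    if parent is None:
        return SpanContext(secrets.token_hex(16), span_id, sampled=True)
    return SpanContext(parent.trace_id, span_id, parent.sampled)


def to_traceparent(ctx: SpanContext) -> str:
    """Serialise a context for the next downstream call."""
    return f"00-{ctx.trace_id}-{ctx.span_id}-{'01' if ctx.sampled else '00'}"
```

A component would call parse_traceparent on the incoming header, start_child to obtain its own span identity, and to_traceparent when calling the next service. A service that skips the last step is what produces the orphaned roots listed in the error flow.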

Span Inventory for Standard AI Pipelines

Eight standard spans are defined for a typical RAG AI pipeline:

api_gateway: records method, path, status code, client IP (hashed), user agent.
authentication: records auth method, identity type (user/service), auth result.
rate_limit: records rate limit decision, remaining quota, policy applied.
prompt_assembly: records template ID, template version, assembly time, total assembled token count estimate.
vector_retrieval: one span per retrieval call, recording collection name, query vector dimensions, top-k value, retrieved count, retrieval latency, similarity score range.
llm_api: the most important span, recording gen_ai.system (e.g., openai, anthropic), gen_ai.request.model, gen_ai.request.max_tokens, gen_ai.usage.prompt_tokens, gen_ai.usage.completion_tokens, gen_ai.response.finish_reason, cache hit (gen_ai.prompt.cache_hit, boolean), latency, and cost (gen_ai.cost.usd).
output_filter: records filter policy applied, filter result, filter reason if triggered.
response_serialisation: records output format, response size in bytes.

AI Semantic Conventions

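OpenTelemetry GenAI semantic conventions are applied to all AI spans. Key attributes: gen_ai.system (LLM provider), gen_ai.request.model (model identifier), gen_ai.request.max_tokens, gen_ai.usage.prompt_tokens, gen_ai.usage.completion_tokens, gen_ai.response.finish_reason, gen_ai.operation.name (chat, embeddings, fine-tuning). The conventions are still evolving: later revisions rename the token-usage attributes to gen_ai.usage.input_tokens and gen_ai.usage.output_tokens and report finish reasons as an array (gen_ai.response.finish_reasons), so pin the convention version that the instrumentation targets.

Custom extensions defined for enterprise use: gen_ai.prompt.template_id, gen_ai.prompt.template_version, gen_ai.prompt.cache_hit, gen_ai.cost.usd, gen_ai.retrieval.collection, gen_ai.retrieval.top_k.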
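
As an illustration of how an AI Client Tracer might assemble these attributes for one llm_api span, the sketch below builds a plain attribute map. The function name, its parameters, and the per-1,000-token price inputs are assumptions made for the example; the attribute keys are the ones listed above.

```python
from typing import Any, Dict


def llm_span_attributes(
    *,
    system: str,
    model: str,
    max_tokens: int,
    prompt_tokens: int,
    completion_tokens: int,
    finish_reason: str,
    template_id: str,
    template_version: str,
    cache_hit: bool,
    usd_per_1k_prompt: float,      # hypothetical pricing input
    usd_per_1k_completion: float,  # hypothetical pricing input
) -> Dict[str, Any]:
    """Build the attribute map for one llm_api span."""
    cost = (
        prompt_tokens * usd_per_1k_prompt
        + completion_tokens * usd_per_1k_completion
    ) / 1000
    return {
        # Standard GenAI convention attributes named in this pattern.
        "gen_ai.system": system,
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": model,
        "gen_ai.request.max_tokens": max_tokens,
        "gen_ai.usage.prompt_tokens": prompt_tokens,
        "gen_ai.usage.completion_tokens": completion_tokens,
        "gen_ai.response.finish_reason": finish_reason,
        # Custom enterprise extensions named in this pattern.
        "gen_ai.prompt.template_id": template_id,
        "gen_ai.prompt.template_version": template_version,
        "gen_ai.prompt.cache_hit": cache_hit,
        "gen_ai.cost.usd": round(cost, 6),
    }
```

Only metadata is recorded here; prompt and completion text are deliberately absent, in line with the default of keeping content out of spans.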

Agent Chain Tracing

Agentic architectures present the greatest tracing challenge. An agent receives a task, reasons about it, decides to call a tool, receives the tool result, reasons again, and may call another agent or tool recursively. The trace must capture the full execution tree.

The agent tracing model uses a parent span for each agent invocation. When the agent decides to call a tool, a child span is created under the agent span for the tool call. When the agent calls a sub-agent (multi-agent architectures), the trace context is propagated to the sub-agent in the traceparent header, with any extra cross-agent metadata carried as W3C Baggage; the sub-agent creates its own span tree as a child of the calling agent's span. The result is a complete execution tree: root agent → tool calls → sub-agent invocations → each sub-agent's spans → their tool calls. The depth of this tree is bounded by a configurable maximum agent nesting depth (default: 10) to prevent runaway agents from creating unbounded trace trees.
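
The span tree and its depth bound can be modelled in a few lines. The sketch below is an in-memory illustration only: Span, start_root_agent, start_tool_call, start_sub_agent, and render are hypothetical names, and the root agent is assumed to sit at depth 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_AGENT_DEPTH = 10  # default nesting limit stated in this pattern


@dataclass
class Span:
    name: str
    agent_depth: int                  # nesting level of the owning agent
    children: List["Span"] = field(default_factory=list)
    depth_exceeded: bool = False      # a sub-agent span was refused here


def start_root_agent(name: str) -> Span:
    return Span(f"agent:{name}", agent_depth=1)


def start_tool_call(agent: Span, tool: str) -> Span:
    """Tool calls are children of the agent span and share its depth."""
    span = Span(f"tool_call:{tool}", agent.agent_depth)
    agent.children.append(span)
    return span


def start_sub_agent(agent: Span, name: str) -> Optional[Span]:
    """Return a child agent span, or None once the depth limit is reached."""
    if agent.agent_depth >= MAX_AGENT_DEPTH:
        agent.depth_exceeded = True   # caller logs a warning and carries on
        return None
    span = Span(f"agent:{name}", agent.agent_depth + 1)
    agent.children.append(span)
    return span


def render(span: Span, indent: int = 0) -> List[str]:
    """Flatten the tree into indented lines, one per span."""
    lines = ["  " * indent + span.name]
    for child in span.children:
        lines.extend(render(child, indent + 1))
    return lines
```

Refusing the span rather than raising keeps tracing from interfering with agent execution, which matches the error-flow behaviour of closing the trace tree at maximum depth while the agent continues.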

Latency Waterfall and Cost Attribution

The trace backend renders traces as a latency waterfall showing each component's contribution as a percentage of total latency. This visualisation immediately answers: "which component is responsible for 80% of the latency?" Cost attribution is added as a span annotation: each span records its cost contribution (the LLM API span records the token cost; tool call spans record any API costs). The total request cost is the sum of all span cost contributions, and the waterfall view shows cost distribution alongside latency distribution.
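
The waterfall percentages and the summed request cost amount to simple arithmetic over span records. The sketch below assumes each record carries the span's own (non-overlapping) duration and its cost contribution; SpanRecord, waterfall, and request_cost_usd are illustrative names, not part of any trace backend.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpanRecord:
    name: str
    duration_ms: float      # time spent in this span only (self time)
    cost_usd: float = 0.0   # cost contribution recorded on the span


def waterfall(spans: List[SpanRecord]) -> List[Tuple[str, float, float]]:
    """Return (name, % of total latency, % of total cost), slowest first."""
    total_ms = sum(s.duration_ms for s in spans)
    total_usd = sum(s.cost_usd for s in spans)
    rows = []
    for s in spans:
        latency_pct = round(100 * s.duration_ms / total_ms, 1) if total_ms else 0.0
        cost_pct = round(100 * s.cost_usd / total_usd, 1) if total_usd else 0.0
        rows.append((s.name, latency_pct, cost_pct))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def request_cost_usd(spans: List[SpanRecord]) -> float:
    """Total request cost is the sum of all span cost contributions."""
    return round(sum(s.cost_usd for s in spans), 6)
```

If parent spans were included with their full wall-clock duration, child time would be counted twice, which is why the sketch assumes self time.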

Sampling Strategy

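A three-tier sampling strategy balances cost with debugging fidelity:

100% of error traces: any trace containing a span with an error status is always captured at full fidelity.
10% of normal successful traces on standard-volume paths.
1% of normal successful traces on very high-volume paths (> 100 requests/second on a single endpoint).

Tail-based sampling adds a latency rule: any trace whose total latency exceeds the p95 threshold is kept at 100%, so slow outliers are always available for performance optimisation. Error status and total latency are known only once a trace has completed, so these two rules must be evaluated by the tail-based sampling processor in the collector, which needs every service to export its spans; a trace already dropped by a head-based sampler in the SDK cannot be recovered afterwards. The 10% and 1% rates therefore apply to the traces that match neither rule.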
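
The keep/drop decision for a completed trace can be expressed as one function. This is a sketch of the decision logic only, not collector configuration; keep_trace and its parameters are hypothetical, and hashing the trace ID is one assumed way to make the probabilistic tiers deterministic per trace.

```python
import hashlib

STANDARD_RATE = 0.10        # normal traces on standard-volume paths
HIGH_VOLUME_RATE = 0.01     # normal traces on very high-volume paths
HIGH_VOLUME_RPS = 100       # requests/second threshold for a single endpoint


def keep_trace(
    trace_id: str,
    has_error: bool,
    duration_ms: float,
    p95_ms: float,
    endpoint_rps: float,
) -> bool:
    """Decide whether a completed trace is retained."""
    # Tier 1: errors and slow outliers are always kept.
    if has_error or duration_ms > p95_ms:
        return True
    # Tiers 2 and 3: probabilistic, chosen by endpoint volume.
    rate = HIGH_VOLUME_RATE if endpoint_rps > HIGH_VOLUME_RPS else STANDARD_RATE
    # Map the trace ID to a stable bucket in [0, 1) so that every
    # collector instance reaches the same decision for the same trace.
    digest = hashlib.sha256(trace_id.encode("ascii")).hexdigest()
    bucket = int(digest[:8], 16) / 2**32
    return bucket < rate
```

Because the bucket depends only on the trace ID, any trace kept at the 1% rate would also be kept at the 10% rate, which keeps the tiers consistent if an endpoint moves between volume classes.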

5. Architecture Diagram

flowchart TD
    subgraph Pipeline["AI Request Pipeline"]
        A[API Gateway Root Span]
        B[Prompt Assembly]
        C[Vector Retrieval]
        D[LLM API Client]
    end
    subgraph Agent["Agent Extensions"]
        E[Tool Call Span]
        F[Sub-Agent Span]
    end
    subgraph Collection["Trace Collection"]
        G[OTel Collector]
        H[(Trace Backend)]
    end
    A -->|child span| B
    B -->|child span| C
    B -->|child span| D
    D -->|tool call| E
    D -->|sub-agent| F
    A --> G
    B --> G
    C --> G
    D --> G
    E --> G
    F --> G
    G --> H
    H --> I[Latency Waterfall UI]
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#f0fdf4,stroke:#22c55e
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#dbeafe,stroke:#3b82f6
    style F fill:#dbeafe,stroke:#3b82f6
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#fef9c3,stroke:#eab308
    style I fill:#d1fae5,stroke:#10b981

6. Components

Component	Type	Responsibility	Technology Options	Criticality
API Gateway Tracer	Middleware	Create root span; inject traceparent; record gateway-level attributes	Kong OpenTelemetry plugin; custom Nginx module; AWS API Gateway OpenTelemetry	Critical
AI Client Tracer	SDK Library	Create llm_api child span; record all GenAI semantic convention attributes	Custom wrapper on OpenAI/Anthropic/Bedrock SDK; LangChain OpenTelemetry integration	Critical
Vector Store Tracer	SDK Library	Create vector_retrieval child span; record collection, query, result count	Custom wrapper on pgvector/Weaviate/Pinecone/Qdrant client	High
Agent Tracer	SDK Library	Create agent invocation spans; propagate trace context to sub-agents; record agent type, tool calls	LangChain/LlamaIndex OpenTelemetry callbacks; custom agent framework instrumentation	Critical for agent systems
Tool Call Tracer	SDK Library	Create tool_call child span; record tool name, input schema, latency, output summary	Instrumented tool wrapper	High for agent systems
OTel Collector	Infrastructure	Receive OTLP; apply AI-specific processors; fan out to trace backend	OTel Collector Contrib; AWS ADOT; Grafana Alloy	Critical
Trace Backend	Storage	Store, index, and serve distributed traces; support waterfall UI	Jaeger, Grafana Tempo, AWS X-Ray, Google Cloud Trace	Critical
Trace UI	Consumption	Visualise trace waterfall; search by trace ID, model, error status	Jaeger UI, Grafana Tempo UI, AWS X-Ray Console	High
Sampling Controller	Collector Processor	Apply head-based + tail-based sampling strategy	OTel tail-based sampling processor; Jaeger agent sampler	Medium
Span Enrichment Processor	Collector Processor	Add cost contribution, environment tags, service version to all spans	OTel transform processor; custom processor	Medium

7. Data Flow

Primary Flow

Step	Actor	Action	Output
1	API Gateway	Receives client request; creates root span with trace ID; injects W3C traceparent header	Root span started; traceparent header on all downstream calls
2	Authentication Service	Reads traceparent; creates child span; records auth result; forwards traceparent	auth span completed
3	Rate Limiter	Reads traceparent; creates child span; records limit decision; forwards	rate_limit span completed
4	Prompt Assembly	Reads traceparent; creates child span; records template ID/version, token count estimate	prompt_assembly span completed
5	Vector Store Client	Creates child span for retrieval call; records collection, top-k, latency, score range	vector_retrieval span completed
6	LLM API Client	Creates child span; sends request to model; records model, tokens, cache_hit, cost, finish_reason	llm_api span completed with full GenAI attributes
7	Tool (if agent pattern)	Receives trace context via the propagated traceparent; creates child span under llm_api span; executes tool; records result	tool_call span completed
8	Output Filter	Creates child span; records filter decision; closes span	output_filter span completed
9	API Gateway	Closes root span; all child spans already closed; full trace tree assembled	Complete trace exported to OTel Collector
10	OTel Collector	Receives trace; applies enrichment and sampling; exports to trace backend	Trace stored in backend; queryable by trace ID

Error Flow

Error Scenario	Detection	Action	Recovery
traceparent header not propagated by a service	Child spans appear as orphaned roots in trace UI	Alert to engineering; service requires instrumentation fix	Fix instrumentation in non-propagating service; verify in staging
Agent nesting depth exceeded	Max depth counter in agent tracer	Close trace tree at max depth; log warning; continue execution	Investigate agent runaway loop; fix agent logic
Trace backend unavailable	OTel Collector export failure; error metric	Buffer in memory (limited); drop oldest; alert	Restore trace backend; accept trace loss for outage duration
Sensitive data in span attribute	PII scrubber processor detects PII in span attribute	Redact attribute value; log PII detection event	Fix instrumentation to not include PII in span attributes
Trace size too large (deep agent chain)	Trace size metric exceeds limit	Truncate oldest spans; log truncation event	Increase trace size limit or reduce agent recursion depth

8. Security Considerations

Authentication: Trace backend access restricted via SSO with RBAC. OTel Collector → trace backend connection uses mTLS or service account. Trace search API requires authentication.

Authorisation: Traces containing sensitive AI outputs (regulated domains) have restricted access. Standard traces accessible to engineers. Security incident traces restricted to security team and incident responders.

Secrets Management: Model API keys must never appear in span attributes. The AI Client Tracer is responsible for ensuring keys are stripped from all recorded request metadata. Regular span audit for credential leakage.
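
A last-line-of-defence scrubber in the AI Client Tracer could look like the sketch below. The key hints and value patterns are illustrative assumptions (for instance, the sk- prefix is only one credential format) and would need extending for the providers actually in use; scrub_attributes is a hypothetical name.

```python
import re
from typing import Any, Dict

# Illustrative patterns only; extend them for the credential formats in use.
SECRET_KEY_HINTS = ("api_key", "apikey", "authorization", "secret", "password")
SECRET_VALUE = re.compile(r"(Bearer\s+\S+|sk-[A-Za-z0-9_\-]{16,})")
REDACTED = "[REDACTED]"


def scrub_attributes(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the span attributes with credentials redacted."""
    clean: Dict[str, Any] = {}
    for key, value in attributes.items():
        if any(hint in key.lower() for hint in SECRET_KEY_HINTS):
            # The attribute name itself signals a credential: drop the value.
            clean[key] = REDACTED
        elif isinstance(value, str):
            # Otherwise redact credential-shaped substrings inside the value.
            clean[key] = SECRET_VALUE.sub(REDACTED, value)
        else:
            clean[key] = value
    return clean
```

The hint list deliberately omits the bare word "token", since it would also match legitimate attributes such as gen_ai.usage.prompt_tokens.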

Data Classification: Trace data is classified at least as Internal. Traces containing AI outputs from regulated domains (financial advice, clinical) classified as Confidential. Security incident traces classified as Restricted.

Encryption: Trace data encrypted in transit (TLS 1.3) and at rest (AES-256). Long-term trace archives use envelope encryption.

Auditability: Every access to trace data is logged. Security incident traces have additional access audit controls. Trace data access may be required for legal discovery.

OWASP LLM Top 10 Coverage

OWASP LLM Risk	Distributed Tracing Control	Implementation
LLM01 Prompt Injection	Complete trace enables reconstruction of injection attack execution path	prompt_assembly span + llm_api span together show exact injection execution
LLM02 Insecure Output Handling	output_filter span records filter decisions; gaps in filtering visible	Missing filter span on an output is an architectural gap alert
LLM03 Training Data Poisoning	Training data pipeline traces detect unexpected data sources	Pipeline tracing extends to training data if applicable
LLM04 Model Denial of Service	Deep agent traces reveal recursive invocation patterns driving DoS	Agent nesting depth metric; max depth enforcement
LLM05 Supply Chain Vulnerabilities	gen_ai.request.model in every trace; unexpected model visible immediately	Model audit from trace data detects supply chain substitution
LLM06 Sensitive Information Disclosure	PII scrubber on span attributes; prompt content not in traces by default	Span attribute PII scanning parallel to log PII scanning
LLM07 Insecure Plugin Design	tool_call spans record every tool invoked, its inputs, and outputs	Complete tool invocation audit trail in traces
LLM08 Excessive Agency	Agent nesting depth and breadth visible in trace tree	Alert on traces with agent_depth > 5
LLM09 Overreliance	Trace + outcome correlation enables overreliance detection	Traces linked to hallucination events (OBS003) surface overreliance patterns
LLM10 Model Theft	Bulk extraction attempts produce distinctive trace patterns	Anomalous trace patterns (many short calls with full context) = theft signal

9. Governance Considerations

Responsible AI: Distributed traces are the audit artefact for AI systems. They enable: reconstruction of any specific AI decision for compliance review; identification of which version of which model produced a specific output; evidence that output filtering was applied. For high-risk AI systems under EU AI Act Article 14, traces demonstrate that human oversight is technically possible.

Model Risk Management: Traces provide the performance data required for model risk management review. Latency waterfall analysis supports capacity management for material models. Unexpected model version changes are detectable from trace data.

Human Approval: Trace data access for security investigations requires CISO or incident commander authorisation for restricted-classification traces.

Policy: Trace retention policy must balance debugging and audit needs against storage cost. Minimum: 30 days hot (full fidelity); 90 days warm (sampled); 7 years cold (compliance-required traces for high-risk AI decisions).
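
The retention tiers reduce to an age check. The sketch below is one possible reading of the policy, assuming that only compliance-required traces move to cold storage and that seven years can be approximated as 7 × 365 days; retention_tier is a hypothetical helper.

```python
from datetime import date, timedelta

HOT = timedelta(days=30)        # full fidelity
WARM = timedelta(days=90)       # sampled
COLD = timedelta(days=7 * 365)  # seven years, ignoring leap days


def retention_tier(trace_date: date, today: date, compliance_required: bool) -> str:
    """Return the storage tier a trace belongs in on a given day."""
    age = today - trace_date
    if age <= HOT:
        return "hot"
    if age <= WARM:
        return "warm"
    if compliance_required and age <= COLD:
        return "cold"
    return "expired"
```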

Traceability: The trace ID is the primary key for AI decision reconstruction. Every AI-influenced decision should have a trace ID logged in the application database so that the decision can be reconstructed from the trace. This is the technical foundation for the GDPR/Privacy Act right to explanation.
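
Recording the trace ID next to each decision can be as small as one extra column. The sketch below uses an in-memory SQLite database for illustration; the table name, column names, and helper functions are assumptions, not a prescribed schema.

```python
import sqlite3
from typing import Optional


def open_decision_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) decision table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ai_decisions (
               decision_id TEXT PRIMARY KEY,
               trace_id    TEXT NOT NULL,
               model       TEXT NOT NULL,
               decided_at  TEXT NOT NULL
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_decisions_trace ON ai_decisions (trace_id)"
    )
    return conn


def record_decision(
    conn: sqlite3.Connection,
    decision_id: str,
    trace_id: str,
    model: str,
    decided_at: str,
) -> None:
    """Persist the trace ID alongside the business decision it produced."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO ai_decisions VALUES (?, ?, ?, ?)",
            (decision_id, trace_id, model, decided_at),
        )


def trace_for_decision(conn: sqlite3.Connection, decision_id: str) -> Optional[str]:
    """Look up the trace ID needed to reconstruct a decision."""
    row = conn.execute(
        "SELECT trace_id FROM ai_decisions WHERE decision_id = ?", (decision_id,)
    ).fetchone()
    return row[0] if row else None
```

A reviewer would start from the decision record, fetch the trace ID, and open that trace in the trace backend.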

Governance Artefacts

Artefact	Owner	Frequency	Format
Trace Sampling Configuration	Platform Engineering	Per change + quarterly review	Version-controlled config
Span Attribute Security Audit	Security	Quarterly	Automated scan for credentials + PII in span attributes
Trace Retention Policy	Legal / Compliance	Annual	Policy document with retention tier definitions
Agent Depth SLO Review	AI Engineering	Monthly	Dashboard review; adjust max depth if needed
Trace Search Audit Log	Security	Monthly	Access log export; review for anomalous patterns

10. Operational Considerations

Monitoring: Trace collection rate (traces per second), trace backend ingestion lag, trace size distribution, and sampling effectiveness metrics are all monitored. The OTel Collector export error rate is a critical metric.

Logging: Collector and trace backend operational logs stored separately from the traces they handle.

Incident Response: During AI incidents, traces are the primary diagnostic tool. Runbooks reference specific trace search queries for each incident type: search for traces with error spans in the last 30 minutes; search for traces with llm_api spans for a specific model version; search for traces with unusual agent_depth.

Disaster Recovery: Trace data loss during a short outage is acceptable for operational debugging purposes. Compliance-required traces (for high-risk AI decisions) must be durably stored with RPO < 1 hour.

Capacity Planning: Trace storage scales with the number of spans per request. An agentic workflow with 10 agent steps, each with 5 tool calls, may produce 100+ spans per request. At 10K requests/day, this is 1M+ spans/day. Columnar compression and aggressive sampling are required at scale.
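
The capacity figures above follow from a short calculation, sketched here with the 3–5KB span size cited in the cost section. The function names are illustrative and storage is expressed in decimal gigabytes before compression.

```python
def spans_per_day(
    requests_per_day: int, spans_per_request: int, sample_rate: float = 1.0
) -> float:
    """Spans stored per day after sampling."""
    return requests_per_day * spans_per_request * sample_rate


def storage_gb_per_day(spans: float, kb_per_span: float) -> float:
    """Uncompressed storage in decimal GB (1 GB = 1,000,000 KB)."""
    return spans * kb_per_span / 1_000_000


# Agentic workload from the text: 10K requests/day at 100 spans each.
daily_spans = spans_per_day(10_000, 100)             # 1,000,000 spans
low_gb = storage_gb_per_day(daily_spans, 3)          # 3.0 GB/day at 3 KB/span
high_gb = storage_gb_per_day(daily_spans, 5)         # 5.0 GB/day at 5 KB/span
```

At full fidelity this workload would accumulate roughly 90–150 GB over a 30-day hot tier before compression or sampling is applied.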

SLO Table

SLO	Target	Measurement	Alert Threshold
Trace delivery lag	< 60 seconds from request completion to trace queryable	Collector delivery lag	> 5 minutes
Trace completeness	> 99% of sampled requests have complete trace trees	Orphaned span count / total span count	> 1% orphaned spans
Trace backend query latency	< 2 seconds for trace lookup by ID	Backend query p95	> 5 seconds
Trace storage availability	> 99.9%	Trace backend health check	< 99.5% for 10 minutes
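
The trace completeness measurement (orphaned span count / total span count) could be computed as sketched below. The definition of "orphaned" is an assumption for this example: a span whose parent ID is missing from its trace, or a root span that is not the expected api_gateway entry span. SpanRef and orphan_rate are hypothetical names.

```python
from typing import Iterable, NamedTuple, Optional

ENTRY_SPAN = "api_gateway"  # expected root span name from the span inventory


class SpanRef(NamedTuple):
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for a root span
    name: str


def orphan_rate(spans: Iterable[SpanRef]) -> float:
    """Fraction of spans that are detached from a proper trace tree."""
    all_spans = list(spans)
    if not all_spans:
        return 0.0
    known = {(s.trace_id, s.span_id) for s in all_spans}
    orphans = 0
    for s in all_spans:
        if s.parent_id is None:
            # A root that is not the gateway suggests lost trace context.
            if s.name != ENTRY_SPAN:
                orphans += 1
        elif (s.trace_id, s.parent_id) not in known:
            # The parent was never received for this trace.
            orphans += 1
    return orphans / len(all_spans)
```

Comparing the result against 0.01 gives the alert condition in the table.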

Disaster Recovery Table

Component	RTO	RPO	Recovery Approach
OTel Collector	5 minutes	Near-zero (active-active)	Auto-failover; in-memory buffer during failover
Trace Backend	30 minutes	4 hours (non-compliance traces) / 1 hour (compliance)	Replicated backend; restore from object storage archive
Trace UI	30 minutes	N/A (read-only)	Redeploy; underlying data intact

11. Cost Considerations

Cost Drivers

Driver	Description	Relative Cost
Trace backend storage	Spans with AI metadata are 3–5KB each; scales with span count	High at scale
Tail-based sampling compute	Keeping 100% of error + slow traces requires in-memory buffering	Medium
Agent chain trace depth	Deep agentic traces can generate up to 100x more spans than simple calls	Very High for agent workloads
Trace query compute	Interactive trace queries on large trace volumes require significant compute	Medium

Scaling Risks: Agent workloads with deep recursion can generate extremely large traces. Max depth enforcement and per-trace size limits are essential for cost control.

Optimisations:

1% sampling for high-volume standard paths (see sampling strategy)
Compress trace data before storage; gzip achieves 70% reduction on JSON trace data
Archive traces older than 30 days to object storage (S3/GCS); query via Athena/BigQuery if needed

Indicative Cost Range

Scale	AI Requests/Day	Estimated Distributed Tracing Cost/Month
Small	10,000	$200–$500
Medium	500,000	$2,000–$6,000
Large	5,000,000	$8,000–$25,000
Enterprise (agent-heavy)	1,000,000 + deep agent chains	$20,000–$80,000

12. Trade-Off Analysis

Approach Comparison

Approach	Pros	Cons	Best For
Full OpenTelemetry with GenAI semantic conventions	Standard; portable; rich AI metadata; works across all components	Instrumentation effort; agent tracing requires custom work; storage cost	Any production AI system; especially multi-component and agentic
Vendor-specific tracing (e.g., LangSmith, Arize AI)	Zero instrumentation for supported frameworks; rich UI	Vendor lock-in; limited to supported frameworks; data leaves organisation	LangChain-based systems; organisations without platform engineering
Log correlation only (no distributed traces)	Zero infrastructure; uses existing log store	Cannot visualise execution tree; manual correlation is slow; insufficient for agentic	Legacy systems where trace infrastructure cannot be added

Architectural Tensions

Tension	Description	Resolution
Completeness vs. Cost	100% sampling of all traces is ideal but prohibitively expensive at scale	Tiered sampling: 100% errors and slow traces; 10% standard; 1% high-volume
Span richness vs. PII risk	Rich span attributes enable debugging but may capture PII	Default: no prompt/output content in spans; metadata only; content only in logs with PII scrubbing
Agent observability vs. Complexity	Agent chains create complex trace trees; too deep to read easily	Max depth enforcement; summary span at each agent level; UI filtering by depth
Standard vs. Custom attributes	Standard GenAI conventions ensure tooling compatibility; custom attributes add value	Use standard as baseline; extend with gen_ai.* custom attributes; never deviate from standard for covered fields

13. Failure Modes

Failure	Likelihood	Impact	Detection	Recovery
traceparent not propagated through a new service	Medium	High (orphaned spans; broken traces)	Orphaned span rate alert	Instrumentation fix in new service; staging trace validation gate
Agent max depth exceeded silently	Medium	Medium (incomplete trace for deep agents)	max_depth_exceeded counter	Alert engineers; investigate agent logic; increase limit if legitimate
Trace backend storage full	Low	High (traces dropped; debugging impossible)	Storage utilisation alert at 80%	Increase storage; reduce retention; emergency sampling reduction
Span attribute contains credentials	Low	Critical (security breach in trace data)	Automated span attribute scan	Immediate trace data quarantine; rotate credentials; fix instrumentation
High cardinality span attributes cause index explosion	Medium	High (trace backend OOM / slow queries)	Backend memory and query latency alerts	Remove high-cardinality attribute from span; aggregate to lower cardinality

Cascading Scenarios

Scenario 1: New agent framework deployed without trace instrumentation → agent calls invisible in traces → production incident requires reconstructing agent execution → no trace data → extended MTTR → regulatory finding for insufficient AI audit trail. Mitigation: trace completeness gate in deployment pipeline; staging validation of trace fidelity.
Scenario 2: Agent recursion bug creates 1000-depth agent chain → each chain generates 5000 spans → trace backend OOM → all traces lost → blind flight during incident. Mitigation: max_depth enforcement at 10; per-trace size limit; circuit breaker on span emission.
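
The per-trace size limit and span-emission circuit breaker named in Scenario 2 could be approximated with a simple counter in the tracer. In this sketch the SpanBudget class and its default limit of 1,000 spans are hypothetical; the appropriate limit depends on the workload.

```python
from collections import defaultdict
from typing import DefaultDict


class SpanBudget:
    """Stops span emission for a trace once it exceeds its span budget."""

    def __init__(self, max_spans_per_trace: int = 1000) -> None:
        self.max_spans_per_trace = max_spans_per_trace
        self.emitted: DefaultDict[str, int] = defaultdict(int)
        self.dropped: DefaultDict[str, int] = defaultdict(int)

    def allow(self, trace_id: str) -> bool:
        """Return True if one more span may be emitted for this trace."""
        if self.emitted[trace_id] >= self.max_spans_per_trace:
            self.dropped[trace_id] += 1   # feeds a truncation metric or alert
            return False
        self.emitted[trace_id] += 1
        return True

    def finish(self, trace_id: str) -> int:
        """Release counters when the trace ends; return the dropped count."""
        self.emitted.pop(trace_id, None)
        return self.dropped.pop(trace_id, 0)
```

The tracer would call allow before creating each span and finish when the root span closes, so a runaway agent degrades its own trace rather than exhausting the trace backend.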

14. Regulatory Considerations

Regulation	Clause	Requirement	Distributed Tracing Implementation
EU AI Act	Article 12(1) (Record-keeping)	High-risk AI systems must technically allow automatic recording of events (logs) over the system's lifetime	Complete traces with all component spans provide post-hoc verification capability
EU AI Act	Article 14 (Human Oversight)	Technical measures must allow human oversight and intervention	Trace data enables humans to review and intervene in AI decision processes
APRA CPS 234	Para 36 (Incident Response)	Security incidents must be investigated; evidence must be preserved	Security incident traces (with 7-year retention for high-risk) are the investigation evidence
Privacy Act 1988 (AU)	APP 12 (Access to Information)	Individuals have right to access information used in decisions about them	Trace data (linked to decision record) enables reconstruction of decision inputs
ISO/IEC 42001	Clause 9.1 (Monitoring)	AI system operation must be monitored and documented	Distributed traces implement the monitoring documentation requirement
NIST AI RMF	GOVERN 4.2	AI system accountability requires logging enabling attribution of decisions	Trace IDs linked to decisions provide the accountability mechanism

15. Reference Implementations

AWS

Instrumentation: AWS Distro for OpenTelemetry (ADOT) SDK; custom AI Client Tracer
Collector: ADOT Collector on ECS/EKS; sidecar pattern
Trace Backend: AWS X-Ray; or Grafana Tempo on EKS for higher fidelity
Agent Tracing: Custom LangChain callback handler emitting OTLP to ADOT collector
Trace UI: AWS X-Ray Service Map; X-Ray Trace Search
Sampling: X-Ray reservoir-based sampling; ADOT tail-based sampling processor

Azure

Instrumentation: Azure Monitor OpenTelemetry Distro; custom AI Client Tracer
Collector: Azure Monitor OpenTelemetry Distro (built-in collector)
Trace Backend: Azure Application Insights (built-in trace storage)
Agent Tracing: Semantic Kernel built-in OpenTelemetry; custom callback handler
Trace UI: Application Insights Transaction Search; Azure Monitor Workbooks
Sampling: Application Insights adaptive sampling (manages cost automatically)

GCP

Instrumentation: Google Cloud OpenTelemetry SDK; custom AI Client Tracer
Collector: OpenTelemetry Collector on GKE with Cloud Trace exporter
Trace Backend: Google Cloud Trace
Agent Tracing: Vertex AI Extensions with Cloud Trace propagation
Trace UI: Cloud Trace Timeline UI; Cloud Logging correlated views
Sampling: Cloud Trace automatic sampling; custom OTLP collector sampling

On-Premises

Instrumentation: OpenTelemetry SDK (language-native); custom AI Client Tracer
Collector: OpenTelemetry Collector Contrib (self-hosted on Kubernetes)
Trace Backend: Grafana Tempo (open source, highly scalable, object storage backend)
Agent Tracing: LangChain/LlamaIndex custom OTLP callback; custom agent framework instrumentation
Trace UI: Grafana Tempo UI integrated in Grafana
Sampling: OTel tail-based sampling processor (supports all sampling strategies)

16. Related Patterns

Pattern ID	Pattern Name	Relationship	Notes
EAAPL-OBS001	AI Telemetry Architecture	Foundation	Collector and backend infrastructure shared; traces are one of the three signals from OBS001
EAAPL-OBS002	Prompt Monitoring	Extends	Prompt metadata in logs; trace context links log records to spans
EAAPL-OBS003	Hallucination Detection	Depends On	Detection events linked to traces via requestId; trace shows execution context of hallucination
EAAPL-OBS004	AI Incident Management	Depends On	Trace search is primary diagnostic tool in incident runbooks
EAAPL-OBS008	AI Performance Benchmarking	Sibling	Benchmark latency vs production latency from traces

17. Maturity Assessment

Overall Maturity: Proven

Dimension	Score (1–5)	Rationale
Adoption Breadth	3	Distributed tracing mainstream in microservices; AI-specific extensions still early majority
Tooling Ecosystem	4	OTel GenAI conventions released 2024; tooling maturing rapidly; vendor support improving
Operational Runbook Coverage	3	Generic trace debugging runbooks mature; AI-specific agent tracing runbooks custom
Regulatory Evidence	3	EU AI Act Article 12 logging requirement drives adoption; pattern is the recommended implementation
Cost Predictability	3	Agent workload trace costs can surprise teams; sampling strategy discipline required
Team Skill Availability	4	OpenTelemetry skills broadly available; GenAI conventions require AI-specific training

18. Revision History

Version	Date	Author	Changes
1.0.0	2026-06-12	EAAPL Working Group	Initial publication
