EAAPL-OBS007 · Distributed AI Tracing
Pattern ID: EAAPL-OBS007
Status: Proven
Complexity: High
Tags: observability traceability agent llm high-complexity
Version: 1.0.0
Last Reviewed: 2026-06-12
1. Executive Summary
AI pipelines are not single API calls. They are chains of components — authentication, rate limiting, prompt assembly, vector retrieval, LLM invocation, output filtering, downstream tool calls, and often recursive agent sub-invocations — that collectively produce a response. When something goes wrong (latency spike, unexpected output, security event), engineers need to answer: which component in the chain caused the problem, how much did each step contribute to total latency, and what was the exact execution path for this specific request?
This pattern defines end-to-end distributed tracing through multi-component AI pipelines, extended to cover agentic architectures where agent-to-agent handoffs and recursive tool invocations create deep, branching execution trees. It covers W3C Trace Context propagation through every pipeline component; span enrichment with AI-specific metadata (model name, version, token counts, cache hit, tool name, cost contribution); agent chain tracing across handoffs and sub-agent invocations; OpenTelemetry GenAI semantic conventions; latency waterfall visualisation showing per-component contribution; and sampling strategies that balance cost with debugging fidelity. The outcome is the ability to open the trace for any retained AI request from the last 30 days (every error or slow request, plus a sample of the rest) and see exactly what happened, when, in what order, at what cost, and why.
Target Audience: CTO, Head of Platform Engineering, AI Engineering Lead, Security Engineering Lead
Time to Implement: 6–10 weeks
2. Problem Statement
Business Problem
Without distributed tracing, AI pipeline debugging is a process of log archaeology. Engineers correlate log timestamps across multiple services, guess at the execution path, and often cannot reconstruct what happened for a specific user request. This extends incident resolution from minutes to hours. For agentic AI systems — where an AI may invoke tools, call sub-agents, and execute multi-step workflows — the complexity of manual reconstruction is prohibitive.
Technical Problem
Standard HTTP tracing tools trace network hops but do not capture the semantic context of AI operations: which model was called, with how many tokens, whether the cache was hit, what tools were invoked, and how token costs accumulated across the pipeline. OpenTelemetry's GenAI semantic conventions (introduced in 2024) provide a standard, but they are not yet natively supported by all AI SDKs and require instrumentation investment.
Symptoms
- P1 AI latency incident investigation takes 4+ hours because engineers are correlating logs across 6 services
- Agent recursive invocations are invisible in traces — the trace shows one LLM call when there were actually 12
- Token cost cannot be attributed to individual pipeline steps; the total cost is known but not its distribution
- A security incident (prompt injection) cannot be reconstructed because the trace did not capture the prompt assembly step
- Cache effectiveness is unknown; traces do not indicate whether a response was served from cache or generated fresh
Cost of Inaction
- Extended MTTR on AI latency and quality incidents due to insufficient trace fidelity
- Security incidents without complete execution traces are unresolvable and may generate regulatory findings
- Agentic AI systems deployed in production without visibility are a governance liability
- Engineers avoid making performance optimisations because they cannot identify the bottleneck with confidence
3. Context
When to Apply
- Any AI pipeline with more than two components in the critical path
- Agentic AI systems with tool invocations, sub-agent calls, or multi-step reasoning
- Systems where latency SLOs are defined and must be debuggable
- Systems subject to security incident response obligations requiring complete execution reconstruction
- Prerequisite: EAAPL-OBS001 provides the collector and backend infrastructure
When NOT to Apply
- Simple single-call LLM integrations with no downstream components (basic telemetry from OBS001 is sufficient)
- Proof-of-concept systems where trace infrastructure overhead exceeds debugging value
Prerequisites
| Prerequisite | Required | Notes |
|---|---|---|
| EAAPL-OBS001 Trace Backend | Required | Jaeger/Tempo/X-Ray backend required |
| OpenTelemetry SDK in all pipeline components | Required | Trace context propagation requires SDK in each service |
| W3C Trace Context header support in API gateway | Required | Root span must be created at entry point |
| Secrets management | Required | API keys and tokens must not appear in span attributes |
Industry Applicability
| Industry | Applicability | Primary Driver |
|---|---|---|
| Financial Services | Critical | Audit reconstruction, security incident response, latency SLOs |
| Healthcare | Critical | Clinical AI decision reconstruction, safety incident investigation |
| Technology / SaaS | Critical | Agent pipeline debugging, multi-tenant trace isolation |
| Government | High | Audit obligations, AI decision reconstruction |
| Legal Services | High | Professional liability incident reconstruction |
| Retail / E-Commerce | Medium | Latency optimisation for recommendation pipelines |
4. Architecture Overview
The Distributed AI Tracing Architecture implements complete, semantically rich trace context propagation through every component in an AI pipeline, with special extensions for agentic architectures.
Trace Context Propagation
Every request entering the AI system creates a root span at the API gateway. The gateway injects the root span's context into the outgoing request as W3C Trace Context headers (traceparent and tracestate). This context propagates downstream via HTTP headers, gRPC metadata, or message queue message attributes. Every component that receives the request reads the incoming trace context, creates a child span, performs its work, and closes the child span before forwarding. The result is a complete trace tree with parent-child relationships showing the exact execution path.
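The read, create-child, forward cycle described above can be sketched in plain Python. This is an illustration of the version-00 traceparent layout (version, trace ID, parent span ID, flags), not the OpenTelemetry SDK; the names SpanContext, parse_traceparent, start_child and to_traceparent are hypothetical helpers introduced for this example.

```python
import re
import secrets
from typing import NamedTuple, Optional

# traceparent layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class SpanContext(NamedTuple):
    trace_id: str  # 32 hex chars, shared by every span in the trace
    span_id: str   # 16 hex chars, unique per span
    flags: str     # 2 hex chars; "01" means sampled


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the incoming context, or None if the header is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return SpanContext(trace_id, span_id, flags)


def start_child(parent: Optional[SpanContext]) -> SpanContext:
    """Create a child context; start a new trace when no valid parent exists."""
    if parent is None:
        return SpanContext(secrets.token_hex(16), secrets.token_hex(8), "01")
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.flags)


def to_traceparent(ctx: SpanContext) -> str:
    """Serialise a context for the outgoing request header."""
    return f"00-{ctx.trace_id}-{ctx.span_id}-{ctx.flags}"
```

A component would call parse_traceparent on the incoming header, start_child on the result, and send to_traceparent of the child on its own downstream calls. When the header is missing or malformed, the component starts a new trace, which is how the orphaned roots described in the error flow arise.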
Span Inventory for Standard AI Pipelines
Eight standard spans are defined for a typical RAG AI pipeline:
- api_gateway span: records method, path, status code, client IP (hashed), user agent.
- authentication span: records auth method, identity type (user/service), auth result.
- rate_limit span: records rate limit decision, remaining quota, policy applied.
- prompt_assembly span: records template ID, template version, assembly time, total assembled token count estimate.
- vector_retrieval span: one span per retrieval call, recording collection name, query vector dimensions, top-k value, retrieved count, retrieval latency, similarity score range.
- llm_api span: the most important span, recording gen_ai.system (e.g., openai, anthropic), gen_ai.request.model, gen_ai.request.max_tokens, gen_ai.usage.prompt_tokens, gen_ai.usage.completion_tokens, gen_ai.response.finish_reason, cache hit (gen_ai.prompt.cache_hit, boolean), latency, and cost in USD (gen_ai.cost.usd).
- output_filter span: records filter policy applied, filter result, filter reason if triggered.
- response_serialisation span: records output format, response size in bytes.
AI Semantic Conventions
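OpenTelemetry GenAI semantic conventions are applied to all AI spans. Key attributes: gen_ai.system (LLM provider), gen_ai.request.model (model identifier), gen_ai.request.max_tokens, gen_ai.usage.prompt_tokens, gen_ai.usage.completion_tokens, gen_ai.response.finish_reason, gen_ai.operation.name (for example chat or embeddings). The conventions are still evolving: later releases rename the token-usage attributes to gen_ai.usage.input_tokens and gen_ai.usage.output_tokens and record finish reasons as the array attribute gen_ai.response.finish_reasons, so pin the convention version you instrument against. Custom extensions defined for enterprise use: gen_ai.prompt.template_id, gen_ai.prompt.template_version, gen_ai.prompt.cache_hit, gen_ai.cost.usd, gen_ai.retrieval.collection, gen_ai.retrieval.top_k.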
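As a minimal sketch of how an AI Client Tracer might assemble the llm_api span attributes, the function below builds the attribute map from one completed call. The attribute keys are the ones listed in this section (check them against the convention version you target). The LlmCall dataclass, the function name, and the per-1,000-token prices are assumptions for illustration, not real provider pricing.

```python
from dataclasses import dataclass
from typing import Dict, Union

AttrValue = Union[str, int, float, bool]


@dataclass(frozen=True)
class LlmCall:
    """Facts about one completed model call (hypothetical container)."""
    system: str
    model: str
    max_tokens: int
    prompt_tokens: int
    completion_tokens: int
    finish_reason: str
    cache_hit: bool


def llm_span_attributes(
    call: LlmCall,
    usd_per_1k_prompt: float,
    usd_per_1k_completion: float,
) -> Dict[str, AttrValue]:
    """Build the attribute map for an llm_api span, including token cost."""
    # Cost is token count times the assumed price per 1,000 tokens.
    cost_usd = (
        call.prompt_tokens * usd_per_1k_prompt
        + call.completion_tokens * usd_per_1k_completion
    ) / 1000
    return {
        "gen_ai.operation.name": "chat",
        "gen_ai.system": call.system,
        "gen_ai.request.model": call.model,
        "gen_ai.request.max_tokens": call.max_tokens,
        "gen_ai.usage.prompt_tokens": call.prompt_tokens,
        "gen_ai.usage.completion_tokens": call.completion_tokens,
        "gen_ai.response.finish_reason": call.finish_reason,
        # Custom enterprise extensions defined in this pattern.
        "gen_ai.prompt.cache_hit": call.cache_hit,
        "gen_ai.cost.usd": round(cost_usd, 6),
    }
```

Note that the map contains only metadata: no prompt text, completion text, or credentials, in line with the default stated under the security considerations.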
Agent Chain Tracing
Agentic architectures present the greatest tracing challenge. An agent receives a task, reasons about it, decides to call a tool, receives the tool result, reasons again, and may call another agent or tool recursively. The trace must capture the full execution tree. The agent tracing model uses a parent span for each agent invocation. When the agent decides to call a tool, a child span is created under the agent span for the tool call. When the agent calls a sub-agent (multi-agent architectures), the trace context is propagated to the sub-agent in the traceparent header or the equivalent message metadata; supplementary values such as the current agent depth can travel alongside it as W3C Baggage items, but baggage does not itself carry the trace context. The sub-agent creates its own span tree as a child of the calling agent's span. The result is a complete execution tree that shows: root agent → tool calls → sub-agent invocations → each sub-agent's spans → their tool calls. The depth of this tree is bounded by a configurable max agent nesting depth (default: 10) to prevent runaway agents from creating unbounded trace trees.
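The agent span model and its depth bound can be illustrated with an in-memory span tree. This is a sketch only: Span, start_agent, start_tool and render are hypothetical names, and a real Agent Tracer would create spans through its tracing SDK rather than a local dataclass.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_AGENT_DEPTH = 10  # default nesting limit from the text above


@dataclass
class Span:
    name: str
    kind: str         # "agent" or "tool"
    agent_depth: int  # nesting level of the owning agent (root agent = 1)
    parent: Optional["Span"] = field(default=None, repr=False, compare=False)
    children: List["Span"] = field(default_factory=list)
    truncated: bool = False  # True when a deeper sub-agent was not traced


def start_agent(name: str, parent: Optional[Span] = None) -> Optional[Span]:
    """Open an agent span, or return None once the nesting limit is reached."""
    depth = 1 if parent is None else parent.agent_depth + 1
    if depth > MAX_AGENT_DEPTH:
        if parent is not None:
            parent.truncated = True  # mark where the tree was cut off
        return None
    span = Span(name, "agent", depth, parent)
    if parent is not None:
        parent.children.append(span)
    return span


def start_tool(name: str, agent: Span) -> Span:
    """Open a tool-call span as a child of the agent that invoked the tool."""
    span = Span(name, "tool", agent.agent_depth, agent)
    agent.children.append(span)
    return span


def render(span: Span, indent: int = 0) -> List[str]:
    """Flatten the tree into indented lines, one per span."""
    marker = " [truncated]" if span.truncated else ""
    lines = [f"{'  ' * indent}{span.kind}:{span.name}{marker}"]
    for child in span.children:
        lines.extend(render(child, indent + 1))
    return lines
```

In this sketch the agent keeps running when the limit is hit; only tracing stops, and the last traced agent is flagged so the cut-off point is visible in the tree.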
Latency Waterfall and Cost Attribution
The trace backend renders traces as a latency waterfall showing each component's contribution as a percentage of total latency. This visualisation immediately answers: "which component is responsible for 80% of the latency?" Cost attribution is added as a span annotation: each span records its cost contribution (the LLM API span records the token cost; tool call spans record any API costs). The total request cost is the sum of all span cost contributions, and the waterfall view shows cost distribution alongside latency distribution.
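The waterfall arithmetic is simple enough to show directly. The sketch below assumes the child spans run sequentially under the root, so their durations can be compared against the root span's total; spans that run in parallel would make the percentages sum to more than 100. SpanTiming and waterfall are hypothetical names.

```python
from typing import List, NamedTuple, Sequence, Tuple


class SpanTiming(NamedTuple):
    name: str
    duration_ms: float
    cost_usd: float  # 0.0 for spans with no billable work


def waterfall(
    total_ms: float, spans: Sequence[SpanTiming]
) -> Tuple[List[Tuple[str, float, float]], float]:
    """Return (name, percent of total latency, cost) rows and the total cost."""
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    rows = [
        (s.name, round(100 * s.duration_ms / total_ms, 1), s.cost_usd)
        for s in spans
    ]
    # Total request cost is the sum of every span's cost contribution.
    total_cost = round(sum(s.cost_usd for s in spans), 6)
    return rows, total_cost


rows, total_cost = waterfall(
    2000,
    [
        SpanTiming("prompt_assembly", 20, 0.0),
        SpanTiming("vector_retrieval", 180, 0.0004),
        SpanTiming("llm_api", 1600, 0.0123),
        SpanTiming("output_filter", 200, 0.0),
    ],
)
```

With these illustrative numbers the llm_api span accounts for 80% of the 2,000 ms total, which is the kind of answer the waterfall view is meant to surface at a glance.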
Sampling Strategy
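A three-tier sampling strategy balances cost with debugging fidelity: 100% sampling for all error traces (any trace containing a span with an error status is always captured at full fidelity); 10% sampling for normal successful traces on standard-volume paths; and 1% sampling for very high-volume paths (> 100 requests/second on a single endpoint). Tail-based sampling is applied as a second pass: any trace with a total latency exceeding the p95 threshold is kept regardless of the probabilistic rate. This ensures slow outliers are always captured for performance optimisation. Error status and total latency are only known once a trace has finished, and a trace dropped by a head-based decision in the SDK cannot be recovered afterwards, so the always-keep rules depend on spans reaching the collector's tail sampler before any drop decision is final; the 10% and 1% rates then act as the fallback for traces that are neither errors nor slow.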
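A sketch of the keep-or-drop decision, evaluated once a trace is complete, might look like the following. The function name and parameters are assumptions; the rates and thresholds are the ones stated above. Hashing the trace ID (rather than drawing a random number) is a design choice in this sketch so that every collector instance reaches the same decision for the same trace.

```python
import hashlib


def keep_trace(
    trace_id: str,
    has_error: bool,
    latency_ms: float,
    p95_latency_ms: float,
    endpoint_rps: float,
) -> bool:
    """Decide whether a finished trace is retained."""
    # Tier 1: error traces are always kept.
    if has_error:
        return True
    # Second pass: slow outliers are always kept.
    if latency_ms > p95_latency_ms:
        return True
    # Tiers 2 and 3: probabilistic rate depends on endpoint volume.
    rate = 0.01 if endpoint_rps > 100 else 0.10
    # Map the trace ID to a stable value in [0, 1).
    digest = hashlib.sha256(trace_id.encode("ascii")).hexdigest()
    bucket = int(digest[:8], 16) / 0x1_0000_0000
    return bucket < rate
```

Because the bucket is derived from the trace ID, any trace kept at the 1% rate would also be kept at the 10% rate, which keeps the tiers consistent when an endpoint crosses the volume threshold.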
5. Architecture Diagram
6. Components
| Component | Type | Responsibility | Technology Options | Criticality |
|---|---|---|---|---|
| API Gateway Tracer | Middleware | Create root span; inject traceparent; record gateway-level attributes | Kong OpenTelemetry plugin; custom Nginx module; AWS API Gateway OpenTelemetry | Critical |
| AI Client Tracer | SDK Library | Create llm_api child span; record all GenAI semantic convention attributes | Custom wrapper on OpenAI/Anthropic/Bedrock SDK; Langchain OpenTelemetry integration | Critical |
| Vector Store Tracer | SDK Library | Create vector_retrieval child span; record collection, query, result count | Custom wrapper on pgvector/Weaviate/Pinecone/Qdrant client | High |
| Agent Tracer | SDK Library | Create agent invocation spans; propagate trace context to sub-agents; record agent type, tool calls | LangChain/LlamaIndex OpenTelemetry callbacks; custom agent framework instrumentation | Critical for agent systems |
| Tool Call Tracer | SDK Library | Create tool_call child span; record tool name, input schema, latency, output summary | Instrumented tool wrapper | High for agent systems |
| OTel Collector | Infrastructure | Receive OTLP; apply AI-specific processors; fan out to trace backend | OTel Collector Contrib; AWS ADOT; Grafana Alloy | Critical |
| Trace Backend | Storage | Store, index, and serve distributed traces; support waterfall UI | Jaeger, Grafana Tempo, AWS X-Ray, Google Cloud Trace | Critical |
| Trace UI | Consumption | Visualise trace waterfall; search by trace ID, model, error status | Jaeger UI, Grafana Tempo UI, AWS X-Ray Console | High |
| Sampling Controller | Collector Processor | Apply head-based + tail-based sampling strategy | OTel tail-based sampling processor; Jaeger agent sampler | Medium |
| Span Enrichment Processor | Collector Processor | Add cost contribution, environment tags, service version to all spans | OTel transform processor; custom processor | Medium |
7. Data Flow
Primary Flow
| Step | Actor | Action | Output |
|---|---|---|---|
| 1 | API Gateway | Receives client request; creates root span with trace ID; injects W3C traceparent header | Root span started; traceparent header on all downstream calls |
| 2 | Authentication Service | Reads traceparent; creates child span; records auth result; forwards traceparent | auth span completed |
| 3 | Rate Limiter | Reads traceparent; creates child span; records limit decision; forwards | rate_limit span completed |
| 4 | Prompt Assembly | Reads traceparent; creates child span; records template ID/version, token count estimate | prompt_assembly span completed |
| 5 | Vector Store Client | Creates child span for retrieval call; records collection, top-k, latency, score range | vector_retrieval span completed |
| 6 | LLM API Client | Creates child span; sends request to model; records model, tokens, cache_hit, cost, finish_reason | llm_api span completed with full GenAI attributes |
| 7 | Tool (if agent pattern) | Receives trace context from the calling agent via traceparent; creates child span under the calling agent's span; executes tool; records result | tool_call span completed |
| 8 | Output Filter | Creates child span; records filter decision; closes span | output_filter span completed |
| 9 | API Gateway | Closes root span; all child spans already closed; full trace tree assembled | Complete trace exported to OTel Collector |
| 10 | OTel Collector | Receives trace; applies enrichment and sampling; exports to trace backend | Trace stored in backend; queryable by trace ID |
Error Flow
| Error Scenario | Detection | Action | Recovery |
|---|---|---|---|
| traceparent header not propagated by a service | Child spans appear as orphaned roots in trace UI | Alert to engineering; service requires instrumentation fix | Fix instrumentation in non-propagating service; verify in staging |
| Agent nesting depth exceeded | Max depth counter in agent tracer | Close trace tree at max depth; log warning; continue execution | Investigate agent runaway loop; fix agent logic |
| Trace backend unavailable | OTel Collector export failure; error metric | Buffer in memory (limited); drop oldest; alert | Restore trace backend; accept trace loss for outage duration |
| Sensitive data in span attribute | PII scrubber processor detects PII in span attribute | Redact attribute value; log PII detection event | Fix instrumentation to not include PII in span attributes |
| Trace size too large (deep agent chain) | Trace size metric exceeds limit | Truncate oldest spans; log truncation event | Increase trace size limit or reduce agent recursion depth |
8. Security Considerations
Authentication: Trace backend access restricted via SSO with RBAC. OTel Collector → trace backend connection uses mTLS or service account. Trace search API requires authentication.
Authorisation: Traces containing sensitive AI outputs (regulated domains) have restricted access. Standard traces accessible to engineers. Security incident traces restricted to security team and incident responders.
Secrets Management: Model API keys must never appear in span attributes. The AI Client Tracer is responsible for ensuring keys are stripped from all recorded request metadata. Regular span audit for credential leakage.
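One way the AI Client Tracer could strip credentials before attributes are recorded is a scrubbing pass like the sketch below. The key and value patterns are illustrative assumptions (in particular the sk- prefix is just an example key shape) and would need tuning to the credential formats actually in use.

```python
import re
from typing import Dict, Union

AttrValue = Union[str, int, float, bool]

REDACTED = "[REDACTED]"

# Attribute names that suggest a credential. Deliberately specific so that
# token-count attributes such as gen_ai.request.max_tokens are not matched.
_SENSITIVE_KEY = re.compile(
    r"(api[_-]?key|authorization|secret|password|access[_-]?token)",
    re.IGNORECASE,
)
# Value shapes that look like credentials (assumed formats).
_SENSITIVE_VALUE = re.compile(r"(Bearer\s+\S+|sk-[A-Za-z0-9_-]{16,})")


def scrub_attributes(attrs: Dict[str, AttrValue]) -> Dict[str, AttrValue]:
    """Return a copy of the attributes with credential-like data redacted."""
    clean: Dict[str, AttrValue] = {}
    for key, value in attrs.items():
        if _SENSITIVE_KEY.search(key):
            clean[key] = REDACTED  # the whole value is suspect
        elif isinstance(value, str):
            clean[key] = _SENSITIVE_VALUE.sub(REDACTED, value)
        else:
            clean[key] = value  # numbers and booleans pass through
    return clean
```

The same patterns could back the regular span audit: running the scrubber over stored attributes and counting redactions gives a leakage signal without exposing the values.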
Data Classification: Trace data is classified at least as Internal. Traces containing AI outputs from regulated domains (financial advice, clinical) classified as Confidential. Security incident traces classified as Restricted.
Encryption: Trace data encrypted in transit (TLS 1.3) and at rest (AES-256). Long-term trace archives use envelope encryption.
Auditability: Every access to trace data is logged. Security incident traces have additional access audit controls. Trace data access may be required for legal discovery.
OWASP LLM Top 10 Coverage
| OWASP LLM Risk | Distributed Tracing Control | Implementation |
|---|---|---|
| LLM01 Prompt Injection | Complete trace enables reconstruction of injection attack execution path | prompt_assembly span + llm_api span together show exact injection execution |
| LLM02 Insecure Output Handling | output_filter span records filter decisions; gaps in filtering visible | Missing filter span on an output is an architectural gap alert |
| LLM03 Training Data Poisoning | Training data pipeline traces detect unexpected data sources | Pipeline tracing extends to training data if applicable |
| LLM04 Model Denial of Service | Deep agent traces reveal recursive invocation patterns driving DoS | Agent nesting depth metric; max depth enforcement |
| LLM05 Supply Chain Vulnerabilities | gen_ai.request.model in every trace; unexpected model visible immediately | Model audit from trace data detects supply chain substitution |
| LLM06 Sensitive Information Disclosure | PII scrubber on span attributes; prompt content not in traces by default | Span attribute PII scanning parallel to log PII scanning |
| LLM07 Insecure Plugin Design | tool_call spans record every tool invoked, its inputs, and outputs | Complete tool invocation audit trail in traces |
| LLM08 Excessive Agency | Agent nesting depth and breadth visible in trace tree | Alert on traces with agent_depth > 5 |
| LLM09 Overreliance | Trace + outcome correlation enables overreliance detection | Traces linked to hallucination events (OBS003) surface overreliance patterns |
| LLM10 Model Theft | Bulk extraction attempts produce distinctive trace patterns | Anomalous trace patterns (many short calls with full context) = theft signal |
9. Governance Considerations
Responsible AI: Distributed traces are the audit artefact for AI systems. They enable: reconstruction of any specific AI decision for compliance review; identification of which version of which model produced a specific output; evidence that output filtering was applied. For high-risk AI systems under EU AI Act Article 14, traces demonstrate that human oversight is technically possible.
Model Risk Management: Traces provide the performance data required for model risk management review. Latency waterfall analysis supports capacity management for material models. Unexpected model version changes are detectable from trace data.
Human Approval: Trace data access for security investigations requires CISO or incident commander authorisation for restricted-classification traces.
Policy: Trace retention policy must balance debugging and audit needs against storage cost. Minimum: 30 days hot (full fidelity); 90 days warm (sampled); 7 years cold (compliance-required traces for high-risk AI decisions).
Traceability: The trace ID is the primary key for AI decision reconstruction. Every AI-influenced decision should have a trace ID logged in the application database so that the decision can be reconstructed from the trace. This is the technical foundation for the GDPR/Privacy Act right to explanation.
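A minimal sketch of logging the trace ID alongside each decision, using SQLite from the Python standard library. The table name, column names and helper functions are hypothetical; a production system would add these columns to its existing decision store.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the decision store and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ai_decisions (
               decision_id TEXT PRIMARY KEY,
               trace_id    TEXT NOT NULL,
               outcome     TEXT NOT NULL,
               decided_at  TEXT NOT NULL
           )"""
    )
    # Index supports the reverse lookup: which decisions came from a trace.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_decisions_trace ON ai_decisions(trace_id)"
    )
    return conn


def record_decision(
    conn: sqlite3.Connection,
    decision_id: str,
    trace_id: str,
    outcome: str,
    decided_at: str,
) -> None:
    """Persist the decision together with the trace that produced it."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO ai_decisions VALUES (?, ?, ?, ?)",
            (decision_id, trace_id, outcome, decided_at),
        )


def trace_for_decision(conn: sqlite3.Connection, decision_id: str) -> Optional[str]:
    """Look up the trace ID needed to reconstruct a decision."""
    row = conn.execute(
        "SELECT trace_id FROM ai_decisions WHERE decision_id = ?",
        (decision_id,),
    ).fetchone()
    return row[0] if row else None
```

For the lookup to be useful over the compliance retention period, the trace for a recorded decision must itself be retained, so decisions of this kind would need to be exempt from probabilistic sampling.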
Governance Artefacts
| Artefact | Owner | Frequency | Format |
|---|---|---|---|
| Trace Sampling Configuration | Platform Engineering | Per change + quarterly review | Version-controlled config |
| Span Attribute Security Audit | Security | Quarterly | Automated scan for credentials + PII in span attributes |
| Trace Retention Policy | Legal / Compliance | Annual | Policy document with retention tier definitions |
| Agent Depth SLO Review | AI Engineering | Monthly | Dashboard review; adjust max depth if needed |
| Trace Search Audit Log | Security | Monthly | Access log export; review for anomalous patterns |
10. Operational Considerations
Monitoring: Trace collection rate (traces per second), trace backend ingestion lag, trace size distribution, and sampling effectiveness metrics are all monitored. The OTel Collector export error rate is a critical metric.
Logging: Collector and trace backend operational logs stored separately from the traces they handle.
Incident Response: During AI incidents, traces are the primary diagnostic tool. Runbooks reference specific trace search queries for each incident type: search for traces with error spans in the last 30 minutes; search for traces with llm_api spans for a specific model version; search for traces with unusual agent_depth.
Disaster Recovery: Trace data loss during a short outage is acceptable for operational debugging purposes. Compliance-required traces (for high-risk AI decisions) must be durably stored with RPO < 1 hour.
Capacity Planning: Trace storage scales with the number of spans per request. An agentic workflow with 10 agent steps, each with 5 tool calls, may produce 100+ spans per request. At 10K requests/day, this is 1M+ spans/day. Columnar compression and aggressive sampling are required at scale.
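The capacity arithmetic above can be reproduced with two small helpers. The function names are hypothetical, the span size comes from the 3–5 KB range in the cost drivers table, and decimal units (1 GB = 1,000,000 KB) are assumed.

```python
def daily_span_volume(
    requests_per_day: int, spans_per_request: int, sample_rate: float = 1.0
) -> float:
    """Spans stored per day after applying a uniform sampling rate."""
    return requests_per_day * spans_per_request * sample_rate


def storage_gb(span_count: float, kb_per_span: float) -> float:
    """Uncompressed storage in GB for the given number of spans."""
    return span_count * kb_per_span / 1_000_000


# The example from the text: 10K requests/day at 100 spans each.
unsampled = daily_span_volume(10_000, 100)
sampled = daily_span_volume(10_000, 100, sample_rate=0.10)

daily_gb_unsampled = storage_gb(unsampled, kb_per_span=4)
hot_tier_gb_sampled = 30 * storage_gb(sampled, kb_per_span=4)
```

With an assumed 4 KB per span, the unsampled workload comes to about 4 GB per day, and 10% sampling brings a 30-day hot tier to roughly 12 GB before compression. Error and slow traces that are always kept sit on top of the sampled figure.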
SLO Table
| SLO | Target | Measurement | Alert Threshold |
|---|---|---|---|
| Trace delivery lag | < 60 seconds from request completion to trace queryable | Collector delivery lag | > 5 minutes |
| Trace completeness | > 99% of sampled requests have complete trace trees | Orphaned span count / total span count | > 1% orphaned spans |
| Trace backend query latency | < 2 seconds for trace lookup by ID | Backend query p95 | > 5 seconds |
| Trace storage availability | > 99.9% | Trace backend health check | < 99.5% for 10 minutes |
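The trace completeness measurement (orphaned span count over total span count) can be sketched as follows. The span records are assumed to be dictionaries with trace_id, span_id, parent_id and name keys, and api_gateway is assumed to be the name of the legitimate root span; both are illustration choices.

```python
from typing import Iterable, Mapping, Optional

ENTRY_SPAN = "api_gateway"  # assumed name of the root span at the entry point


def orphaned_span_ratio(spans: Iterable[Mapping[str, Optional[str]]]) -> float:
    """Fraction of spans that are not attached to a complete trace tree."""
    span_list = list(spans)
    if not span_list:
        return 0.0
    known = {(s["trace_id"], s["span_id"]) for s in span_list}
    orphaned = 0
    for s in span_list:
        parent_id = s.get("parent_id")
        if parent_id is None:
            # A root that is not the gateway means a service started a new
            # trace, i.e. the incoming traceparent was lost.
            if s["name"] != ENTRY_SPAN:
                orphaned += 1
        elif (s["trace_id"], parent_id) not in known:
            # The parent span was never received.
            orphaned += 1
    return orphaned / len(span_list)


def completeness_alert(ratio: float, threshold: float = 0.01) -> bool:
    """True when the orphaned ratio breaches the SLO alert threshold (> 1%)."""
    return ratio > threshold
```

In practice the ratio should be computed only after the delivery lag window has passed, since late-arriving parents would otherwise be counted as missing.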
Disaster Recovery Table
| Component | RTO | RPO | Recovery Approach |
|---|---|---|---|
| OTel Collector | 5 minutes | Near-zero (active-active) | Auto-failover; in-memory buffer during failover |
| Trace Backend | 30 minutes | 4 hours (non-compliance traces) / 1 hour (compliance) | Replicated backend; restore from object storage archive |
| Trace UI | 30 minutes | N/A (read-only) | Redeploy; underlying data intact |
11. Cost Considerations
Cost Drivers
| Driver | Description | Relative Cost |
|---|---|---|
| Trace backend storage | Spans with AI metadata are 3–5 KB each; scales with span count | High at scale |
| Tail-based sampling compute | Keeping 100% of error + slow traces requires in-memory buffering | Medium |
| Agent chain trace depth | Deep agentic traces can generate 10–100x more spans than simple calls (100+ spans per request versus about 8 for the standard pipeline) | Very High for agent workloads |
| Trace query compute | Interactive trace queries on large trace volumes require significant compute | Medium |
Scaling Risks: Agent workloads with deep recursion can generate extremely large traces. Max depth enforcement and per-trace size limits are essential for cost control.
Optimisations:
- 1% sampling for high-volume standard paths (see sampling strategy)
- Compress trace data before storage; gzip typically achieves around 70% reduction on JSON trace data, though the ratio varies with attribute content
- Archive traces older than 30 days to object storage (S3/GCS); query via Athena/BigQuery if needed
Indicative Cost Range
| Scale | AI Requests/Day | Estimated Distributed Tracing Cost/Month |
|---|---|---|
| Small | 10,000 | $200–$500 |
| Medium | 500,000 | $2,000–$6,000 |
| Large | 5,000,000 | $8,000–$25,000 |
| Enterprise (agent-heavy) | 1,000,000 + deep agent chains | $20,000–$80,000 |
12. Trade-Off Analysis
Approach Comparison
| Approach | Pros | Cons | Best For |
|---|---|---|---|
| Full OpenTelemetry with GenAI semantic conventions | Standard; portable; rich AI metadata; works across all components | Instrumentation effort; agent tracing requires custom work; storage cost | Any production AI system; especially multi-component and agentic |
| Vendor-specific tracing (e.g., LangSmith, Arize AI) | Zero instrumentation for supported frameworks; rich UI | Vendor lock-in; limited to supported frameworks; data leaves organisation | LangChain-based systems; organisations without platform engineering |
| Log correlation only (no distributed traces) | Zero infrastructure; uses existing log store | Cannot visualise execution tree; manual correlation is slow; insufficient for agentic | Legacy systems where trace infrastructure cannot be added |
Architectural Tensions
| Tension | Description | Resolution |
|---|---|---|
| Completeness vs. Cost | 100% sampling of all traces is ideal but prohibitively expensive at scale | Tiered sampling: 100% errors and slow traces; 10% standard; 1% high-volume |
| Span richness vs. PII risk | Rich span attributes enable debugging but may capture PII | Default: no prompt/output content in spans; metadata only; content only in logs with PII scrubbing |
| Agent observability vs. Complexity | Agent chains create complex trace trees; too deep to read easily | Max depth enforcement; summary span at each agent level; UI filtering by depth |
| Standard vs. Custom attributes | Standard GenAI conventions ensure tooling compatibility; custom attributes add value | Use standard as baseline; extend with gen_ai.* custom attributes; never deviate from standard for covered fields |
13. Failure Modes
| Failure | Likelihood | Impact | Detection | Recovery |
|---|---|---|---|---|
| traceparent not propagated through a new service | Medium | High (orphaned spans; broken traces) | Orphaned span rate alert | Instrumentation fix in new service; staging trace validation gate |
| Agent max depth exceeded silently | Medium | Medium (incomplete trace for deep agents) | max_depth_exceeded counter | Alert engineers; investigate agent logic; increase limit if legitimate |
| Trace backend storage full | Low | High (traces dropped; debugging impossible) | Storage utilisation alert at 80% | Increase storage; reduce retention; emergency sampling reduction |
| Span attribute contains credentials | Low | Critical (security breach in trace data) | Automated span attribute scan | Immediate trace data quarantine; rotate credentials; fix instrumentation |
| High cardinality span attributes cause index explosion | Medium | High (trace backend OOM / slow queries) | Backend memory and query latency alerts | Remove high-cardinality attribute from span; aggregate to lower cardinality |
Cascading Scenarios
- Scenario 1: New agent framework deployed without trace instrumentation → agent calls invisible in traces → production incident requires reconstructing agent execution → no trace data → extended MTTR → regulatory finding for insufficient AI audit trail. Mitigation: trace completeness gate in deployment pipeline; staging validation of trace fidelity.
- Scenario 2: Agent recursion bug creates 1000-depth agent chain → each chain generates 5000 spans → trace backend OOM → all traces lost → blind flight during incident. Mitigation: max_depth enforcement at 10; per-trace size limit; circuit breaker on span emission.
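The per-trace size limit and the circuit breaker on span emission from Scenario 2 could be combined in a small guard like the sketch below. SpanBudget is a hypothetical class, and the default limit of 1,000 spans is an assumption to be tuned per workload.

```python
from collections import defaultdict
from typing import DefaultDict


class SpanBudget:
    """Stops span emission for a trace once it exceeds its span budget."""

    def __init__(self, max_spans_per_trace: int = 1000) -> None:
        self.max_spans_per_trace = max_spans_per_trace
        self._emitted: DefaultDict[str, int] = defaultdict(int)
        self.dropped: DefaultDict[str, int] = defaultdict(int)

    def allow(self, trace_id: str) -> bool:
        """Return True if one more span may be emitted for this trace."""
        if self._emitted[trace_id] >= self.max_spans_per_trace:
            self.dropped[trace_id] += 1  # counted so truncation is visible
            return False
        self._emitted[trace_id] += 1
        return True

    def finish(self, trace_id: str) -> int:
        """Forget the trace and return how many spans were dropped."""
        self._emitted.pop(trace_id, None)
        return self.dropped.pop(trace_id, 0)
```

A tracer would call allow before creating each span and record the value returned by finish on the root span, so a truncated trace is clearly marked rather than silently incomplete. The budget is per trace, so one runaway agent does not suppress spans for other requests.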
14. Regulatory Considerations
| Regulation | Clause | Requirement | Distributed Tracing Implementation |
|---|---|---|---|
| EU AI Act | Article 12.1 (Logging) | High-risk AI systems must log inputs and outputs to enable post-hoc verification | Complete traces with all component spans provide post-hoc verification capability |
| EU AI Act | Article 14 (Human Oversight) | Technical measures must allow human oversight and intervention | Trace data enables humans to review and intervene in AI decision processes |
| APRA CPS 234 | Para 36 (Incident Response) | Security incidents must be investigated; evidence must be preserved | Security incident traces (with 7-year retention for high-risk) are the investigation evidence |
| Privacy Act 1988 (AU) | APP 12 (Access to Information) | Individuals have right to access information used in decisions about them | Trace data (linked to decision record) enables reconstruction of decision inputs |
| ISO/IEC 42001 | Clause 9.1 (Monitoring) | AI system operation must be monitored and documented | Distributed traces implement the monitoring documentation requirement |
| NIST AI RMF | GOVERN 4.2 | AI system accountability requires logging enabling attribution of decisions | Trace IDs linked to decisions provide the accountability mechanism |
15. Reference Implementations
AWS
- Instrumentation: AWS Distro for OpenTelemetry (ADOT) SDK; custom AI Client Tracer
- Collector: ADOT Collector on ECS/EKS; sidecar pattern
- Trace Backend: AWS X-Ray; or Grafana Tempo on EKS for higher fidelity
- Agent Tracing: Custom LangChain callback handler emitting OTLP to ADOT collector
- Trace UI: AWS X-Ray Service Map; X-Ray Trace Search
- Sampling: X-Ray reservoir-based sampling; ADOT tail-based sampling processor
Azure
- Instrumentation: Azure Monitor OpenTelemetry Distro; custom AI Client Tracer
- Collector: Azure Monitor OpenTelemetry Distro (built-in collector)
- Trace Backend: Azure Application Insights (built-in trace storage)
- Agent Tracing: Semantic Kernel built-in OpenTelemetry; custom callback handler
- Trace UI: Application Insights Transaction Search; Azure Monitor Workbooks
- Sampling: Application Insights adaptive sampling (manages cost automatically)
GCP
- Instrumentation: Google Cloud OpenTelemetry SDK; custom AI Client Tracer
- Collector: OpenTelemetry Collector on GKE with Cloud Trace exporter
- Trace Backend: Google Cloud Trace
- Agent Tracing: Vertex AI Extensions with Cloud Trace propagation
- Trace UI: Cloud Trace Timeline UI; Cloud Logging correlated views
- Sampling: Cloud Trace automatic sampling; custom OTLP collector sampling
On-Premises
- Instrumentation: OpenTelemetry SDK (language-native); custom AI Client Tracer
- Collector: OpenTelemetry Collector Contrib (self-hosted on Kubernetes)
- Trace Backend: Grafana Tempo (open source, highly scalable, object storage backend)
- Agent Tracing: LangChain/LlamaIndex custom OTLP callback; custom agent framework instrumentation
- Trace UI: Grafana Tempo UI integrated in Grafana
- Sampling: OTel tail-based sampling processor (supports all sampling strategies)
16. Related Patterns
| Pattern ID | Pattern Name | Relationship | Notes |
|---|---|---|---|
| EAAPL-OBS001 | AI Telemetry Architecture | Foundation | Collector and backend infrastructure shared; trace signals one of three from OBS001 |
| EAAPL-OBS002 | Prompt Monitoring | Extends | Prompt metadata in logs; trace context links log records to spans |
| EAAPL-OBS003 | Hallucination Detection | Depends On | Detection events linked to traces via requestId; trace shows execution context of hallucination |
| EAAPL-OBS004 | AI Incident Management | Depends On | Trace search is primary diagnostic tool in incident runbooks |
| EAAPL-OBS008 | AI Performance Benchmarking | Sibling | Benchmark latency vs production latency from traces |
17. Maturity Assessment
Overall Maturity: Proven
| Dimension | Score (1–5) | Rationale |
|---|---|---|
| Adoption Breadth | 3 | Distributed tracing mainstream in microservices; AI-specific extensions still early majority |
| Tooling Ecosystem | 4 | OTel GenAI conventions released 2024; tooling maturing rapidly; vendor support improving |
| Operational Runbook Coverage | 3 | Generic trace debugging runbooks mature; AI-specific agent tracing runbooks custom |
| Regulatory Evidence | 3 | EU AI Act Article 12 logging requirement drives adoption; pattern is the recommended implementation |
| Cost Predictability | 3 | Agent workload trace costs can surprise teams; sampling strategy discipline required |
| Team Skill Availability | 4 | OpenTelemetry skills broadly available; GenAI conventions require AI-specific training |
18. Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2026-06-12 | EAAPL Working Group | Initial publication |