GenAI semantic conventions
OpenTelemetry defines GenAI semantic conventions across spans and events. Scouter extracts both: span attributes map to structured fields on GenAiSpanRecord, and two specific event types (gen_ai.client.inference.operation.details and gen_ai.evaluation.result) populate opt-in content and evaluation results.
Metrics like token usage and operation latency are computed from span attributes via DataFusion queries. You don’t need a separate OTel metrics pipeline.
What triggers GenAI extraction
A span is recognized as a GenAI span if and only if it has the gen_ai.operation.name attribute set. Scouter scans all attributes once on ingest and populates a GenAiSpanRecord from them. Spans without this attribute go through the normal span path and are unaffected.
This mirrors the OTel spec, where gen_ai.operation.name is the required attribute for all GenAI spans.
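As a rough illustration of that rule, the check reduces to a single attribute lookup. This is a minimal sketch, not Scouter's ingest code: the function name is hypothetical, and treating mere presence of the key as "set" is an assumption (the text above doesn't say how an empty value is handled).

```python
from typing import Any, Mapping

# The one attribute that marks a span as a GenAI span.
GENAI_MARKER = "gen_ai.operation.name"


def is_genai_span(attributes: Mapping[str, Any]) -> bool:
    """Return True when the span carries gen_ai.operation.name.

    Assumption: presence of the key is enough; value validation is out of scope.
    """
    return GENAI_MARKER in attributes


chat_span = {"gen_ai.operation.name": "chat", "gen_ai.request.model": "claude-opus-4-6"}
plain_span = {"http.request.method": "GET"}

print(is_genai_span(chat_span))   # GenAI path
print(is_genai_span(plain_span))  # normal span path
```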
Supported span attributes
| OTel attribute | Field in GenAiSpanRecord | Notes |
|---|---|---|
| gen_ai.operation.name | operation_name | Required; absence skips extraction |
| gen_ai.provider.name | provider_name | e.g. openai, anthropic, google |
| gen_ai.request.model | request_model | Model the caller requested |
| gen_ai.response.model | response_model | Model that served the response (may differ) |
| gen_ai.response.id | response_id | Provider-assigned response ID |
| gen_ai.usage.input_tokens | input_tokens | Prompt tokens |
| gen_ai.usage.output_tokens | output_tokens | Completion tokens |
| gen_ai.usage.cache_creation.input_tokens | cache_creation_input_tokens | Anthropic cache write tokens |
| gen_ai.usage.cache_read.input_tokens | cache_read_input_tokens | Anthropic cache read tokens |
| gen_ai.response.finish_reasons | finish_reasons | JSON array, JSON-encoded array string, or bare string |
| gen_ai.output.type | output_type | e.g. text, json, image |
| gen_ai.conversation.id | conversation_id | Groups multi-turn spans into a conversation |
| gen_ai.agent.name | agent_name | Agent name for multi-agent systems |
| gen_ai.agent.id | agent_id | |
| gen_ai.agent.description | agent_description | Free-form agent description |
| gen_ai.agent.version | agent_version | e.g. 1.0.0, 2025-05-01 |
| gen_ai.data_source.id | data_source_id | RAG / retrieval data source identifier |
| gen_ai.tool.name | tool_name | Tool invoked in a tool-use span |
| gen_ai.tool.type | tool_type | e.g. function, retrieval |
| gen_ai.tool.call.id | tool_call_id | Provider-assigned call ID |
| gen_ai.request.temperature | request_temperature | |
| gen_ai.request.max_tokens | request_max_tokens | |
| gen_ai.request.top_p | request_top_p | |
| gen_ai.request.choice.count | request_choice_count | Number of candidate completions requested |
| gen_ai.request.seed | request_seed | |
| gen_ai.request.frequency_penalty | request_frequency_penalty | |
| gen_ai.request.presence_penalty | request_presence_penalty | |
| gen_ai.request.stop_sequences | request_stop_sequences | String array; accepts JSON-encoded array or raw array value |
| server.address | server_address | GenAI server hostname or IP |
| server.port | server_port | Required when server.address is set |
| error.type | error_type | Error classification on failed spans |
| openai.api.type | openai_api_type | OpenAI-specific: chat, responses, etc. |
| openai.response.service_tier | openai_service_tier | e.g. default, flex |
| gen_ai.input.messages | input_messages | Opt-in; see Opt-in content |
| gen_ai.output.messages | output_messages | Opt-in; see Opt-in content |
| gen_ai.system_instructions | system_instructions | Opt-in; see Opt-in content |
| gen_ai.tool.definitions | tool_definitions | Opt-in; see Opt-in content |

Token fields accept JSON numbers, floats, or numeric strings. finish_reasons handles all three of: a JSON array, a JSON-encoded array string, and a bare string.
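To make the accepted input shapes concrete, here is a minimal sketch of that kind of lenient parsing. It is not Scouter's implementation: both function names are hypothetical, and truncating fractional floats toward zero and returning None for unparseable values are assumptions.

```python
import json
from typing import Any, List, Optional


def coerce_token_count(value: Any) -> Optional[int]:
    """Accept ints, floats, or numeric strings; return None for anything else."""
    if isinstance(value, bool):  # bool is a subclass of int, but not a token count
        return None
    try:
        if isinstance(value, (int, float)):
            return int(value)  # assumption: fractional parts are truncated
        if isinstance(value, str):
            return int(float(value.strip()))
    except (ValueError, OverflowError):  # non-numeric text, NaN, or infinity
        return None
    return None


def normalize_finish_reasons(value: Any) -> List[str]:
    """Normalize a raw array, a JSON-encoded array string, or a bare string to a list."""
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    if isinstance(value, str):
        text = value.strip()
        if text.startswith("["):
            try:
                parsed = json.loads(text)
            except json.JSONDecodeError:
                return [value]  # looks like an array but isn't; keep as a bare string
            if isinstance(parsed, list):
                return [str(item) for item in parsed]
        return [value]
    return []


print(coerce_token_count("512"), normalize_finish_reasons('["end_turn"]'))
```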
Span events
Scouter recognizes two OTel GenAI event types and extracts their content into GenAiSpanRecord. Everything else passes through to TraceSpanRecord.events unchanged.
gen_ai.client.inference.operation.details
This event duplicates the parent span’s scalar attributes (tokens, model, operation name, finish reasons) and adds the four opt-in content fields. The duplication is intentional: the event is designed to work standalone, without a span context attached.
When the event is on a span, Scouter ignores the scalar attributes from the event entirely. Span attributes win for scalars, full stop. Only the four opt-in content fields (input_messages, output_messages, system_instructions, tool_definitions) use event fallback: if missing from the span, Scouter reads them from the event instead.
gen_ai.evaluation.result
No span attribute equivalent exists for eval scores. They always come from events. A span can have multiple gen_ai.evaluation.result events attached; Scouter collects all of them into eval_results. Events without gen_ai.evaluation.name, the only required field, are silently dropped.
Source-of-truth rule
- Scalar attrs (tokens, model, operation, finish_reasons, etc.) → span attributes only, no event fallback
- Opt-in content (input_messages, output_messages, system_instructions, tool_definitions) → span attribute first; falls back to gen_ai.client.inference.operation.details event
- gen_ai.evaluation.result → events only, no span attribute equivalent

Token counts, latency, model breakdown, agent activity: all computed from span attributes. Event processing doesn’t affect them.
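The precedence rules can be sketched as a small resolver. This is an illustration under stated assumptions, not Scouter's code: events are modeled as (name, attributes) tuples, and resolve_opt_in and resolve_scalar are hypothetical names.

```python
from typing import Any, Dict, Mapping, Sequence, Tuple

DETAILS_EVENT = "gen_ai.client.inference.operation.details"
OPT_IN_KEYS = (
    "gen_ai.input.messages",
    "gen_ai.output.messages",
    "gen_ai.system_instructions",
    "gen_ai.tool.definitions",
)

Event = Tuple[str, Mapping[str, Any]]  # (event name, event attributes)


def resolve_opt_in(span_attrs: Mapping[str, Any], events: Sequence[Event]) -> Dict[str, Any]:
    """Span attribute first; fall back to the operation.details event."""
    resolved: Dict[str, Any] = {}
    for key in OPT_IN_KEYS:
        if key in span_attrs:
            resolved[key] = span_attrs[key]
            continue
        for name, attrs in events:
            if name == DETAILS_EVENT and key in attrs:
                resolved[key] = attrs[key]
                break
    return resolved


def resolve_scalar(span_attrs: Mapping[str, Any], key: str) -> Any:
    """Scalars come from span attributes only; events are never consulted."""
    return span_attrs.get(key)


span_attrs = {"gen_ai.usage.input_tokens": 512, "gen_ai.input.messages": "[...span copy...]"}
events = [(DETAILS_EVENT, {
    "gen_ai.usage.input_tokens": 999,  # ignored: scalars never fall back to events
    "gen_ai.input.messages": "[...event copy...]",
    "gen_ai.output.messages": "[...only on the event...]",
})]

content = resolve_opt_in(span_attrs, events)
tokens = resolve_scalar(span_attrs, "gen_ai.usage.input_tokens")
```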
Opt-in content
The four opt-in content fields (input_messages, output_messages, system_instructions, tool_definitions) are off by default. Your instrumentation library has to explicitly enable them. On spans they’re JSON-encoded strings; on the gen_ai.client.inference.operation.details event they can be structured objects.
Don’t turn them on without a reason. A multi-turn conversation with tool call results can be hundreds of kilobytes per span, and that adds up fast under load. For most observability needs — cost tracking, latency analysis, error rates — the scalar fields are all you need. Turn the content fields on when you actually need the message bodies: building evaluation datasets from production traces, debugging prompt regressions, conversation replay.
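If you set these attributes by hand, one way to keep that trade-off explicit is to gate content behind a flag and a size cap. The helper below is a hypothetical sketch: neither the enabled flag nor the 64 KB default is a Scouter setting, and dropping oversized payloads rather than truncating them is just one possible policy.

```python
import json
from typing import Any, Dict, List


def content_attributes(
    messages: List[Dict[str, Any]],
    *,
    enabled: bool,
    key: str = "gen_ai.input.messages",
    max_bytes: int = 64_000,  # hypothetical cap, not a Scouter default
) -> Dict[str, str]:
    """Return the opt-in attribute to set, or an empty dict when content should be skipped."""
    if not enabled:
        return {}
    encoded = json.dumps(messages)
    if len(encoded.encode("utf-8")) > max_bytes:
        return {}  # too large: keep the scalar fields, drop the message body
    return {key: encoded}


messages = [{"role": "user", "parts": [{"type": "text", "content": "What is RAG?"}]}]
attrs = content_attributes(messages, enabled=True)
```

Inside a span you would then apply the result with a loop such as `for k, v in attrs.items(): span.set_attribute(k, v)`.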
```python
import json

from opentelemetry import trace
from scouter.tracing import ScouterInstrumentor

ScouterInstrumentor().instrument(
    attributes={"service.name": "my-llm-service"},
)

tracer = trace.get_tracer("my-llm-service")

prompt = "What is RAG?"

with tracer.start_as_current_span("llm_call") as span:
    span.set_attribute("gen_ai.operation.name", "chat")
    span.set_attribute("gen_ai.provider.name", "anthropic")
    span.set_attribute("gen_ai.request.model", "claude-opus-4-6")
    span.set_attribute("gen_ai.usage.input_tokens", 512)
    span.set_attribute("gen_ai.usage.output_tokens", 128)

    # Opt-in content: only set if you need it
    span.set_attribute("gen_ai.input.messages", json.dumps([
        {"role": "user", "parts": [{"type": "text", "content": prompt}]}
    ]))

    # call_my_llm is a placeholder for your own client call
    response = call_my_llm(prompt)

    span.set_attribute("gen_ai.output.messages", json.dumps([
        {"role": "assistant", "parts": [{"type": "text", "content": response.text}], "finish_reason": "end_turn"}
    ]))
```

Evaluation results
Eval scores attach to spans as gen_ai.evaluation.result events. Each event maps to a GenAiEvalResult entry in eval_results:

| Field | Type | OTel attribute |
|---|---|---|
| name | str | gen_ai.evaluation.name (required) |
| score_label | str \| None | gen_ai.evaluation.score.label |
| score_value | float \| None | gen_ai.evaluation.score.value |
| explanation | str \| None | gen_ai.evaluation.explanation |
| response_id | str \| None | gen_ai.response.id |

A span can carry multiple eval events, one per metric. Scouter collects all of them into eval_results. If your instrumentation library emits these events, they’re captured automatically. You can also add them manually:
```python
from opentelemetry import trace

span = trace.get_current_span()
span.add_event("gen_ai.evaluation.result", {
    "gen_ai.evaluation.name": "Relevance",
    "gen_ai.evaluation.score.value": 0.92,
    "gen_ai.evaluation.score.label": "relevant",
    "gen_ai.evaluation.explanation": "The response directly addresses the user question.",
    "gen_ai.response.id": "chatcmpl-abc123",
})
```

An event missing gen_ai.evaluation.name is silently skipped. If a span has no eval events, eval_results is an empty list.
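The collection behavior described here can be sketched in a few lines. This is an illustration, not Scouter's extraction code: EvalResult is a hypothetical stand-in for GenAiEvalResult built from the field table in this section, and events are modeled as (name, attributes) tuples.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Mapping, Optional, Tuple

EVAL_EVENT = "gen_ai.evaluation.result"


@dataclass
class EvalResult:
    """Hypothetical stand-in for GenAiEvalResult."""
    name: str
    score_label: Optional[str] = None
    score_value: Optional[float] = None
    explanation: Optional[str] = None
    response_id: Optional[str] = None


def collect_eval_results(events: Iterable[Tuple[str, Mapping[str, Any]]]) -> List[EvalResult]:
    """Collect every evaluation event on a span, skipping those without a name."""
    results: List[EvalResult] = []
    for event_name, attrs in events:
        if event_name != EVAL_EVENT:
            continue
        name = attrs.get("gen_ai.evaluation.name")
        if name is None:
            continue  # required field missing: skipped silently
        results.append(EvalResult(
            name=str(name),
            score_label=attrs.get("gen_ai.evaluation.score.label"),
            score_value=attrs.get("gen_ai.evaluation.score.value"),
            explanation=attrs.get("gen_ai.evaluation.explanation"),
            response_id=attrs.get("gen_ai.response.id"),
        ))
    return results


events = [
    (EVAL_EVENT, {"gen_ai.evaluation.name": "Relevance", "gen_ai.evaluation.score.value": 0.92}),
    (EVAL_EVENT, {"gen_ai.evaluation.score.value": 0.5}),  # no name: dropped
    ("some.other.event", {"gen_ai.evaluation.name": "Ignored"}),
]
eval_results = collect_eval_results(events)
```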
Automatic capture with framework integrations
Frameworks that emit GenAI semantic convention attributes work without extra configuration. Register ScouterInstrumentor before any framework code runs.

OpenAI Agents SDK:

```python
from scouter.tracing import ScouterInstrumentor
from scouter import GrpcConfig
from agents import Agent, Runner

ScouterInstrumentor().instrument(
    transport_config=GrpcConfig(),
    attributes={"service.name": "openai-agent-service"},
)

agent = Agent(name="ResearchAgent", instructions="...", tools=[...])
result = Runner.run_sync(agent, "Summarize recent advances in RAG.")
# LLM call spans carry gen_ai.* attributes, which are extracted automatically
```

ADK:

```python
from scouter.tracing import ScouterInstrumentor
from scouter.transport import GrpcConfig

# ADK calls get_tracer_provider() internally, so it picks up Scouter automatically
ScouterInstrumentor().instrument(
    transport_config=GrpcConfig(),
    attributes={"service.name": "adk-service"},
)
```

LangChain:

```python
from scouter.tracing import ScouterInstrumentor
from scouter import GrpcConfig
from opentelemetry.instrumentation.langchain import LangchainInstrumentor

ScouterInstrumentor().instrument(transport_config=GrpcConfig())
LangchainInstrumentor().instrument()
```

See the ScouterInstrumentor guide for full setup and framework coverage.
Adding attributes manually
If your code calls an LLM directly without a pre-instrumented SDK, set the attributes yourself. The only hard requirement is gen_ai.operation.name. The example below assumes you already called ScouterInstrumentor().instrument(...) during startup.

```python
from opentelemetry import trace

tracer = trace.get_tracer("my-llm-service")

with tracer.start_as_current_span("llm_call") as span:
    span.set_attribute("gen_ai.operation.name", "chat")
    span.set_attribute("gen_ai.provider.name", "anthropic")
    span.set_attribute("gen_ai.request.model", "claude-opus-4-6")
    span.set_attribute("gen_ai.request.temperature", 0.7)
    span.set_attribute("gen_ai.request.max_tokens", 1024)
    span.set_attribute("gen_ai.usage.input_tokens", 512)
    span.set_attribute("gen_ai.usage.output_tokens", 128)
    span.set_attribute("gen_ai.response.finish_reasons", '["end_turn"]')
    span.set_attribute("server.address", "api.anthropic.com")
    span.set_attribute("server.port", 443)

    # prompt and call_my_llm are placeholders for your own input and client call
    response = call_my_llm(prompt)
    span.set_output({"text": response.content})
```

For multi-turn conversations, set gen_ai.conversation.id consistently across all spans in the session:

```python
span.set_attribute("gen_ai.conversation.id", session_id)
```

For agent spans, include the agent metadata so the agent activity endpoint can aggregate correctly:

```python
span.set_attribute("gen_ai.operation.name", "invoke_agent")
span.set_attribute("gen_ai.agent.name", "ResearchAgent")
span.set_attribute("gen_ai.agent.id", "asst_abc123")
span.set_attribute("gen_ai.agent.description", "Searches and summarizes research papers.")
span.set_attribute("gen_ai.agent.version", "1.2.0")
```

For RAG applications that use a data source:

```python
span.set_attribute("gen_ai.data_source.id", "H7STPQYOND")
```

Querying GenAI data
The server exposes read endpoints under /scouter/genai/. Metrics endpoints accept POST with a GenAiMetricsRequest body; service_name, operation_name, provider_name, and model are optional filters.

```json
{
  "start_time": "2026-04-01T00:00:00Z",
  "end_time": "2026-04-19T23:59:59Z",
  "service_name": "my-llm-service",
  "operation_name": "chat",
  "provider_name": "anthropic",
  "model": "claude-opus-4-6",
  "bucket_interval": "hour"
}
```

Token metrics
POST /scouter/genai/metrics/tokens
Input/output token counts bucketed by time, with error rate per bucket. bucket_interval defaults to "hour".
```python
import httpx

client = httpx.Client(base_url="http://scouter:8000")

resp = client.post("/scouter/genai/metrics/tokens", json={
    "start_time": "2026-04-18T00:00:00Z",
    "end_time": "2026-04-19T00:00:00Z",
    "service_name": "my-llm-service",
})

for bucket in resp.json()["buckets"]:
    print(bucket["bucket_start"], bucket["total_input_tokens"], bucket["total_output_tokens"])
```

Response fields per bucket: bucket_start, total_input_tokens, total_output_tokens, total_cache_creation_tokens, total_cache_read_tokens, span_count, error_rate.
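If you want one number per window rather than a series, the buckets can be rolled up client-side. A minimal sketch, assuming error_rate is a fraction between 0 and 1; weighting each bucket's rate by its span_count is a choice made here, and summarize_buckets is a hypothetical helper, not part of any Scouter client.

```python
from typing import Any, Dict, List


def summarize_buckets(buckets: List[Dict[str, Any]]) -> Dict[str, float]:
    """Roll per-bucket token metrics up into totals for the whole window."""
    total_in = sum(b["total_input_tokens"] for b in buckets)
    total_out = sum(b["total_output_tokens"] for b in buckets)
    spans = sum(b["span_count"] for b in buckets)
    # Weight each bucket's error rate by its span count so quiet hours don't dominate.
    errored = sum(b["error_rate"] * b["span_count"] for b in buckets)
    return {
        "input_tokens": total_in,
        "output_tokens": total_out,
        "span_count": spans,
        "error_rate": errored / spans if spans else 0.0,
    }


# Hand-written sample data in the shape listed above (not real output).
sample = [
    {"total_input_tokens": 1000, "total_output_tokens": 250, "span_count": 10, "error_rate": 0.1},
    {"total_input_tokens": 3000, "total_output_tokens": 750, "span_count": 30, "error_rate": 0.5},
]
summary = summarize_buckets(sample)
```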
Operation breakdown
POST /scouter/genai/metrics/operations
Per-operation aggregates: span count, average latency, total tokens, error rate. Useful for understanding cost split between chat, embeddings, and tool calls.
Response fields per operation: operation_name, provider_name, span_count, avg_duration_ms, total_input_tokens, total_output_tokens, error_rate.
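For the cost-split use case, a small post-processing step over these rows is usually enough. The sketch below uses total tokens as a rough proxy for cost, which is an assumption, and token_share is a hypothetical helper. It sums rows by operation_name in case the same operation appears once per provider.

```python
from typing import Any, Dict, List


def token_share(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Fraction of all tokens (input + output) attributable to each operation."""
    totals: Dict[str, int] = {}
    for row in rows:
        tokens = row["total_input_tokens"] + row["total_output_tokens"]
        totals[row["operation_name"]] = totals.get(row["operation_name"], 0) + tokens
    grand_total = sum(totals.values())
    return {op: (count / grand_total if grand_total else 0.0) for op, count in totals.items()}


# Hand-written sample rows using the field names listed above (not real output).
rows = [
    {"operation_name": "chat", "provider_name": "anthropic", "total_input_tokens": 600, "total_output_tokens": 200},
    {"operation_name": "embeddings", "provider_name": "openai", "total_input_tokens": 200, "total_output_tokens": 0},
]
share = token_share(rows)
```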
Model usage
POST /scouter/genai/metrics/models
Per-model aggregates with p50/p95 latency percentiles. Compare performance across model versions or providers.
Response fields per model: model, provider_name, span_count, total_input_tokens, total_output_tokens, p50_duration_ms, p95_duration_ms, error_rate.
Agent activity
POST /scouter/genai/metrics/agents
Per-agent aggregates. Accepts an optional agent_name query parameter.
```python
resp = client.post(
    "/scouter/genai/metrics/agents",
    params={"agent_name": "ResearchAgent"},
    json={"start_time": "2026-04-18T00:00:00Z", "end_time": "2026-04-19T00:00:00Z"},
)
```

Response fields per agent: agent_name, agent_id, conversation_id, span_count, total_input_tokens, total_output_tokens, last_seen.
Tool activity
POST /scouter/genai/metrics/tools
Per-tool aggregates: call count, average duration, error rate.
Response fields: tool_name, tool_type, call_count, avg_duration_ms, error_rate.
Error breakdown
POST /scouter/genai/metrics/errors
Counts by error.type value for the given time window.
```python
resp = client.post("/scouter/genai/metrics/errors", json={
    "start_time": "2026-04-18T00:00:00Z",
    "end_time": "2026-04-19T00:00:00Z",
})
for err in resp.json()["errors"]:
    print(err["error_type"], err["count"])
```

Raw GenAI spans
POST /scouter/genai/spans
Returns individual GenAiSpanRecord objects matching a filter set. Each record includes all extracted span attributes plus eval_results (a list of GenAiEvalResult objects, empty if no eval events were present) and the four opt-in content fields (input_messages, output_messages, system_instructions, tool_definitions) when they were recorded.
```python
resp = client.post("/scouter/genai/spans", json={
    "service_name": "my-llm-service",
    "provider_name": "anthropic",
    "start_time": "2026-04-18T00:00:00Z",
    "end_time": "2026-04-19T00:00:00Z",
    "limit": 100,
})
```

Available filters: service_name, start_time, end_time, operation_name, provider_name, model, conversation_id, agent_name, tool_name, error_type, limit.
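When filters come from user input or optional function arguments, it can help to build the body in one place. A minimal sketch: build_span_filter is a hypothetical helper, the allowed names are copied from the filter list in this section, and rejecting unknown keys client-side is a design choice here rather than documented server behavior.

```python
from typing import Any, Dict

ALLOWED_FILTERS = frozenset({
    "service_name", "start_time", "end_time", "operation_name", "provider_name",
    "model", "conversation_id", "agent_name", "tool_name", "error_type", "limit",
})


def build_span_filter(**filters: Any) -> Dict[str, Any]:
    """Build a request body for the spans endpoint, dropping unset (None) filters."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    return {key: value for key, value in filters.items() if value is not None}


body = build_span_filter(
    service_name="my-llm-service",
    provider_name="anthropic",
    agent_name=None,  # unset filters are left out of the body
    limit=100,
)
```

The resulting dict can be passed as the json argument of client.post.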
Conversation spans
GET /scouter/genai/conversation/{conversation_id}
Returns all GenAI spans for a conversation, ordered chronologically. Accepts optional start_time and end_time query parameters in RFC3339 format.
```python
resp = client.get(
    f"/scouter/genai/conversation/{conversation_id}",
    params={"start_time": "2026-04-18T00:00:00Z", "end_time": "2026-04-19T00:00:00Z"},
)
```

A conversation_id longer than 256 characters returns a 400 response.
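You can catch the length limit before making the request. A minimal sketch, assuming the limit counts characters as stated above (the server may count bytes instead) and assuming percent-encoded IDs are routed as expected; conversation_path is a hypothetical helper.

```python
from urllib.parse import quote

MAX_CONVERSATION_ID_LEN = 256  # taken from the limit stated above


def conversation_path(conversation_id: str) -> str:
    """Build the conversation endpoint path, rejecting IDs the server would refuse."""
    if len(conversation_id) > MAX_CONVERSATION_ID_LEN:
        raise ValueError(
            f"conversation_id is {len(conversation_id)} characters; "
            f"the limit is {MAX_CONVERSATION_ID_LEN}"
        )
    # Percent-encode everything, including "/", so the ID stays a single path segment.
    return "/scouter/genai/conversation/" + quote(conversation_id, safe="")


path = conversation_path("session-42")
```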
OpsML integration
Scouter’s GenAI span data is consumed by OpsML, which provides a frontend for browsing token usage trends, model latency, agent activity, and conversation traces. If you’re running OpsML alongside Scouter, the GenAI views are populated automatically once spans with gen_ai.operation.name start flowing in.
TraceAssertionTask
GenAI spans stored in Scouter are queryable by TraceAssertionTask during offline or online evaluation, which is useful for writing assertions against token budgets, latency SLAs, or model behavior across production traces. See TraceAssertionTask for details.