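Observability and Evaluation
In one sentence
See what happens inside an agent (tracing, logs, token and cost accounting) and measure how well it performs (eval). Agent behavior is non-deterministic, so observability and evaluation are the engineering capabilities needed to take an agent from demo to production.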
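What problem it solves
An agent runs many steps, calls tools, and behaves non-deterministically, which makes it hard to debug and hard to trust. Tracing lets you replay every step; eval lets you quantify quality and catch regressions.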
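The replay idea can be sketched with a small in-memory tracer. This is only an illustration: `Trace`, `Step`, and `fake_search` are hypothetical names that do not come from any framework on this page, and a production setup would more likely export spans through OpenTelemetry.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Iterator


@dataclass
class Step:
    """One recorded agent step (hypothetical schema)."""
    index: int
    kind: str  # e.g. "llm_call" or "tool_call"
    name: str
    inputs: dict[str, Any]
    output: Any
    duration_ms: float


@dataclass
class Trace:
    """Append-only list of steps that can be serialized and replayed."""
    steps: list[Step] = field(default_factory=list)

    def record(self, kind: str, name: str, fn: Callable[..., Any], **inputs: Any) -> Any:
        """Run fn(**inputs) and store what went in, what came out, and how long it took."""
        start = time.perf_counter()
        output = fn(**inputs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        self.steps.append(Step(len(self.steps), kind, name, inputs, output, elapsed_ms))
        return output

    def to_jsonl(self) -> str:
        """Serialize one JSON object per step (values must be JSON-serializable)."""
        return "\n".join(json.dumps(asdict(s)) for s in self.steps)

    def replay(self) -> Iterator[Step]:
        """Yield the recorded steps in order without re-running anything."""
        yield from self.steps


def fake_search(query: str) -> str:
    # Stand-in for a real tool call.
    return f"results for {query}"


trace = Trace()
trace.record("tool_call", "search", fake_search, query="otel")
trace.record("llm_call", "summarize", lambda text: text.upper(), text="done")
for step in trace.replay():
    print(step.index, step.kind, step.name, step.output)
```

Because every step stores its inputs and output, a failed run can be inspected offline from the JSONL dump instead of being re-executed against a non-deterministic model.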
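Design dimensions / implementation spectrum
- Tracing: built in ↔ integrated with OpenTelemetry or a third party (VoltAgent ships built-in observability)
- Metrics: tokens, cost, latency, step count, success rate
- Logging/replay: structured events, time-travel debugging
- Evaluation: human ↔ rule-based ↔ LLM-as-judge ↔ dataset regression
- Closed loop: whether evaluation results feed back into improvement (ACE learns from feedback)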
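The metrics dimension can be sketched as a per-run record plus an aggregator. The `RunRecord` fields, the `summarize` helper, and the prices in `PRICE_PER_1K` are assumptions made up for this example; substitute your provider's actual rates.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Metrics captured for one agent run (hypothetical schema)."""
    prompt_tokens: int
    completion_tokens: int
    latency_s: float
    steps: int
    success: bool


# Hypothetical prices in USD per 1,000 tokens, for illustration only.
PRICE_PER_1K = {"prompt": 0.5, "completion": 1.5}


def run_cost(r: RunRecord) -> float:
    """Estimate the dollar cost of one run from its token counts."""
    return (
        r.prompt_tokens * PRICE_PER_1K["prompt"]
        + r.completion_tokens * PRICE_PER_1K["completion"]
    ) / 1000


def summarize(runs: list[RunRecord]) -> dict[str, float]:
    """Aggregate tokens, cost, latency, step count, and success rate."""
    n = len(runs)
    if n == 0:
        raise ValueError("no runs to summarize")
    return {
        "total_tokens": sum(r.prompt_tokens + r.completion_tokens for r in runs),
        "total_cost_usd": sum(run_cost(r) for r in runs),
        "avg_latency_s": sum(r.latency_s for r in runs) / n,
        "avg_steps": sum(r.steps for r in runs) / n,
        "success_rate": sum(r.success for r in runs) / n,
    }


runs = [
    RunRecord(prompt_tokens=1000, completion_tokens=200, latency_s=2.0, steps=3, success=True),
    RunRecord(prompt_tokens=500, completion_tokens=100, latency_s=4.0, steps=5, success=False),
]
print(summarize(runs))
```

Keeping the raw per-run records rather than only the aggregates makes it possible to slice later by model, tool, or task type.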
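Key takeaways
- Observability is often under-prioritized, yet it decides whether a production agent succeeds.
- LLM-as-judge is the mainstream eval technique, but the judge needs calibration.
- Closing the evaluation loop (from eval to improvement) leads toward self-improving agents.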
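One common way to calibrate a judge is to compare its verdicts with human labels on a small sample, then use the eval score as a regression gate. The sketch below assumes categorical pass/fail labels; `cohen_kappa`, `passes_gate`, and the 0.02 tolerance are illustrative choices, not taken from any framework on this page.

```python
from collections import Counter


def cohen_kappa(human: list[str], judge: list[str]) -> float:
    """Chance-corrected agreement between human labels and judge labels."""
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(human)
    observed = sum(h == j for h, j in zip(human, judge)) / n
    h_counts, j_counts = Counter(human), Counter(judge)
    # Agreement expected if both raters labeled at random with their own label frequencies.
    expected = sum(h_counts[c] * j_counts[c] for c in h_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def passes_gate(baseline_rate: float, candidate_rate: float, tolerance: float = 0.02) -> bool:
    """Regression gate: the candidate may not drop more than `tolerance` below the baseline."""
    return candidate_rate >= baseline_rate - tolerance


human = ["pass", "pass", "fail", "fail"]
judge = ["pass", "pass", "fail", "pass"]
print("kappa:", cohen_kappa(human, judge))
print("gate:", passes_gate(baseline_rate=0.90, candidate_rate=0.89))
```

A low kappa suggests revising the judge prompt or rubric before trusting its scores in the gate.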
Related
Framework implementation comparison
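The table below summarizes the 49 frameworks that implement observability / evaluation (conclusions from source-level reading). On the website each entry is expandable and shown with a source excerpt.
| Framework | Implementation |
|---|---|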
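| Aeon | After every successful run, Haiku automatically assigns a 1-5 score (failed/empty = 1, excellent = 5) and writes it to memory/skill-health/{skill}.json (rolling window of 30 runs plus avg); token usage goes to token-usage.csv; cron-state.json stores the success rate and consecutive-failure count; skill-evals runs assertion tests; scripts/skill-runs audits Actions runs |
| AG2 | runtime_logging global switch; BaseLogger abstraction with SqliteLogger/FileLogger backends that record chat, LLM calls, cost, and tool events; gather_usage_summary aggregates token/cost; built-in OpenTelemetry instrumentation (agent/llm/pattern spans); contrib/agent_eval handles evaluation |
| Agency Swarm | Reuses the SDK's built-in tracing (OpenAI Traces, automatic) and hooks into Langfuse / AgentOps via with trace(...) (examples/observability.py); accumulates token/cost automatically (sub-agent raw_responses are backfilled per model into the parent result, execution.py:252); agency.visualize() renders a structure diagram |
| Agent-LLM (AGiXT) | Writes activity to the conversation log throughout ([ACTIVITY]/[SUBACTIVITY] markers, including command execution success/failure); webhook events command.execution.started/failed (Extensions.py:1078); UsageTrackingMiddleware records token/usage; evaluation-style chains (Smart Instruct) perform self-reflection. No standalone eval harness (to be confirmed) |
| AgentDock | Built-in Evaluation Framework: runEvaluation runner plus multiple evaluators (RuleBased/LLMJudge/NLPAccuracy/ToolUsage/LexicalSimilarity/KeywordCoverage/Sentiment/Toxicity), results stored in JsonFileStorage; structured, categorized logging via logger (LogCategory); token usage accumulates into orchestration state through onFinish (cumulativeTokenUsage) |
| AgentField | Automatic workflow DAG visualization (GET /api/v1/workflows/{id}/dag); Prometheus /metrics (discovery and others instrumented with promauto); structured JSON logs; execution timeline; /health + /ready (K8s); app.note() writes audit log entries. Formal eval: N/A (relies on VC auditing rather than an eval framework) |
| Agentic Context Engine (ACE) | The focus of this framework. EvaluateStep + TaskEnvironment produce feedback / correctness signals; ships benchmarks such as tau2-bench (benchmarks/); observability: ObservabilityStep, Logfire auto-instrumentation of PydanticAI (logfire extra), the kayba-tracing SDK (configure/trace/start_span), and per-skill helpful/harmful/used counters that serve as a utility metric |
| AgentScope | First-class OpenTelemetry: TracingMiddleware (middleware/_tracing/) opens spans at the agent, llm, and tool layers and depends on opentelemetry-sdk + the OTLP exporter (hard dependencies in pyproject); the event stream itself provides fine-grained observability; the app/ service side carries OTel. The README mentions "built-in evaluation", but no standalone eval package was found under src/agentscope in this repo (evaluation lives at the docs/examples level) |
| Agentset | The retrieval flow streams status back in real time (data-status: generating-queries/searching/generating-answer, agentic/index.ts:61) along with logs (logs); usage is recorded in Postgres (chat/route.ts:33); server-side event analytics (logServerEvent); Tinybird stores webhook events; the README lists evaluation/benchmarks as platform features |
| AgentVerse | ① Singleton Logger (Auto-GPT style: colored output + logs/activity.log/error.log + typewriter effect, logging.py:32); ② each agent tracks its dollar spend via get_spend(), and the environment aggregates with report_metrics() (environments/base.py:50); ③ the task-solving Evaluator scores plans by rule (accepts once score ≥ 8, tasksolving_env/basic.py:95), and agentverse-benchmark runs batch evaluation over datasets |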
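| Astron Agent | End-to-end OpenTelemetry: common/otlp, each step calls span.start(...) + add_info_events, and structured NodeLog/NodeTraceLog/Usage (token counts) are written to the trace node by node; includes a DeepWiki badge. No built-in automated eval framework (evaluation criteria to be confirmed) |
| AutoGen | The runtime has built-in OpenTelemetry tracing (TraceHelper, injectable through tracer_provider, disabled with AUTOGEN_DISABLE_RUNTIME_TRACING); structured event stream (ToolCallRequestEvent/ThoughtEvent and others at each step) plus EVENT_LOGGER_NAME/TRACE_LOGGER_NAME logging; evaluation tool AGBench (python/packages/agbench) |
| Botpress | onTrace is a non-blocking hook that receives every trace (llm_call_started, tool calls, errors, output); packages/llmz/src/types.ts defines the Trace type; Cognitive has request/response interceptors available for instrumentation; tests use Vitest + LLM retries + a snapshot serializer |
| ConnectOnion | Each step is written to current_session['trace']; the Logger has three outputs (Rich in the terminal + plain-text .co/logs/{name}.log + .co/evals/.yaml sessions), including token/cost; the eval plugin handles evaluation; @xray + auto_debug() give interactive breakpoint debugging |
| Cordum | A focus area. ① Tamper-evident auditing: a per-tenant hash chain signed with HMAC-SHA256 (Redis Stream + CAS Lua) core/audit/chain.go:265, with chain verification in chain_verify.go; ② SIEM export (webhook/syslog/Datadog/CloudWatch/SOC2) core/audit/exporter.go:283; ③ DecisionLog records every policy decision scheduler/decision_log_adapter.go; ④ OTel metrics/trace core/infra/otel/; ⑤ Policy Simulator replays historical data to preview rules (kernel.go:623 Simulate) + shadow eval safetykernel/shadow_eval.go |
| Cortex Memory | Structured logging with tracing (logging.rs); REST /health + /health/ready health checks; stats with UpdateStats/CacheStats (skip_rate/cache_hit_rate); Svelte dashboard (insights) for visualization; LoCoMo10 benchmark script examples/locomo-evaluation |
| CrewAI | Built-in event bus crewai_event_bus (full-lifecycle LLM/Tool/Agent/Memory events) + anonymous OpenTelemetry telemetry (can be turned off with OTEL_SDK_DISABLED); Task guardrail / task_evaluator evaluate outputs |
| Dust | Multi-layer: Langfuse LLM traces (@langfuse/tracing + front/lib/api/llm/traces/), OpenTelemetry (Temporal workflow interceptors + core/src/open_telemetry.rs), product-level observability metrics (tool/skill/datasource usage and latency, including Elasticsearch analytics), and user feedback |
| E2B | Sandbox-level telemetry rather than agent evaluation: getMetrics() returns CPU/memory/disk; control-plane endpoints /sandboxes/{id}/logs and /metrics; RPC can attach createRpcLogger to log communication |
| Haystack | Tracing: Tracer/Span abstraction that connects to OpenTelemetry/Datadog automatically, auto_enable_tracing() (called at startup from `__init__.py`), plus LoggingTracer; content-level tracing is toggled by an env variable; Eval: components/evaluators/ (faithfulness/context_relevance/SAS/MRR/NDCG/recall/LLMEvaluator…) + EvaluationRunResult for reports |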
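| hcom | hcom TUI (ratatui) dashboard for viewing all agents; hcom list lists active agents; hcom term [name] views or injects into an agent's live PTY screen (via the TCP inject port + vt100 parsing, commands/term.rs:1, :35); hcom transcript reads the other side's structured transcript; hcom events --wait blocks until a match (scriptable); hcom status for diagnostics |
| Hermes Agent | The session_search tool performs cross-session recall over a SQLite FTS5 full-text index (three modes: discovery/scroll/browse, zero LLM cost); `hermes logs --session <id>` filters by session (set_session_context); /usage·/insights show token/cost; batch_runner.py + trajectory_compressor.py produce training trajectories |
| Hive | DecisionTracker records every decision (what was attempted / what was chosen / the outcome), which is the raw material for evolution; runtime_logger/runtime_log_store for structured logs; EventBus event stream feeds the dashboard; judge evaluates node outputs against success_criteria; HoneyComb is an external observation console |
| Lagent | The MessageLogger hook prints each AgentMessage to the log, colored by sender (optional file handler); get_steps() unrolls the tool loop into a thought/tool/environment trajectory. No built-in token/cost accounting or evaluation framework |
| LangChain | core has a built-in callbacks + tracers system (core/.../tracers/); each middleware hook is wrapped in a LangSmith span with @traceable (factory.py:910,1019) and inputs are scrubbed by _scrub_inputs (factory.py:140); evaluation/monitoring is handled by the external LangSmith platform (README) |
| Llama Agentic System (llama-stack-apps) | Observability = AgentEventLogger/EventLogger stream-print every step (shield_call/inference/tool_execution), and turn.steps can be iterated by step_type; evaluation = the llama-stack-client eval run_scoring CLI + agent_store/eval/bulk_generate.py, which runs a dataset in bulk to generate answers and then scores them |
| LlamaIndex | Standalone llama-index-instrumentation package: Dispatcher emits spans/events, with the @dispatcher.span decorator and add_event_handler/add_span_handler hooks (integrating with Arize, Langfuse, and others); each agent step calls write_event_to_stream to expose events such as AgentStream/ToolCall; core/evaluation/ provides RAG evaluators such as faithfulness/relevancy |
| llm-agents | Relies on print() only: prints the rendered prompt at the start and generated + Observation each round (agent.py:66,77); no structured trace, no token/cost accounting, no eval framework. The tests/ directory contains only a setup check and empty unit/integration packages |
| LoongFlow | ① Structured logging via get_logger throughout + Rich-formatted message printing (message_logger.py), with a trace_id on every step; ② per-cycle prompt/completion token and cost accounting (pes_agent.py:294); ③ Evaluator is a first-class citizen: it writes candidate code to a file and runs the user's evaluate() in a separate subprocess with a timeout to obtain score/metrics/summary; ④ math_agent ships a visualizer for the evolution tree / island distribution |
| Maestro | Uses rich Console/Panel to print each step in color; prints input/output tokens for each call and the dollar cost estimated by calculate_subagent_cost(); the full exchange log is written to a timestamped .md file. No evaluation framework |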
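| Mastra | AI tracing: the SpanType enum (AGENT_RUN/WORKFLOW_RUN/MODEL_GENERATION/TOOL_CALL/MEMORY_OPERATION/RAG_ and others) forms a structured span tree, exported through the Observability entry point (@mastra/observability, including storage/platform/OTel exporters); evals/scorers: @mastra/evals + evals/scoreTraces score traces; logger/ provides leveled logging |
| MetaGPT | CostManager accumulates token/cost after every LLM call (_update_costs); Team.invest sets a budget and raises NoMoneyException on overrun; global loguru logging (metagpt/logs.py); exp_pool (experience pool) uses the @exp_cache decorator to cache and score (SimpleScorer/LLM judge) past experiences for reuse |
| Modus | The console package provides structured logging (debug/info/warn/error, reported through host functions); agents publish events via PublishEvent → GoAkt topic actor → GraphQL Subscription pushed over SSE (text/event-stream); integrates Sentry spans for distributed tracing. No built-in eval framework |
| nanobot | Structured loguru logging throughout (including the turn state-machine trace StateTraceEntry, tool events, token usage); the runtime event bus RuntimeEventBus pushes to the WebUI (model/status/latency); optional Langfuse tracing (setting LANGFUSE_SECRET_KEY automatically wraps the OpenAI client) and LangSmith; no built-in evaluation framework (pytest test suite) |
| Open Multi-Agent | onProgress structured events (task_start/complete/retry/skipped/budget_exceeded…) + onTrace spans (llm_call/tool_call/task/agent/plan_ready/agent_stream) + renderTeamRunDashboard(), which generates a pure-HTML task DAG dashboard after the run; keys/tokens are redacted automatically by redaction.ts. No built-in eval framework |
| OpenClaw | The agent loop emits a structured event stream (agent_start/turn_start/message_/tool_execution_/turn_end/agent_end) for UI/log consumers; every message carries usage (token + cost); /usage, /trace on, and /verbose chat commands; cron run-log (JSONL) records each scheduled run; the trajectory/transcripts subsystem retains trajectories; qa/ contains e2e tests and a QA lab extension |
| Pilot Protocol | Structured JSON logging via slog; pilotctl info/--json exposes a snapshot of address/peers/connections/uptime; the public Polo dashboard shows network-wide node/request statistics; 1048 tests (including many congestion-control/SACK/replay regression cases, zz__bug_test.go) |
| Pipecat | BaseObserver listens to the frame stream out of band (on_process_frame/on_push_frame) without modifying the pipeline; built-in turn/latency/startup observers; PipelineParams(enable_metrics=, enable_usage_metrics=) collects token/latency; OpenTelemetry tracing through TurnTraceObserver + utils/tracing/ (tracing extra), plus Sentry integration |
| PraisonAI | MinimalTelemetry (anonymous PostHog usage, privacy-first) + OpenTelemetry integration (traces/spans/metrics, per the README) + Langfuse tracing (praisonai langfuse); token/cost collection (telemetry/token_collector.py); eval/ covers accuracy/performance/reliability/criteria evaluation |
| Semantic Kernel | Built-in OpenTelemetry: KernelFunction carries its own ActivitySource("Microsoft.SemanticKernel") + Meter (invocation/streaming duration histograms); agent invocations go through ModelDiagnostics.StartAgentInvocationActivity; filters + structured logging (LoggerMessage). No built-in evaluation framework; relies on external tools |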
| smolagents | Monitor accumulates token counts and step duration through the ActionStep callback; AgentLogger (Rich) for leveled logging; memory.replay() for replay; return_full_result returns RunResult (token_usage/steps/timing/state); the telemetry extra connects OpenTelemetry/Arize Phoenix |
| Strands Agents | First-class OpenTelemetry: Tracer starts spans for agent/cycle/model/tool (telemetry/tracer.py:77), EventLoopMetrics records token/latency/cycle, StrandsTelemetry wires everything up in one call; callback_handler for streaming callbacks (default PrintingCallbackHandler); evaluation goes through OTEL export |
| Swarm | debug_print only |
| SwarmClaw | OpenTelemetry OTLP traces (@opentelemetry/sdk-node, endpoint/headers configured via env); in-house logger/execution-log/activity-log/run-ledger; usage/cost metering; eval/ performs baseline + environment-plan evaluation; the autonomy supervisor reflects on every autonomous run |
| Swarms | loguru logging (utils/loguru_logger.py); telemetry reports agent data to swarms.world by default (SWARMS_TELEMETRY_ON switch, telemetry/main.py:150); evaluation-style topologies council_as_judge/debate_with_judge/majority_voting act as LLM-as-judge |
| Transformers Agents | Step logs and verbose output; no built-in eval |
| Upsonic | eval/ subpackage: three evaluator types, AccuracyEvaluator, performance, and reliability (.run()); observability via integrations/ for Langfuse / OpenTelemetry (otel extra) / PromptLayer; core dependencies include sentry-sdk[opentelemetry]; the pipeline emits an event at each step |
| vectara-agentic | Built-in Arize Phoenix (OpenInference instruments LlamaIndex, _observability.py:16 setup_observer); eval_fcs() writes the Vectara FCS score back as a span evaluation (_observability.py:101). The AgentCallbackHandler/agent_progress_callback callbacks report TOOL_CALL/TOOL_OUTPUT in real time (agent.py:623). VHC (hallucination correction) compute_vhc/analyze_hallucinations is its distinctive evaluation capability |
| VoltAgent | Core selling point: full-stack OpenTelemetry with 3 custom SpanProcessors: WebSocket (real-time push to the VoltOps Console), LocalStorage (local trace storage + query), and LazyRemoteExport (OTLP → VoltOps or any backend); enabled by default with zero configuration. Evaluation: eval (create-scorer/LLM-judge) + standalone @voltagent/scorers/@voltagent/evals + langfuse exporter |
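The Cordum row describes tamper-evident auditing through an HMAC-signed hash chain. The general technique can be sketched in memory as follows. This is not Cordum's Go implementation (which the table says is backed by Redis Stream + CAS Lua); `append_event`, `verify_chain`, the entry layout, and the demo key are hypothetical.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(key: bytes, seq: int, prev_hash: str, payload: dict) -> str:
    """HMAC-SHA256 over a canonical encoding of (seq, prev_hash, payload)."""
    body = json.dumps(
        {"seq": seq, "prev": prev_hash, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    ).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def append_event(chain: list[dict], key: bytes, payload: dict) -> dict:
    """Append an entry whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    seq = len(chain)
    entry = {
        "seq": seq,
        "prev": prev_hash,
        "payload": payload,
        "hash": _digest(key, seq, prev_hash, payload),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list[dict], key: bytes) -> bool:
    """Recompute every hash in order; any edit or reordering breaks the chain."""
    prev_hash = GENESIS
    for seq, entry in enumerate(chain):
        if entry["seq"] != seq or entry["prev"] != prev_hash:
            return False
        expected = _digest(key, seq, prev_hash, entry["payload"])
        if not hmac.compare_digest(entry["hash"], expected):
            return False
        prev_hash = entry["hash"]
    return True


key = b"demo-key"  # illustration only; real keys belong in a secret store
chain: list[dict] = []
append_event(chain, key, {"action": "tool_call", "tool": "search"})
append_event(chain, key, {"action": "policy_decision", "result": "allow"})
append_event(chain, key, {"action": "tool_call", "tool": "write_file"})
print("chain valid:", verify_chain(chain, key))
```

Because each hash covers the previous one, changing an earlier entry invalidates verification from that point on, which is the property the excerpt's comment calls tamper-evidence.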
Framework implementation comparison · source level
Aeon (yaml)
After every successful run, Haiku automatically assigns a 1-5 score (failed/empty = 1, excellent = 5) and writes it to memory/skill-health/{skill}.json (rolling window of 30 runs plus avg); token usage goes to token-usage.csv; cron-state.json stores the success rate and consecutive-failure count; skill-evals runs assertion tests; scripts/skill-runs audits Actions runs
Source: github/workflows/aeon.yml:604, github/workflows/aeon.yml:687
fi
echo "::notice::Skill output captured to .outputs/${SKILL}.md ($(wc -c < ".outputs/${SKILL}.md") bytes)"
- name: Analyze skill output
id: analyze
if: steps.work.outputs.mode != '' && steps.run.outcome == 'success'
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
ANTHROPIC_BASE_URL: ${{ vars.ANTHROPIC_BASE_URL }}
BANKR_LLM_KEY: ${{ secrets.BANKR_LLM_KEY }}
run: |
SKILL="${{ steps.skill.outputs.name }}"
AG2 (python)
runtime_logging global switch; BaseLogger abstraction with SqliteLogger/FileLogger backends that record chat, LLM calls, cost, and tool events; gather_usage_summary aggregates token/cost; built-in OpenTelemetry instrumentation (agent/llm/pattern spans); contrib/agent_eval handles evaluation
Source: logger/base_logger.py:26, logger/sqlite_logger.py:66
LLMConfig = dict[str, None | float | int | ConfigItem | list[ConfigItem]]
class BaseLogger(ABC):
@abstractmethod
def start(self) -> str:
"""Open a connection to the logging database, and start recording.
Returns:
session_id (str): a unique id for the logging session
"""
...
Agency Swarm (python)
Reuses the SDK's built-in tracing (OpenAI Traces, automatic) and hooks into Langfuse / AgentOps via with trace(...) (examples/observability.py); accumulates token/cost automatically (sub-agent raw_responses are backfilled per model into the parent result, execution.py:252); agency.visualize() renders a structure diagram
Source: examples/observability.py:92, agent/execution.py:252
# ────────────────────────────────
# Example tracing wrappers
# ────────────────────────────────
async def openai_tracing(input_message: str) -> str:
agency_instance = create_agency()
with trace("OpenAI tracing"):
response = await agency_instance.get_response(message=input_message)
return response.final_output
async def langfuse_tracing(input_message: str) -> str:
if os.getenv("LANGFUSE_SECRET_KEY") is None or os.getenv("LANGFUSE_PUBLIC_KEY") is None:
raise ValueError("LANGFUSE api keys are not set")
Agent-LLM (AGiXT) (python)
Writes activity to the conversation log throughout ([ACTIVITY]/[SUBACTIVITY] markers, including command execution success/failure); webhook events command.execution.started/failed (Extensions.py:1078); UsageTrackingMiddleware records token/usage; evaluation-style chains (Smart Instruct) perform self-reflection. No standalone eval harness (to be confirmed)
Source: Interactions.py:7542, Extensions.py:1078
c.log_interaction(
role=self.agent_name,
message=f"[SUBACTIVITY][{thinking_id}][EXECUTION] `{command_name}` was executed successfully.\n{command_output}",
)
# Emit webhook event for successful command execution
await webhook_emitter.emit_event(
event_type="command.execution.completed",
data={
"conversation_id": c.get_conversation_id(),
"conversation_name": conversation_name,
"agent_name": self.agent_name,
AgentDock (typescript)
Built-in Evaluation Framework: runEvaluation runner plus multiple evaluators (RuleBased/LLMJudge/NLPAccuracy/ToolUsage/LexicalSimilarity/KeywordCoverage/Sentiment/Toxicity), results stored in JsonFileStorage; structured, categorized logging via logger (LogCategory); token usage accumulates into orchestration state through onFinish (cumulativeTokenUsage)
Source: llm/llm-orchestration-service.ts:421
/**
* Performs the actual token usage update operation.
*/
private async performTokenUsageUpdate(usage: TokenUsage): Promise<void> {
try {
// Get current state
const currentState = await this.orchestrationManager.getState(
this.sessionId
);
const currentUsage = currentState?.cumulativeTokenUsage || {
promptTokens: 0,
completionTokens: 0,
totalTokens: 0
AgentField (go)
Automatic workflow DAG visualization (GET /api/v1/workflows/{id}/dag); Prometheus /metrics (discovery and others instrumented with promauto); structured JSON logs; execution timeline; /health + /ready (K8s); app.note() writes audit log entries. Formal eval: N/A (relies on VC auditing rather than an eval framework)
Source: control-plane/internal/handlers/discovery.go:18, agent.py:4190
"github.com/Agent-Field/agentfield/control-plane/internal/logger"
"github.com/Agent-Field/agentfield/control-plane/pkg/types"
"github.com/gin-gonic/gin"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
// AgentLister is the minimal dependency required for discovery.
type AgentLister interface {
ListAgents(ctx context.Context, filters types.AgentFilters) ([]*types.AgentNode, error)
}
// DiscoveryFilters captures query parameters for capability discovery.
Agentic Context Engine (ACE) (python)
The focus of this framework. EvaluateStep + TaskEnvironment produce feedback / correctness signals; ships benchmarks such as tau2-bench (benchmarks/); observability: ObservabilityStep, Logfire auto-instrumentation of PydanticAI (logfire extra), the kayba-tracing SDK (configure/trace/start_span), and per-skill helpful/harmful/used counters that serve as a utility metric
Source: ace/tracing/__init__.py:17
pip install ace-framework[tracing]
"""
from kayba_tracing import (
configure,
disable,
enable,
get_folder,
get_trace,
search_traces,
set_folder,
start_span,
trace,
AgentScope (python)
First-class OpenTelemetry: TracingMiddleware (middleware/_tracing/) opens spans at the agent, llm, and tool layers and depends on opentelemetry-sdk + the OTLP exporter (hard dependencies in pyproject); the event stream itself provides fine-grained observability; the app/ service side carries OTel. The README mentions "built-in evaluation", but no standalone eval package was found under src/agentscope in this repo (evaluation lives at the docs/examples level)
Source: middleware/_tracing/_trace.py:116, event/_event.py:14
# ---------------------------------------------------------------------------
class TracingMiddleware(MiddlewareBase):
"""Agent middleware that adds OpenTelemetry tracing to the reply,
model-call and tool-execution lifecycles.
When tracing has not been configured (``setup_tracing`` was not called),
every hook short-circuits to ``next_handler`` with near-zero overhead.
Example::
from agentscope.middleware import TracingMiddleware
Agentset (typescript)
The retrieval flow streams status back in real time (data-status: generating-queries/searching/generating-answer, agentic/index.ts:61) along with logs (logs); usage is recorded in Postgres (chat/route.ts:33); server-side event analytics (logServerEvent); Tinybird stores webhook events; the README lists evaluation/benchmarks as platform features
Source: agentic/index.ts:61, apps/web/src/types/ai.ts:14, chat/route.ts:33
type: "start-step",
});
writer.write({
type: "data-status",
data: { value: "generating-queries" },
});
// step 1. generate queries
const { chunks, queryToResult, totalQueries } = await agenticSearch({
model,
messages,
queryOptions,
AgentVerse (python)
① Singleton Logger (Auto-GPT style: colored output + logs/activity.log/error.log + typewriter effect, logging.py:32); ② each agent tracks its dollar spend via get_spend(), and the environment aggregates with report_metrics() (environments/base.py:50); ③ the task-solving Evaluator scores plans by rule (accepts once score ≥ 8, tasksolving_env/basic.py:95), and agentverse-benchmark runs batch evaluation over datasets
Source: logging.py:32, environments/base.py:50, environments/tasksolving_env/basic.py:95
return record.msg
class Logger(metaclass=Singleton):
"""
Logger that handle titles in different colors.
Outputs logs in console, activity.log, and errors.log
For console handler: simulates typing
"""
def __init__(self):
# create log directory if it doesn't exist
this_files_dir_path = os.path.dirname(__file__)
Astron Agent (python)
End-to-end OpenTelemetry: common/otlp, each step calls span.start(...) + add_info_events, and structured NodeLog/NodeTraceLog/Usage (token counts) are written to the trace node by node; includes a DeepWiki badge. No built-in automated eval framework (evaluation criteria to be confirmed)
Source: cot_runner.py:225, engine/nodes/base.py:36
node_end_time = int(round(time.time() * 1000))
data_llm_output = answers
node_trace_log.trace.append(
NodeLog(
id=node_id,
sid=node_sid,
node_id=node_node_id,
node_name=node_name,
node_type=node_type,
start_time=node_start_time,
end_time=node_end_time,
duration=node_end_time - node_start_time,
AutoGen (python)
The runtime has built-in OpenTelemetry tracing (TraceHelper, injectable through tracer_provider, disabled with AUTOGEN_DISABLE_RUNTIME_TRACING); structured event stream (ToolCallRequestEvent/ThoughtEvent and others at each step) plus EVENT_LOGGER_NAME/TRACE_LOGGER_NAME logging; evaluation tool AGBench (python/packages/agbench)
Source: autogen-core/src/autogen_core/_single_threaded_agent_runtime.py:256
tracer_provider: TracerProvider | None = None,
ignore_unhandled_exceptions: bool = True,
) -> None:
self._tracer_helper = TraceHelper(tracer_provider, MessageRuntimeTracingConfig("SingleThreadedAgentRuntime"))
self._message_queue: Queue[PublishMessageEnvelope | SendMessageEnvelope | ResponseMessageEnvelope] = Queue()
# (namespace, type) -> List[AgentId]
self._agent_factories: Dict[
str, Callable[[], Agent | Awaitable[Agent]] | Callable[[AgentRuntime, AgentId], Agent | Awaitable[Agent]]
] = {}
self._instantiated_agents: Dict[AgentId, Agent] = {}
self._intervention_handlers = intervention_handlers
self._background_tasks: Set[Task[Any]] = set()
self._subscription_manager = SubscriptionManager()
Botpress (typescript)
onTrace is a non-blocking hook that receives every trace (llm_call_started, tool calls, errors, output); packages/llmz/src/types.ts defines the Trace type; Cognitive has request/response interceptors available for instrumentation; tests use Vitest + LLM retries + a snapshot serializer
Source: packages/llmz/src/llmz.ts:335
}
cleanups.push(
iteration.traces.onPush((traces) => {
for (const trace of traces) {
onTrace?.({ trace, iteration: ctx.iterations.length })
}
})
)
try {
await executeIteration({
iteration,
ConnectOnion (python)
Each step is written to current_session['trace']; the Logger has three outputs (Rich in the terminal + plain-text .co/logs/{name}.log + .co/evals/.yaml sessions), including token/cost; the eval plugin handles evaluation; @xray + auto_debug() give interactive breakpoint debugging
Source: core/agent.py:167
import uuid
return str(uuid.uuid4())
def _record_trace(self, entry: dict) -> None:
"""Record trace entry and stream to io if connected.
This is the single place where trace entries are recorded.
Ensures both local trace and remote streaming stay in sync.
Also includes current session state so client can persist it
(client-side is source of truth for session state).
"""
if 'id' not in entry:
entry['id'] = self._next_trace_id()
Cordum (go)
A focus area. ① Tamper-evident auditing: a per-tenant hash chain signed with HMAC-SHA256 (Redis Stream + CAS Lua) core/audit/chain.go:265, with chain verification in chain_verify.go; ② SIEM export (webhook/syslog/Datadog/CloudWatch/SOC2) core/audit/exporter.go:283; ③ DecisionLog records every policy decision scheduler/decision_log_adapter.go; ④ OTel metrics/trace core/infra/otel/; ⑤ Policy Simulator replays historical data to preview rules (kernel.go:623 Simulate) + shadow eval safetykernel/shadow_eval.go
Source: core/audit/chain.go:265
// Seq and EventHash cleared. PrevHash is part of the hashed bytes so any
// tampering with a predecessor (direct mutation or reordering) invalidates
// every descendant hash — this is what gives the chain its tamper-evidence.
func (c *Chainer) Append(ctx context.Context, event *SIEMEvent) error {
if event == nil {
return ErrNilEvent
}
if event.TenantID == "" {
return ErrTenantRequired
}
unlockTenant := c.lockTenant(event.TenantID)
defer unlockTenant()
Cortex Memory (rust)
Structured logging with tracing (logging.rs); REST /health + /health/ready health checks; stats with UpdateStats/CacheStats (skip_rate/cache_hit_rate); Svelte dashboard (insights) for visualization; LoCoMo10 benchmark script examples/locomo-evaluation
Source: cascade_layer_updater.rs:44, cortex-mem-service/src/main.rs:135
self.updated_count + self.skipped_count
}
pub fn skip_rate(&self) -> f64 {
if self.total_operations() == 0 {
0.0
} else {
self.skipped_count as f64 / self.total_operations() as f64
}
}
pub fn cache_hit_rate(&self) -> f64 {
let total = self.cache_hits + self.cache_misses;
CrewAI (python)
Built-in event bus crewai_event_bus (full-lifecycle LLM/Tool/Agent/Memory events) + anonymous OpenTelemetry telemetry (can be turned off with OTEL_SDK_DISABLED); Task guardrail / task_evaluator evaluate outputs
Source: crewai/telemetry/telemetry.py:1, crewai/tasks/llm_guardrail.py:49
"""Telemetry module for CrewAI.
This module provides anonymous telemetry collection for development purposes.
No prompts, task descriptions, agent backstories/goals, responses, or sensitive
data is collected. Users can opt-in to share more complete data using the
`share_crew` attribute.
"""
from __future__ import annotations
Dust (typescript)
Multi-layer: Langfuse LLM traces (@langfuse/tracing + front/lib/api/llm/traces/), OpenTelemetry (Temporal workflow interceptors + core/src/open_telemetry.rs), product-level observability metrics (tool/skill/datasource usage and latency, including Elasticsearch analytics), and user feedback
Source: front/temporal/agent_loop/workflows.ts:61
} from "@temporalio/interceptors-opentelemetry/lib/workflow";
// Export an interceptors variable to add OpenTelemetry interceptors to the workflow.
export const interceptors: WorkflowInterceptorsFactory = () => ({
inbound: [new OpenTelemetryInboundInterceptor()],
outbound: [new OpenTelemetryOutboundInterceptor()],
internals: [new OpenTelemetryInternalsInterceptor()],
});
const { runModelAndCreateActionsActivity } = proxyActivities<
typeof runModelAndCreateWrapperActivities
>({
startToCloseTimeout: "10 minutes",
E2B (typescript)
Sandbox-level telemetry rather than agent evaluation: getMetrics() returns CPU/memory/disk; control-plane endpoints /sandboxes/{id}/logs and /metrics; RPC can attach createRpcLogger to log communication
Source: js-sdk/src/sandbox/index.ts:736
*
* @returns List of sandbox metrics containing CPU, memory and disk usage information.
*/
async getMetrics(opts?: SandboxMetricsOpts) {
if (this.envdApi.version) {
if (compareVersions(this.envdApi.version, '0.1.5') < 0) {
throw new SandboxError(
'You need to update the template to use the new SDK. ' +
'You can do this by running `e2b template build` in the directory with the template.'
)
}
if (compareVersions(this.envdApi.version, '0.2.4') < 0) {
Haystack (python)
Tracing: Tracer/Span abstraction that connects to OpenTelemetry/Datadog automatically, auto_enable_tracing() (called at startup from __init__.py), plus LoggingTracer; content-level tracing is toggled by an env variable; Eval: components/evaluators/ (faithfulness/context_relevance/SAS/MRR/NDCG/recall/LLMEvaluator…) + EvaluationRunResult for reports
Source: tracing/tracer.py:82, tracing/logging_tracer.py:34, evaluation/eval_run_result.py:18
return {}
class Tracer(abc.ABC):
"""Interface for instrumenting code by creating and submitting spans."""
@abc.abstractmethod
@contextlib.contextmanager
def trace(
self, operation_name: str, tags: dict[str, Any] | None = None, parent_span: Span | None = None
) -> Iterator[Span]:
"""
Trace the execution of a block of code.
hcom (rust)
hcom TUI (ratatui) dashboard for viewing all agents; hcom list lists active agents; hcom term [name] views or injects into an agent's live PTY screen (via the TCP inject port + vt100 parsing, commands/term.rs:1, :35); hcom transcript reads the other side's structured transcript; hcom events --wait blocks until a match (scriptable); hcom status for diagnostics
Source: commands/term.rs:35
/// Look up inject port for an instance.
///
/// The inject port is a bidirectional RPC server (input bytes / `\x00SCREEN\n`
/// query) — it shares the `notify_endpoints` table with wake endpoints but
/// uses a different protocol. See `crate::notify::WakeKind` for the wake kinds.
fn get_inject_port(db: &HcomDb, instance_name: &str) -> Option<i32> {
db.conn()
.query_row(
"SELECT port FROM notify_endpoints WHERE instance = ?1 AND kind = 'inject'",
rusqlite::params![instance_name],
|row| row.get(0),
)
Hermes Agent (python)
The session_search tool performs cross-session recall over a SQLite FTS5 full-text index (three modes: discovery/scroll/browse, zero LLM cost); hermes logs --session <id> filters by session (set_session_context); /usage·/insights show token/cost; batch_runner.py + trajectory_compressor.py produce training trajectories
Source: tools/session_search_tool.py:1, hermes_state.py:321
#!/usr/bin/env python3
"""
Session Search Tool - Long-Term Conversation Recall
Single-shape tool with three calling modes (inferred from args, no explicit
mode parameter):
1. DISCOVERY — pass ``query``. Runs FTS5, dedupes hits by session lineage,
returns top N sessions each with: snippet, ±5 message window around the
match, plus bookend_start (first 3 user+assistant msgs of session) and
Hive (python)
DecisionTracker records every decision (what was attempted / what was chosen / the outcome), which is the raw material for evolution; runtime_logger/runtime_log_store for structured logs; EventBus event stream feeds the dashboard; judge evaluates node outputs against success_criteria; HoneyComb is an external observation console
Source: tracker/decision_tracker.py:24
logger = logging.getLogger(__name__)
class DecisionTracker:
"""
The runtime environment that agents execute within.
Usage:
runtime = Runtime("/path/to/storage")
# Start a run
run_id = runtime.start_run("goal_123", "Qualify sales leads")
Lagent (python)
The MessageLogger hook prints each AgentMessage to the log, colored by sender (optional file handler); get_steps() unrolls the tool loop into a thought/tool/environment trajectory. No built-in token/cost accounting or evaluation framework
Source: hooks/logger.py:9, agents/stream.py:114
from .hook import Hook
class MessageLogger(Hook):
def __init__(self, name: str = 'lagent', add_file_handler: bool = False):
self.logger = get_logger(
name, 'info', '%(asctime)s %(levelname)8s %(name)8s - %(message)s', add_file_handler=add_file_handler
)
self.sender2color = {}
def before_agent(self, agent, messages, session_id):
for message in messages:
self._process_message(message, session_id)
LangChain (python)
core has a built-in callbacks + tracers system (core/.../tracers/); each middleware hook is wrapped in a LangSmith span with @traceable (factory.py:910,1019) and inputs are scrubbed by _scrub_inputs (factory.py:140); evaluation/monitoring is handled by the external LangSmith platform (README)
Source: factory.py:140, factory.py:1019
""".strip()
def _scrub_inputs(inputs: dict[str, Any]) -> dict[str, Any]:
"""Remove ``runtime`` and ``handler`` from trace inputs before sending to LangSmith."""
filtered = inputs.copy()
filtered.pop("handler", None)
req = filtered.get("request")
if isinstance(req, (ModelRequest, ToolCallRequest)):
filtered["request"] = {
f.name: getattr(req, f.name) for f in fields(req) if f.name != "runtime"
}
return filtered
Llama Agentic System (llama-stack-apps) (python)
Observability: AgentEventLogger/EventLogger stream-print each step (shield_call/inference/tool_execution), and turn.steps can be iterated by step_type. Evaluation: the llama-stack-client eval run_scoring CLI plus agent_store/eval/bulk_generate.py, which runs a dataset in bulk to generate answers and then scores them.
examples/agents/react_agent.py:73, examples/agent_store/api.py:250, examples/agent_store/eval/bulk_generate.py:25
session_id=session_id,
stream=True,
)
for log in EventLogger().log(response):
log.print()
user_prompt2 = "What are the popular llms supported in torchtune?"
print(colored(f"User> {user_prompt2}", "blue"))
response2 = agent.create_turn(
messages=[{"role": "user", "content": user_prompt2}],
session_id=session_id,
stream=True,
)
LlamaIndex (python)
A standalone llama-index-instrumentation package: Dispatcher emits spans/events, with the @dispatcher.span decorator and add_event_handler/add_span_handler hooks (for Arize, Langfuse, and others); each agent step calls write_event_to_stream to expose events such as AgentStream/ToolCall; core/evaluation/ provides RAG evaluators such as faithfulness/relevancy.
instrumentation/__init__.py:1, llama-index-instrumentation/src/llama_index_instrumentation/dispatcher.py:50
from llama_index_instrumentation import (
DispatcherSpanMixin, # noqa
get_dispatcher, # noqa
root_dispatcher, # noqa
root_manager, # noqa
)
from llama_index_instrumentation.dispatcher import (
DISPATCHER_SPAN_DECORATED_ATTR, # noqa
Dispatcher, # noqa
Manager, # noqa
llm-agents (python)
Relies on print() only: it prints the rendered prompt at the start, then generated + Observation on every loop (agent.py:66,77); no structured trace, no token/cost accounting, no eval framework. The tests/ directory contains only a setup check and empty unit/integration packages.
agent.py:66
question=question,
previous_responses='{previous_responses}'
)
print(prompt.format(previous_responses=''))
while num_loops < self.max_loops:
num_loops += 1
curr_prompt = prompt.format(previous_responses='\n'.join(previous_responses))
generated, tool, tool_input = self.decide_next_action(curr_prompt)
if tool == 'Final Answer':
return tool_input
if tool not in self.tool_by_names:
raise ValueError(f"Unknown tool: {tool}")
tool_result = self.tool_by_names[tool].use(tool_input)
LoongFlow (python)
(1) Structured logging via get_logger throughout, plus Rich-formatted message printing (message_logger.py), with a trace_id on every step; (2) per-cycle prompt/completion token and cost accounting (pes_agent.py:294); (3) the Evaluator is a first-class citizen: it writes candidate code to a file and runs the user's evaluate() in a separate subprocess with a timeout to obtain score/metrics/summary; (4) math_agent ships a visualizer for the evolution tree and island distribution.
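Running untrusted candidate code in a child process with a timeout is the core of point (3). The following is a minimal sketch of that pattern using only the standard library. It is not LoongFlow's evaluator: `evaluate_candidate` and the contract that the candidate prints one JSON object containing a `score` key on its last stdout line are assumptions made for this example.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def evaluate_candidate(code: str, timeout_s: float = 10.0) -> dict:
    """Run candidate code in a child interpreter and parse its JSON result.

    Assumed contract (hypothetical): the candidate prints a JSON object
    with a "score" key as the last line of stdout.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                capture_output=True, text=True, timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising, so nothing leaks.
            return {"score": 0.0, "summary": "timeout"}
    if proc.returncode != 0:
        return {"score": 0.0, "summary": "crashed"}
    try:
        result = json.loads(proc.stdout.strip().splitlines()[-1])
    except (IndexError, json.JSONDecodeError):
        return {"score": 0.0, "summary": "unparseable output"}
    if not isinstance(result, dict) or "score" not in result:
        return {"score": 0.0, "summary": "unparseable output"}
    return result


good = evaluate_candidate('import json\nprint(json.dumps({"score": 0.9, "summary": "ok"}))')
slow = evaluate_candidate("import time\ntime.sleep(10)", timeout_s=0.5)
print(good, slow)
```

Mapping every failure mode (timeout, crash, bad output) to a zero score keeps the outer search loop simple: a bad candidate is just a low-scoring one.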
framework/pes/evaluator/evaluator.py:126
pass
class LoongFlowEvaluator(Evaluator):
"""
LoongFlow Evaluator
"""
def __init__(self, config: EvaluatorConfig):
self.config = config
self._logger = get_logger(self.__class__.__name__)
self._thread_executor = concurrent.futures.ThreadPoolExecutor()
Maestro (python)
Uses rich Console/Panel to color-print each step; prints input/output tokens for every call along with a dollar cost estimated by calculate_subagent_cost(); the full exchange log is written to a timestamped .md file. No evaluation framework.
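Per-call cost estimation from a price table reduces to one line of arithmetic per direction. The sketch below reuses two of the per-million-token prices quoted in this entry's source excerpt; the function name `estimate_cost` and its body are an assumed reconstruction for illustration, since the excerpt stops before the calculation itself.

```python
# USD per million tokens, copied from two rows of the excerpt's pricing table.
PRICING = {
    "claude-3-opus-20240229": {"input_cost_per_mtok": 15.00, "output_cost_per_mtok": 75.00},
    "claude-3-haiku-20240307": {"input_cost_per_mtok": 0.25, "output_cost_per_mtok": 1.25},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: dollars spent on one call, given token counts."""
    price = PRICING[model]  # raises KeyError for a model with no known price
    input_cost = input_tokens / 1_000_000 * price["input_cost_per_mtok"]
    output_cost = output_tokens / 1_000_000 * price["output_cost_per_mtok"]
    return input_cost + output_cost


cost = estimate_cost("claude-3-opus-20240229", input_tokens=2_000, output_tokens=1_000)
print(f"${cost:.4f}")
```

A hard-coded table like this goes stale whenever prices or model names change, so a lookup miss should fail loudly rather than silently report zero cost.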
maestro.py:23, maestro.py:66, maestro.py:289
SUB_AGENT_MODEL = "claude-3-5-sonnet-20240620"
REFINER_MODEL = "claude-3-5-sonnet-20240620"
def calculate_subagent_cost(model, input_tokens, output_tokens):
# Pricing information per model
pricing = {
"claude-3-opus-20240229": {"input_cost_per_mtok": 15.00, "output_cost_per_mtok": 75.00},
"claude-3-haiku-20240307": {"input_cost_per_mtok": 0.25, "output_cost_per_mtok": 1.25},
"claude-3-sonnet-20240229": {"input_cost_per_mtok": 3.00, "output_cost_per_mtok": 15.00},
"claude-3-5-sonnet-20240620": {"input_cost_per_mtok": 3.00, "output_cost_per_mtok": 15.00},
}
# Calculate cost
Mastra (typescript)
AI tracing: the SpanType enum (AGENT_RUN/WORKFLOW_RUN/MODEL_GENERATION/TOOL_CALL/MEMORY_OPERATION/RAG_ and others) forms a structured span tree, exported through the Observability entry point (@mastra/observability, with storage/platform/OTel exporters); evals/scorers: @mastra/evals plus evals/scoreTraces score traces; logger/ provides leveled logging.
observability/types/tracing.ts:35, mastra/index.ts:295
/**
* AI-specific span types with their associated metadata
*/
export enum SpanType {
/** Agent run - root span for agent processes */
AGENT_RUN = 'agent_run',
/** Scorer execution */
SCORER_RUN = 'scorer_run',
/** Individual scorer pipeline step */
SCORER_STEP = 'scorer_step',
/** Generic span for custom operations */
GENERIC = 'generic',
/** Model generation with model calls, token usage, prompts, completions */
MetaGPT (python)
CostManager accumulates tokens/cost after each LLM call (_update_costs); Team.invest sets a budget, and exceeding it raises NoMoneyException; loguru provides global logging (metagpt/logs.py); exp_pool (the experience pool) uses the @exp_cache decorator to cache and score (SimpleScorer/LLM judge) past experiences for reuse.
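The accumulate-then-enforce pattern behind a budgeted cost manager can be sketched as follows. This is an illustration only, not MetaGPT's CostManager: `CostTracker`, `BudgetExceeded`, and the per-1k-token prices used in the example calls are all made up.

```python
class BudgetExceeded(Exception):
    """Hypothetical exception for this sketch; the description names NoMoneyException."""


class CostTracker:
    """Hypothetical tracker: sums tokens and cost, and fails once over budget."""

    def __init__(self, max_budget_usd: float):
        self.max_budget_usd = max_budget_usd
        self.total_prompt_tokens = 0
        self.total_completion_tokens = 0
        self.total_cost_usd = 0.0

    def update(self, prompt_tokens, completion_tokens,
               usd_per_1k_prompt, usd_per_1k_completion):
        self.total_prompt_tokens += prompt_tokens
        self.total_completion_tokens += completion_tokens
        self.total_cost_usd += (
            prompt_tokens * usd_per_1k_prompt
            + completion_tokens * usd_per_1k_completion
        ) / 1000
        # Check after accounting, so the overspending call is still recorded.
        if self.total_cost_usd > self.max_budget_usd:
            raise BudgetExceeded(
                f"spent ${self.total_cost_usd:.4f} of ${self.max_budget_usd:.4f}"
            )


tracker = CostTracker(max_budget_usd=0.01)
tracker.update(1000, 200, usd_per_1k_prompt=0.003, usd_per_1k_completion=0.015)
try:
    tracker.update(1000, 200, usd_per_1k_prompt=0.003, usd_per_1k_completion=0.015)
    over_budget = False
except BudgetExceeded:
    over_budget = True
print(tracker.total_cost_usd, over_budget)
```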
metagpt/provider/base_llm.py:124, metagpt/team.py:98, metagpt/exp_pool/decorator.py:29
def _default_system_msg(self):
return self._system_msg(self.system_prompt)
def _update_costs(self, usage: Union[dict, BaseModel], model: str = None, local_calc_usage: bool = True):
"""update each request's token cost
Args:
model (str): model name or in some scenarios called endpoint
local_calc_usage (bool): some models don't calculate usage, it will overwrite LLMConfig.calc_usage
"""
calc_usage = self.config.calc_usage and local_calc_usage
model = model or self.pricing_plan
model = model or self.model
usage = usage.model_dump() if isinstance(usage, BaseModel) else usage
Modus (go)
The console package provides structured logging (debug/info/warn/error, reported through a host function); agents publish events via PublishEvent → GoAkt topic actor → GraphQL Subscription, pushed over SSE (text/event-stream); Sentry spans are integrated for distributed tracing. No built-in eval framework.
sdk/go/pkg/console/console.go:24, runtime/actors/agents.go:280, runtime/graphql/graphql.go:164
Log(fmt.Sprintf(format, args...))
}
func Debug(message string) {
hostLogMessage("debug", message)
}
func Debugf(format string, args ...any) {
Debug(fmt.Sprintf(format, args...))
}
func Info(message string) {
hostLogMessage("info", message)
nanobot (python)
Structured loguru logging throughout (including the turn state machine trace StateTraceEntry, tool events, and token usage); the runtime event bus RuntimeEventBus pushes to the WebUI (model/status/latency); optional Langfuse tracing (setting LANGFUSE_SECRET_KEY automatically wraps the OpenAI client) and LangSmith; no built-in evaluation framework (only a pytest test suite).
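A state-machine trace of this kind is essentially a list of timed entries, one per state visited. The sketch below shows one way to collect them with a context manager. `TraceEntry` and `traced` are hypothetical names loosely modeled on the fields of StateTraceEntry; nanobot's actual recording logic is not shown in the source and may differ.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceEntry:
    # Hypothetical record: which state ran, when it started, how long it took.
    state: str
    started_at: float
    duration_ms: float
    error: Optional[str] = None


@contextmanager
def traced(trace: list, state: str):
    """Append one TraceEntry to `trace`, whether the block succeeds or raises."""
    started_at = time.time()
    t0 = time.perf_counter()
    error = None
    try:
        yield
    except Exception as exc:
        error = repr(exc)
        raise
    finally:
        duration_ms = (time.perf_counter() - t0) * 1000
        trace.append(TraceEntry(state, started_at, duration_ms, error))


trace = []
with traced(trace, "build_prompt"):
    time.sleep(0.01)
try:
    with traced(trace, "call_tool"):
        raise RuntimeError("tool failed")
except RuntimeError:
    pass
for entry in trace:
    print(entry)
```

Recording the entry in `finally` means failed states show up in the trace with their error attached, which is usually the entry you most want when debugging.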
agent/loop.py:88, providers/openai_compat_provider.py:403
@dataclass
class StateTraceEntry:
state: TurnState
started_at: float
duration_ms: float
event: str
error: str | None = None
@dataclass
class TurnContext:
Open Multi-Agent (typescript)
onProgress structured events (task_start/complete/retry/skipped/budget_exceeded…) + onTrace spans (llm_call/tool_call/task/agent/plan_ready/agent_stream) + a post-run renderTeamRunDashboard() that generates a plain-HTML task DAG dashboard; secrets/tokens are redacted automatically via redaction.ts. No built-in eval framework.
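Redacting secrets before events reach a log or dashboard can be sketched as a recursive walk over the event payload. This example is written in Python for consistency with the other sketches here and does not reproduce redaction.ts: the key names, the `sk-` token pattern, and the `redact` function are assumptions chosen for illustration.

```python
import re

# Hypothetical rules: exact key names treated as secret, and "sk-..." tokens in text.
# Keys are matched in full so that counters such as "total_tokens" are left alone.
_SECRET_KEY = re.compile(
    r"^(api[_-]?key|access[_-]?token|secret|password|authorization)$", re.IGNORECASE
)
_TOKEN_IN_TEXT = re.compile(r"\bsk-[A-Za-z0-9_-]{8,}\b")
_MASK = "[REDACTED]"


def redact(value):
    """Return a copy of `value` with secret-looking fields and tokens masked."""
    if isinstance(value, dict):
        return {
            key: _MASK if _SECRET_KEY.match(str(key)) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return _TOKEN_IN_TEXT.sub(_MASK, value)
    return value


event = {
    "type": "tool_call",
    "headers": {"Authorization": "Bearer abc"},
    "args": ["use key sk-abcdefgh12345678 now"],
    "total_tokens": 42,
}
safe_event = redact(event)
print(safe_event)
```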
src/orchestrator/orchestrator.ts:635, src/dashboard/render-team-run-dashboard.ts:17
return
}
config.onProgress?.({
type: 'task_start',
task: task.id,
agent: assignee,
data: task,
} satisfies OrchestratorEvent)
config.onProgress?.({
type: 'agent_start',
agent: assignee,
OpenClaw (typescript)
The agent loop emits a structured event stream (agent_start/turn_start/message_/tool_execution_/turn_end/agent_end) for UI/log consumers; every message carries usage (tokens + cost); /usage, /trace on, and /verbose chat commands; a cron run-log (JSONL) records each scheduled run; the trajectory/transcripts subsystem retains trajectories; qa/ contains e2e tests and a QA lab extension.
agent-loop.ts:25
import { validateToolArguments } from "./validation.js";
/** Callback used by synchronous loop runners to publish agent lifecycle events. */
export type AgentEventSink = (event: AgentEvent) => Promise<void> | void;
const EMPTY_USAGE = {
input: 0,
output: 0,
cacheRead: 0,
cacheWrite: 0,
totalTokens: 0,
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
};
Pilot Protocol (go)
Structured JSON logs go through slog; pilotctl info/--json exposes a snapshot of address/peers/connections/uptime; the public Polo dashboard shows network-wide node/request statistics; 1048 tests (including many congestion-control/SACK/replay regression cases in zz__bug_test.go).
README.md:111, pkg/daemon/services.go:10
<summary><strong>Example JSON output</strong></summary>
```json
$ pilotctl --json info
{"status":"ok","data":{"address":"0:0000.0000.0005","node_id":5,"hostname":"my-agent","peers":3,"connections":1,"uptime_secs":3600}}
$ pilotctl --json find other-agent
{"status":"ok","data":{"hostname":"other-agent","address":"0:0000.0000.0003"}}
$ pilotctl --json recv 1000 --count 1
{"status":"ok","data":{"messages":[{"seq":0,"port":1000,"data":"hello","bytes":5}]}}
$ pilotctl --json find nonexistent
```
Pipecat (python)
BaseObserver listens to the frame stream out of band (on_process_frame/on_push_frame) without modifying the pipeline; built-in turn/latency/startup observers; PipelineParams(enable_metrics=, enable_usage_metrics=) collects token/latency metrics; OpenTelemetry tracing goes through TurnTraceObserver + utils/tracing/ (the tracing extra), plus Sentry integration.
pipeline/worker.py:135, utils/tracing/turn_trace_observer.py:36
self._idle_event.set()
class PipelineParams(BaseModel):
"""Configuration parameters for pipeline execution.
These parameters are usually passed to all frame processors through
StartFrame. For other generic pipeline worker parameters use PipelineWorker
constructor arguments instead.
Parameters:
audio_in_sample_rate: Input audio sample rate in Hz.
audio_out_sample_rate: Output audio sample rate in Hz.
PraisonAI (python)
MinimalTelemetry (anonymous PostHog usage data, privacy-first) + OpenTelemetry integration (traces/spans/metrics, as stated in the README) + Langfuse tracing (praisonai langfuse); token/cost collection (telemetry/token_collector.py); eval/ covers accuracy/performance/reliability/criteria evaluation.
telemetry/telemetry.py:78
_TELEMETRY_DISABLED_CACHE = not explicitly_enabled
return _TELEMETRY_DISABLED_CACHE
class MinimalTelemetry:
"""
Minimal telemetry collector for anonymous usage tracking.
Privacy guarantees:
- No personal data is collected
- No prompts, responses, or user content is tracked
- Only anonymous metrics about feature usage
- Respects DO_NOT_TRACK standard
- Can be disabled via environment variables
Semantic Kernel (csharp)
Built-in OpenTelemetry: KernelFunction carries its own ActivitySource("Microsoft.SemanticKernel") + Meter (invocation/streaming duration histograms); agent invocations go through ModelDiagnostics.StartAgentInvocationActivity; filters + structured logging (LoggerMessage). There is no built-in evaluation framework; evaluation relies on external tools.
dotnet/src/SemanticKernel.Abstractions/Functions/KernelFunction.cs:41, dotnet/src/Agents/Core/ChatCompletionAgent.cs:352
private protected const string MeasurementErrorTagName = "error.type";
/// <summary><see cref="ActivitySource"/> for function-related activities.</summary>
private static readonly ActivitySource s_activitySource = new("Microsoft.SemanticKernel");
/// <summary><see cref="Meter"/> for function-related metrics.</summary>
private protected static readonly Meter s_meter = new("Microsoft.SemanticKernel");
/// <summary>The <see cref="JsonSerializerOptions"/> to use for serialization and deserialization of various aspects of the function.</summary>
private readonly JsonSerializerOptions? _jsonSerializerOptions;
/// <summary>The underlying method, if this function was created from a method.</summary>
#pragma warning disable CA1051
smolagents (python)
Monitor accumulates tokens and step durations through an ActionStep callback; AgentLogger (Rich) provides leveled logging; memory.replay() replays a run; return_full_result returns a RunResult (token_usage/steps/timing/state); the telemetry extra connects to OpenTelemetry/Arize Phoenix.
monitoring.py:81, monitoring.py:100, agents.py:196, memory.py:248
return f"Timing(start_time={self.start_time}, end_time={self.end_time}, duration={self.duration})"
class Monitor:
def __init__(self, tracked_model, logger):
self.step_durations = []
self.tracked_model = tracked_model
self.logger = logger
self.total_input_token_count = 0
self.total_output_token_count = 0
def get_total_token_counts(self) -> TokenUsage:
return TokenUsage(
Strands Agents (python)
OpenTelemetry as a first-class citizen: Tracer starts spans for agent/cycle/model/tool (telemetry/tracer.py:77), EventLoopMetrics records tokens/latency/cycles, and StrandsTelemetry wires everything up in one step; callback_handler provides streaming callbacks (PrintingCallbackHandler by default); evaluation goes through OTEL export.
telemetry/tracer.py:77
return "<replaced>"
class Tracer:
"""Handles OpenTelemetry tracing.
This class provides a simple interface for creating and managing traces,
with support for sending to OTLP endpoints.
When the OTEL_EXPORTER_OTLP_ENDPOINT environment variable is set, traces
are sent to the OTLP endpoint.
Both attributes are controlled by including "gen_ai_latest_experimental", "gen_ai_tool_definitions",
Swarm (python)
debug_print only.
SwarmClaw (typescript)
OpenTelemetry OTLP traces (@opentelemetry/sdk-node, with endpoint/headers configured via env); in-house logger/execution-log/activity-log/run-ledger; usage/cost metering; eval/ runs baseline + environment-plan evaluations; an autonomy supervisor reflects on every autonomous run.
observability/otel-config.ts:1
export interface OTelConfig {
enabled: true
serviceName: string
tracesEndpoint: string
headers: Record<string, string>
}
function parseBooleanFlag(value: string | undefined): boolean {
if (typeof value !== 'string') return false
const normalized = value.trim().toLowerCase()
Swarms (python)
loguru logging (utils/loguru_logger.py); telemetry reports agent data to swarms.world by default (toggled by SWARMS_TELEMETRY_ON, telemetry/main.py:150); the evaluation-style topologies council_as_judge/debate_with_judge/majority_voting serve as LLM-as-judge.
telemetry/main.py:96, telemetry/bootup.py:8
return system_data
def _log_agent_data(data_dict: dict):
"""
Logs agent data and system information to the swarms.world telemetry endpoint via a POST request.
This function is a low-level, internal utility that sends the provided agent state along with current
system telemetry to the Swarms service for analytics and diagnostics. Data includes a timestamp,
comprehensive system information, and the state of the agent as passed in `data_dict`.
Args:
data_dict (dict): Dictionary representing the current agent's state/config/data.
Transformers Agents (python)
Step logs and verbose output; no built-in eval.
Upsonic (python)
The eval/ subpackage provides three kinds of evaluators: AccuracyEvaluator, performance, and reliability (.run()); observability connects through integrations/ to Langfuse / OpenTelemetry (the otel extra) / PromptLayer; core dependencies include sentry-sdk[opentelemetry]; the pipeline emits an event at every step.
src/upsonic/eval/accuracy.py:26
from upsonic.integrations.langfuse import Langfuse
class AccuracyEvaluator:
"""
The main orchestrator for running accuracy evaluations on Upsonic agents,
graphs, or teams using the LLM-as-a-judge pattern.
"""
def __init__(
self,
judge_agent: Agent,
agent_under_test: Union[Agent, Graph, Team],
vectara-agentic (python)
Built-in Arize Phoenix (OpenInference instruments LlamaIndex; _observability.py:16 setup_observer); eval_fcs() writes the Vectara FCS score back as a span evaluation (_observability.py:101). The AgentCallbackHandler/agent_progress_callback callbacks report TOOL_CALL/TOOL_OUTPUT in real time (agent.py:623). VHC (hallucination correction) via compute_vhc/analyze_hallucinations is its distinctive evaluation capability.
_observability.py:16, _observability.py:101, agent_core/utils/hallucination.py:113
SPAN_NAME: str = "VectaraQueryEngine._query"
def setup_observer(config: AgentConfig, verbose: bool) -> bool:
"""
Setup the observer.
"""
if config.observer != ObserverType.ARIZE_PHOENIX:
if verbose:
print("No Phoenix observer set.")
return False
try:
VoltAgent (typescript)
Core selling point: full-stack OpenTelemetry with three custom SpanProcessors: WebSocket (real-time push to the VoltOps Console), LocalStorage (local trace storage + querying), and LazyRemoteExport (OTLP → VoltOps or any backend); enabled by default with zero configuration. Evaluation: eval (create-scorer/LLM-judge) + the standalone @voltagent/scorers/@voltagent/evals packages + a langfuse exporter.
observability/index.ts:1, observability/node/volt-agent-observability.ts:31
/**
* VoltAgent Observability - Built on OpenTelemetry
*
* This module provides OpenTelemetry-based observability with:
* - WebSocket real-time events via custom SpanProcessor
* - Local storage via custom SpanProcessor
* - OTLP export support
* - Zero-configuration defaults
*/