Infrastructure

AI Observability

LLM Monitoring, AI Tracing, LLMOps
Real-time monitoring and understanding of AI system behavior in production: tracking inputs, outputs, latency, cost, errors, and quality metrics. AI observability is like application monitoring (Datadog, New Relic) but specialized for AI: it tracks prompt-response pairs, detects quality regressions, monitors for hallucinations, and alerts on anomalous behavior.

Why It Matters

Deploying an AI system without observability is flying blind. You don't know whether the model is hallucinating more than usual, whether latency is creeping up, whether a certain class of queries is failing, or whether costs are spiking. AI observability turns "it seems to work" into "we know it works, and we know when it doesn't." That is the difference between a demo and a production system.

Deep Dive

Core observability signals for AI (a minimal instrumentation sketch follows the list):

- Request/response logs: what users asked and what the model responded.
- Latency metrics: TTFT (time to first token), tokens per second, total response time.
- Cost tracking: tokens consumed, API spend.
- Quality metrics: user feedback, automated quality scores.
- Error rates: API failures, rate limits, content filter triggers.
- Safety metrics: refusal rates, flagged content, prompt injection attempts.
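The sketch below shows one hypothetical way to capture several of these signals around a single model call. The `observed_call` wrapper, the `fake_model` stub, and the per-token prices are all illustrative assumptions; a real deployment would ship these records to a metrics backend rather than a local logger.

```python
import json
import logging
import time
from dataclasses import asdict, dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm_observability")

# Assumed per-1K-token prices; substitute your provider's actual rates.
PRICE_PER_1K_INPUT = 0.003
PRICE_PER_1K_OUTPUT = 0.015

@dataclass
class LLMCallRecord:
    prompt: str
    response: str
    input_tokens: int
    output_tokens: int
    latency_s: float
    cost_usd: float
    error: str | None = None

def observed_call(model_fn, prompt: str) -> LLMCallRecord:
    """Wrap any model call and emit core observability signals as one record."""
    start = time.perf_counter()
    try:
        response, in_tok, out_tok = model_fn(prompt)
        record = LLMCallRecord(
            prompt=prompt,
            response=response,
            input_tokens=in_tok,
            output_tokens=out_tok,
            latency_s=time.perf_counter() - start,
            cost_usd=(in_tok * PRICE_PER_1K_INPUT
                      + out_tok * PRICE_PER_1K_OUTPUT) / 1000,
        )
    except Exception as exc:
        # Errors are a signal too: record them rather than losing the request.
        record = LLMCallRecord(prompt, "", 0, 0,
                               time.perf_counter() - start, 0.0, str(exc))
    log.info(json.dumps(asdict(record)))
    return record

# Stand-in model for demonstration; swap in a real API client.
def fake_model(prompt: str):
    return f"Echo: {prompt}", len(prompt.split()), 5

observed_call(fake_model, "What is AI observability?")
```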

Tracing

For complex AI applications (RAG pipelines, multi-agent systems), tracing follows a request through every step: the user query, the retrieval results, the prompt construction, the model call, the post-processing, and the final response. Each step is logged with inputs, outputs, latency, and cost. When something goes wrong, traces let you identify exactly where in the pipeline the failure occurred. LangSmith, Langfuse, and Braintrust provide LLM-specific tracing.
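As a framework-agnostic sketch of what such a trace looks like (all names here are hypothetical), the example below records each pipeline step as a span carrying its inputs, outputs, and latency, grouped under one trace ID. Tools like LangSmith and Langfuse provide this structure, plus storage and a UI, out of the box.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    inputs: dict
    outputs: dict = field(default_factory=dict)
    latency_s: float = 0.0

@dataclass
class Trace:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list = field(default_factory=list)

    @contextmanager
    def step(self, name: str, **inputs):
        """Record one pipeline step as a span with inputs, outputs, and latency."""
        span = Span(name=name, inputs=inputs)
        start = time.perf_counter()
        try:
            yield span
        finally:
            span.latency_s = time.perf_counter() - start
            self.spans.append(span)

# Hypothetical RAG pipeline instrumented step by step.
trace = Trace()
query = "How do I rotate an API key?"

with trace.step("retrieval", query=query) as s:
    docs = ["Doc 12: rotating keys...", "Doc 7: auth basics..."]  # stand-in retriever
    s.outputs["docs"] = docs

with trace.step("prompt_construction", docs=docs) as s:
    prompt = f"Context: {docs}\nQuestion: {query}"
    s.outputs["prompt"] = prompt

with trace.step("model_call", prompt=prompt) as s:
    s.outputs["response"] = "Go to Settings > API keys and..."  # stand-in model

# When something goes wrong, per-span records show where in the pipeline it happened.
for span in trace.spans:
    print(f"{span.name}: {span.latency_s * 1000:.2f} ms")
```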

Quality Monitoring

The hardest part of AI observability is automatically detecting when output quality degrades. Approaches include (a drift-detection sketch follows the list):

- LLM-as-judge: use a model to score outputs.
- Embedding drift detection: if the distribution of outputs changes significantly, something may be wrong.
- User feedback signals: thumbs up/down, regeneration rates.
- Regression testing: periodically run a golden set of queries and compare outputs to baselines.

No single approach catches everything; production systems use multiple signals.
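As a concrete instance of the embedding-drift signal, here is a minimal sketch assuming you already embed model outputs with some sentence-embedding model. It flags drift when the centroid of recent output embeddings moves away from a baseline centroid; the 0.9 cosine threshold and the synthetic data are assumptions for illustration, not tuned values.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def drift_alert(baseline_embs: np.ndarray,
                recent_embs: np.ndarray,
                threshold: float = 0.9) -> bool:
    """Flag drift when the recent outputs' centroid moves away from the baseline centroid."""
    baseline_centroid = baseline_embs.mean(axis=0)
    recent_centroid = recent_embs.mean(axis=0)
    return cosine(baseline_centroid, recent_centroid) < threshold

# Synthetic demo: recent output embeddings drawn from a shifted distribution.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=(500, 384))   # e.g. last month's outputs
recent = rng.normal(0.5, 1.0, size=(100, 384))     # today's outputs, shifted
print("drift detected:", drift_alert(baseline, recent))  # True
```

In practice a scheduled job would run a check like this over a rolling window and feed the result into alerting alongside the other signals above.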
