Grafana Labs used GrafanaCON 2026 in Barcelona to ship a batch of updates that fall into a tidy thesis: observability for AI agents and LLM apps is a real product category, and Grafana wants to be where you instrument it. Several pieces landed together: AI Observability preview in Grafana Cloud, a new CLI that hooks Claude Code and GitHub Copilot into Grafana, the Grafana Assistant now available on the enterprise tier, a rebuilt Loki, and o11y-bench, an open-source agent benchmarking tool.
AI Observability (public preview on Grafana Cloud) instruments agent inputs, outputs, and execution flows, and watches for low-quality responses, policy violations, data exposure, and leaked credentials. The Grafana Cloud CLI (GCX) is the more interesting piece for builders, an agentic interface that lets you invoke the Grafana Assistant directly from Claude Code or GitHub Copilot, pulling observability data into your coding loop instead of forcing a context switch. Grafana Loki has been rebuilt on an Apache Kafka-based architecture, and the numbers Grafana disclosed are concrete: 10× performance improvement on aggregated queries while scanning 20× less data. Grafana Assistant is now on the enterprise tier with a full-screen workspace and an API for third-party tool integration. o11y-bench measures how well AI agents perform real-world tasks against a live Grafana stack.
The Loki rebuild matters quietly. Aggregated log query performance is what makes observability usable at scale, and a 10× speedup with 20× less data scanned is the kind of efficiency claim that directly lowers observability bills for anyone running Grafana Cloud or self-hosted Loki. The Claude Code and GitHub Copilot integration is the more strategic bet. It positions Grafana as the observability plane you query from inside your AI-assisted coding tools rather than a separate UI. Between that, Grafana Assistant reaching enterprise, and o11y-bench as an open-source benchmarking tool, the picture is Grafana trying to own the "where do you observe your AI agents" question before Arize, WhyLabs, Honeycomb's AI features, or hyperscaler-native options lock the category in.
Two things if you operate AI-connected software. One, the agentic-CLI pattern is spreading — Claude Code has its Agent tool, Gemini CLI shipped subagents last week, and now Grafana Cloud CLI hooks into both of the dev CLIs directly. The trend is clear: your observability, your debugging, your ops queries are all going to increasingly happen inside the agent loop rather than in separate tabs, and instrumentation providers are racing to be inside that loop. Two, if you are running Loki at scale, the 10× aggregated-query claim is worth measuring against your actual workload in the next quarter. If the speedup holds on your traffic shape, the savings are real.
