GitHub cut token spend in its agentic CI workflows by up to 62% and shipped the methodology in the gh-aw CLI, and the techniques are reusable by any team running LLM agents in CI/CD. The headline finding is a cost most builders pay without measuring: an MCP server exposing 40 tools adds 10-15KB of tool schema to every turn, whether or not the agent calls those tools, and pruning unused entries saved 8-12KB per call in GitHub's smoke-test workflows. Every tool you wire into an agent costs context tokens on every single turn. The reported reductions, measured in Effective Tokens (GitHub's weighted cost metric): Auto-Triage Issues 62%, Smoke Claude 59%, Security Guard 43%.
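
One way to see the schema cost described above is to serialize each tool definition and compare the total before and after pruning. The following is a minimal sketch, not GitHub's tooling: the helper names (`schema_bytes`, `audit_tools`, `make_tool`) and the 40 generated tools are hypothetical, and serialized JSON bytes are only a rough proxy for the tokens a model would actually be charged.

```python
import json


def schema_bytes(tool: dict) -> int:
    """Size in bytes of one tool definition serialized as compact UTF-8 JSON."""
    return len(json.dumps(tool, separators=(",", ":")).encode("utf-8"))


def audit_tools(tools: list[dict], called: set[str]) -> dict:
    """Split tools by whether the workflow ever called them and total the schema size."""
    kept = [t for t in tools if t["name"] in called]
    pruned = [t for t in tools if t["name"] not in called]
    return {
        "kept": kept,
        "pruned_names": [t["name"] for t in pruned],
        "bytes_before": sum(schema_bytes(t) for t in tools),
        "bytes_after": sum(schema_bytes(t) for t in kept),
    }


def make_tool(name: str) -> dict:
    """Build a hypothetical tool definition (name, description, input schema)."""
    return {
        "name": name,
        "description": f"Hypothetical tool {name} used only for this illustration.",
        "inputSchema": {
            "type": "object",
            "properties": {"id": {"type": "string", "description": "Resource id"}},
            "required": ["id"],
        },
    }


# Assumption: the set of called tools would come from your own workflow logs.
tools = [make_tool(f"tool_{i}") for i in range(40)]
called = {"tool_0", "tool_7"}

report = audit_tools(tools, called)
saved = report["bytes_before"] - report["bytes_after"]
print(f"{len(report['pruned_names'])} unused tools, {saved} bytes of schema saved per turn")
```

Real tool definitions are usually far larger than these stubs, so the point of the sketch is the before-and-after comparison, not the absolute numbers.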

The second technique is CLI substitution: GitHub replaced MCP calls for fetching PR diffs and file contents with gh CLI commands, either pre-downloaded into workspace files or proxied through an HTTP gateway that keeps authentication away from the agent. MCP is a clean protocol, but for high-frequency deterministic fetches it pays a per-call schema-and-envelope tax that a CLI call avoids.

The measurement framework is the most portable idea: an Effective Tokens (ET) metric that weights output tokens 4× and cache reads 0.1×, then applies a model multiplier (Haiku 0.25×, Sonnet 1.0×, Opus 5.0×), so a single number compares cost across models and catches regressions. Token data is captured in a token-usage.jsonl artifact across CLI tools, and two agents run the loop: a Daily Token Usage Auditor that aggregates by workflow and flags expensive jobs, and a Daily Token Optimiser that reads source and logs, opens a GitHub issue, and proposes specific fixes.
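
The Effective Tokens weighting and the auditor's per-workflow aggregation can be sketched in a few lines. This is an illustration under stated assumptions, not the gh-aw implementation: the 4×, 0.1×, and model multipliers come from the text above, but the 1.0× weight on uncached input tokens and the JSONL field names (`workflow`, `model`, `input_tokens`, `output_tokens`, `cache_read_tokens`) are assumptions made for the example.

```python
import json
from collections import defaultdict

OUTPUT_WEIGHT = 4.0        # from the article
CACHE_READ_WEIGHT = 0.1    # from the article
INPUT_WEIGHT = 1.0         # assumption: uncached input counted at face value
MODEL_MULTIPLIER = {"haiku": 0.25, "sonnet": 1.0, "opus": 5.0}  # from the article


def effective_tokens(model: str, input_tokens: int, output_tokens: int,
                     cache_read_tokens: int = 0) -> float:
    """Weight each token class, then scale by the model multiplier."""
    weighted = (INPUT_WEIGHT * input_tokens
                + OUTPUT_WEIGHT * output_tokens
                + CACHE_READ_WEIGHT * cache_read_tokens)
    return MODEL_MULTIPLIER[model] * weighted


def aggregate_by_workflow(jsonl_lines) -> dict:
    """Sum Effective Tokens per workflow from JSONL records (hypothetical schema)."""
    totals = defaultdict(float)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the artifact
        rec = json.loads(line)
        totals[rec["workflow"]] += effective_tokens(
            rec["model"],
            rec["input_tokens"],
            rec["output_tokens"],
            rec.get("cache_read_tokens", 0),
        )
    return dict(totals)


# Made-up records standing in for a token-usage.jsonl artifact.
sample = [
    '{"workflow": "auto-triage", "model": "sonnet", "input_tokens": 1000, '
    '"output_tokens": 200, "cache_read_tokens": 5000}',
    '{"workflow": "auto-triage", "model": "haiku", "input_tokens": 400, "output_tokens": 100}',
    '{"workflow": "security-guard", "model": "opus", "input_tokens": 1000, "output_tokens": 100}',
]

totals = aggregate_by_workflow(sample)
# Most expensive workflows first, as an auditor-style report might list them.
for workflow, et in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{workflow}: {et:.0f} ET")
```

Because the model multiplier is applied last, swapping a workflow from one model to another changes its ET total even when raw token counts stay flat, which is what lets a single number surface the cost effect of a model swap.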

The ecosystem read: this is the cost side of the productivity-attribution problem that Uber's COO flagged. You cannot prove the value link if you cannot measure the spend, and GitHub just published a rigorous way to measure it. The MCP-schema-bloat finding deserves the most attention because the agent ecosystem has been adding MCP servers enthusiastically without accounting for the per-turn context cost of tool definitions: a 40-tool server is a standing tax on every inference, and most teams have never measured it. The Effective-Tokens weighting is GitHub's own (the 4×/0.1×/5× numbers are choices, not a standard), but a single normalized cost metric that survives model swaps is exactly the unit-economics instrumentation enterprises have lacked. The auditor/optimiser agent loop is also a clean self-referential pattern: agents optimizing agent cost, with a human-reviewable GitHub issue as the output.

If you run agents in CI, the Monday-morning checklist is short. Audit your MCP tool list first and prune anything the workflow does not call, because you are paying for every schema on every turn. Then consider gh-CLI-style substitution for your highest-frequency deterministic fetches, and instrument a token-usage.jsonl plus an Effective-Tokens-style metric so model swaps and prompt changes show up as visible cost deltas. The gh-aw CLI is the reference implementation; the methodology is the transferable part.
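
For the substitution step, the pre-download variant might look like the sketch below: a setup step runs `gh pr diff` once, before the agent starts, and writes the result to a workspace file the agent reads as plain text. The function name, output path, and injectable `run` parameter are hypothetical choices for illustration, and this is not necessarily how gh-aw wires it up. It assumes the GitHub CLI is installed and authenticated in that setup step.

```python
import subprocess
from pathlib import Path


def prefetch_pr_diff(pr_number: int, repo: str, out_dir: Path,
                     run=subprocess.run) -> Path:
    """Fetch a PR diff with the gh CLI and save it for the agent to read later.

    `run` is injectable so the function can be exercised without gh installed.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    result = run(
        ["gh", "pr", "diff", str(pr_number), "--repo", repo],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode output as str
        check=True,           # raise if gh exits non-zero
    )
    target = out_dir / f"pr-{pr_number}.diff"
    target.write_text(result.stdout, encoding="utf-8")
    return target


# Example call from a CI setup step (hypothetical PR number, repo, and path):
# diff_path = prefetch_pr_diff(123, "octo-org/octo-repo", Path("workspace/context"))
```

The agent's prompt can then point at the saved file, so the fetch costs no tool schema and no tool-call round trip, and the credentials stay in the setup step.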