GitHub Copilot is moving to per-token billing on June 1, 2026, ending the flat-rate "Premium Requests" model that bundled coding-agent calls into a fixed monthly allowance. Under the new structure, Copilot Pro at $10/month converts into 1,000 AI Credits, where one credit equals one US cent at current rates. Code completions and Next Edit suggestions remain free; everything else — Chat, agent runs, model selection — gets metered.
The shift matters because the old model was mispriced at both ends: a multi-hour autonomous-agent task counted as one Premium Request, and a single trivial Q&A counted the same. The new model pegs cost to actual token volume (input, output, cache hits, and feature type), with rates that vary by model. GitHub has not published the per-model rate card, but the implication is straightforward: Claude-Sonnet-class agent runs will burn far more credits than Haiku-class completions, and any team that built workflows around the "Premium Request" abstraction now needs to model token spend instead of seat spend.
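To make the cost shape concrete, here is a minimal sketch of per-token credit metering. The only published fact is the conversion ($10 buys 1,000 credits, so one credit is one US cent); every rate in the table below is a hypothetical placeholder, since GitHub has not released the per-model rate card.

```python
# Sketch of per-token credit metering. All rates are HYPOTHETICAL:
# GitHub has not published the rate card. Only the conversion
# (1,000 credits = $10, i.e. 1 credit = $0.01) is known.
from dataclasses import dataclass

@dataclass
class ModelRate:
    # Hypothetical credits per 1K tokens; cached input assumed discounted.
    input_per_1k: float
    output_per_1k: float
    cached_input_per_1k: float

HYPOTHETICAL_RATES = {
    "sonnet-class-agent": ModelRate(0.30, 1.50, 0.03),
    "haiku-class-chat":   ModelRate(0.03, 0.15, 0.003),
}

def estimate_credits(model, input_toks, output_toks, cached_toks=0):
    """Credits = (tokens / 1000) * rate, summed over each token class."""
    r = HYPOTHETICAL_RATES[model]
    return (
        (input_toks - cached_toks) / 1000 * r.input_per_1k
        + cached_toks / 1000 * r.cached_input_per_1k
        + output_toks / 1000 * r.output_per_1k
    )

# A 100K-token agent rollout vs. a short chat turn, under these made-up rates:
agent_cost = estimate_credits("sonnet-class-agent", 80_000, 20_000, cached_toks=40_000)
chat_cost = estimate_credits("haiku-class-chat", 1_500, 500)
```

Even with invented numbers, the structural point holds: the agent rollout lands in the tens of credits while the chat turn is a fraction of one, which is exactly the spread the flat Premium Request abstraction hid.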
This is the wrapper economy catching up to its own infrastructure. Flat-rate dev tools were viable when API costs were heavily subsidized by VC and when the agent stack was thin. Reasoning models and long-running agent traces broke that math — a coding agent doing 100K-token rollouts on a complex refactor is genuinely a different cost shape than a 200-token completion. GitHub passing the costs through is honest pricing; it's also a competitive opening for tools that offer flat-rate or self-hosted alternatives, because every dev shop with a Copilot line item just got a budget surprise.
If you ship code with Copilot in your stack, instrument before June 1: log Chat, agent, and tool-use volumes to estimate token spend at the new rates. If you cannot get a clean estimate from GitHub, set a hard credit cap per developer for the first month and adjust. Self-hosted Continue with open-weight models is a credible alternative for shops with the operational capacity. For everyone else, the answer is the same as it always is when a per-call meter shows up: measure first, optimize the loops that move the bill.
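The instrument-then-cap step above can be sketched as a roll-up over usage logs. The event shape and the per-feature rates here are assumptions; adapt them to whatever your actual Copilot or proxy logs expose.

```python
# Sketch: roll up logged usage into a per-developer monthly credit
# estimate and flag anyone over a trial cap. The event tuple shape and
# the per-feature rates are ASSUMPTIONS, not GitHub's published schema.
from collections import defaultdict

# Hypothetical credits per 1K tokens, by feature type.
CREDITS_PER_1K = {"chat": 0.10, "agent": 0.50, "tool_use": 0.25}

def monthly_estimate(events, credit_cap=1000):
    """events: iterable of (developer, feature, tokens) from your logs."""
    spend = defaultdict(float)
    for dev, feature, tokens in events:
        spend[dev] += tokens / 1000 * CREDITS_PER_1K[feature]
    # Developers past the cap get flagged for the first-month adjustment.
    over_cap = {dev: round(c, 2) for dev, c in spend.items() if c > credit_cap}
    return dict(spend), over_cap

events = [
    ("alice", "agent", 1_200_000),  # heavy agent user
    ("alice", "chat", 50_000),
    ("bob", "chat", 200_000),
]
spend, over = monthly_estimate(events)
```

Under these placeholder rates, the heavy agent user accounts for roughly 30x the chat-only user's spend, which is the kind of skew a per-developer cap is meant to catch before the invoice does.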
