Ruby committer Yusuke Endoh ran Claude Code through 600 test runs implementing a simplified Git across 13 programming languages, revealing that dynamic languages consistently outperformed static alternatives on generation cost and speed. Ruby averaged $0.36 per run at 73 seconds, Python hit $0.38 at 75 seconds, and JavaScript came in at $0.39 at 81 seconds. Meanwhile, Go cost $0.50 with wild variance (a 37-second standard deviation), Rust averaged $0.54 with the most test failures, and C bloated to $0.74 by generating 517 lines versus Ruby's 219.
The type system penalty cuts deep into the AI coding workflows many teams are building around. Adding mypy strict checking slowed Python by 1.6-1.7x, while Ruby's Steep type checker imposed a brutal 2.0-3.2x slowdown. TypeScript cost 59% more than JavaScript despite similar line counts, suggesting the model burns extra thinking tokens wrestling with type constraints rather than just generating annotations. This isn't an argument that typing is bad; it's that LLMs struggle with the cognitive overhead of satisfying type systems while generating code.
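For a sense of where that overhead comes from, here is a hypothetical minimal example (not from the benchmark, whose tasks implemented Git internals): the untyped function runs fine as-is, but to pass `mypy --strict` every parameter, return value, and container must carry an explicit annotation, which is exactly the extra material the model has to generate and reason about.

```python
# Untyped version: works at runtime, rejected under `mypy --strict`:
#   def count_words(text):
#       ...

def count_words(text: str) -> dict[str, int]:
    """Count word frequencies; under --strict, parameter, return,
    and local container types must all be spelled out."""
    counts: dict[str, int] = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

print(count_words("a b a"))  # {'a': 2, 'b': 1}
```

Run with `mypy --strict` and the annotated version checks cleanly; strip the annotations and mypy reports missing type errors, while the runtime behavior is identical either way.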
Endoh is transparent about his Ruby bias and the experiment's limits: 200-line prototypes don't reflect enterprise codebases where static typing pays dividends. Anthropic sponsored the research through its Open Source Program, which provided free Claude access but doesn't invalidate the methodology. The benchmark only measured generation cost and speed, ignoring code quality, maintainability, and bug rates—metrics where static typing might claw back advantages.
For teams evaluating AI coding assistants, this suggests starting with dynamic languages for rapid prototyping, then adding types selectively rather than defaulting to TypeScript or Rust. The 40-60% cost difference adds up fast when you're generating thousands of code snippets monthly.
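As a back-of-envelope check on that claim, using the benchmark's per-run averages for Ruby ($0.36) and Rust ($0.54); the 5,000 runs/month volume is an illustrative assumption, not a figure from the study:

```python
# Per-run averages from the benchmark (USD)
RUBY_COST_PER_RUN = 0.36
RUST_COST_PER_RUN = 0.54

# Assumed monthly generation volume (illustrative, not from the study)
RUNS_PER_MONTH = 5_000

ruby_monthly = RUBY_COST_PER_RUN * RUNS_PER_MONTH  # $1,800
rust_monthly = RUST_COST_PER_RUN * RUNS_PER_MONTH  # $2,700
difference = rust_monthly - ruby_monthly

print(f"Monthly difference: ${difference:,.0f}")  # Monthly difference: $900
```

At that volume, the 50% per-run premium between Ruby and Rust compounds to roughly $900 a month before any quality or maintenance effects are counted.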
