Nous Research's Hermes Agent has crossed 140,000 GitHub stars in under three months and is, per NVIDIA citing OpenRouter, the most-used agent in the world as of last week. NVIDIA's blog post Wednesday positions Hermes as the local-hardware counterpart to the hosted agent stacks shipping out of AWS, Google and Anthropic, with optimization for RTX PCs, RTX PRO workstations and the DGX Spark personal AI box (128GB unified memory, 1 petaflop AI performance). Four design choices distinguish Hermes from the wrapper tier: self-evolving skills (the agent writes and refines its own skill set across runs), contained sub-agents (short-lived isolated workers with focused tool scope, which keeps context windows small enough to run on local models), Nous-curated reliability (every shipped skill, tool and plug-in is stress-tested before release), and "active orchestration" framing — Hermes positions itself as a runtime, not a thin shim over the model.
The model side of the story is Qwen 3.6, Alibaba's just-released open-weight family. NVIDIA claims the new 35B model outperforms the previous-generation 120B-parameter models while running on roughly 20GB of memory (versus 70GB+ for the 120B class), and that the new Qwen 3.6 27B dense model matches the accuracy of Qwen 3.5's 397B at one-sixteenth the size. Both claims are load-bearing for the "you can run this locally" narrative and want third-party harness verification — NVIDIA's marketing copy doesn't disclose which evals these comparisons rest on, and capability-per-parameter compression claims have a track record of softening when independent benchmarks land. Treat the underlying ratio (35B at 120B-class performance) as the testable hypothesis, not the verified result, until OpenLLM or LMSYS confirms.
The ecosystem read here is the local-stack counter-thesis to everything else shipping this week. AWS WorkSpaces gave agents hosted virtual desktops; Google's Gemini pointer keeps agents in the cloud and follows the human cursor; Microsoft's MDASH is enterprise-only and SaaS-delivered. Hermes is the opposite — model-agnostic, provider-agnostic, runs out-of-box with LM Studio and Ollama via llama.cpp, designed for an always-on local agent on a workstation under someone's desk. NVIDIA's strategic interest is obvious (selling more RTX PRO and DGX Spark units) but the underlying pattern is genuinely independent of vendor: enough capability has compressed into 30B-class open weights that the workflow "an agent runs all day on my hardware, refining its own skills, calling into my local tools" is now mechanically possible. The OpenRouter ranking, if it holds up, is the first real evidence that a non-vendor open-source agent has won the developer share-of-mind battle against Claude Code, Codex and the closed agents.
For builders: clone the Hermes GitHub repo, pair it with Qwen 3.6 27B or 35B via Ollama or LM Studio, and benchmark it on your actual workflow before trusting either claim. Two things to watch: (1) whether independent evals confirm the Qwen 3.6 27B-matches-397B compression — that is the load-bearing engineering claim of the entire stack; (2) whether Hermes's self-evolving skills actually accumulate useful capability across runs, or drift in the way that earlier self-improving agent attempts have. The provider/model-agnostic design is what makes Hermes interesting beyond the NVIDIA pairing — if Qwen 3.6 disappoints, you swap in Llama 4 or Mistral Large and the agent layer stays. The pattern is the news; the specific hardware bundle is the marketing layer.
