Qwen Team has released Qwen-Scope, an open-source sparse autoencoder suite that decomposes activations from seven Qwen3 and Qwen3.5 model variants into interpretable features. The fourteen SAE groups cover dense backbones from 1.7B to 27B (Qwen3-1.7B, Qwen3-8B, Qwen3.5-2B, Qwen3.5-9B, Qwen3.5-27B) plus the Qwen3-30B-A3B and Qwen3.5-35B-A3B MoE models. Weights ship on HuggingFace. This is interpretability tooling shipped as a product, not as a paper appendix.

The training setup uses top-k sparsity (k = 50 or 100), with dictionaries at 16× hidden size for the dense backbones and 32K width for the standard MoE configurations, scaling to 128K width (64× expansion) for the wider MoE variants. Qwen3.5-27B SAEs were trained on the instruct variant; the rest target base checkpoints. Documented use cases span four buckets: inference-time feature steering without weight updates, evaluation analysis (detecting benchmark redundancy through feature overlap), data-centric workflows like toxicity classification and safety data synthesis, and post-training signal generation for SFT and RL. The release brings SAE infrastructure into a developer-tool framing: what Goodfire's Ember and Anthropic's prior SAE work proved as research, Qwen is shipping as default tooling for the Qwen ecosystem.
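
For readers less familiar with the architecture, here is a minimal sketch of what a top-k SAE with 16× dictionary expansion amounts to, assuming the standard encoder/decoder formulation; the class name, dimensions, and training step are illustrative, not code from the Qwen-Scope release.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKSAE(nn.Module):
    def __init__(self, d_model: int, expansion: int = 16, k: int = 50):
        super().__init__()
        self.k = k
        d_dict = d_model * expansion          # dictionary width, e.g. 16x hidden size
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, x: torch.Tensor):
        pre = F.relu(self.enc(x))
        # Keep only the k largest feature activations per token; zero the rest.
        topk = pre.topk(self.k, dim=-1)
        acts = torch.zeros_like(pre).scatter_(-1, topk.indices, topk.values)
        return self.dec(acts), acts

sae = TopKSAE(d_model=2048, expansion=16, k=50)
x = torch.randn(8, 2048)                      # a batch of residual-stream activations
recon, acts = sae(x)
loss = F.mse_loss(recon, x)                   # reconstruction loss; sparsity is structural, no L1 term needed
```

The appeal of top-k over an L1 penalty is that the number of active features per token is fixed by construction, which makes the resulting dictionaries easier to compare across models and layers.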

For the open-weight ecosystem this matters more than another model release. Qwen is the dominant open-weight family for downstream fine-tuning; bundling production-grade SAEs with the family makes feature-level intervention a default capability rather than a research project. Steering features at inference is the cleanest path to customizing model behavior without retraining, and tying SAE features to refusal boundaries gives a transparent surface for safety tuning that current RLHF stacks make opaque. The leverage shifts: if you can find the feature that controls a behavior, you stop fighting it through prompts.
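
What "steering features at inference" amounts to mechanically is adding a scaled SAE decoder direction to one layer's hidden states. A hedged sketch below, assuming a HuggingFace-style decoder stack and an SAE whose decoder columns are feature directions; the layer index, feature index, and scale are placeholders, not values from the release.

```python
import torch

def make_steering_hook(direction: torch.Tensor, scale: float):
    """Forward hook that nudges a layer's hidden states along one SAE feature direction."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + scale * direction.to(hidden.device, hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return hook

# Hypothetical usage with a loaded Qwen model and SAE:
#   direction = sae.dec.weight[:, feature_idx]   # decoder column for the chosen feature
#   handle = model.model.layers[20].register_forward_hook(make_steering_hook(direction, 4.0))
#   ... generate as usual, then handle.remove() to restore default behavior.
```

No weights change and the intervention is reversible per request, which is why this path is attractive for behavior customization and safety tuning compared to another fine-tuning run.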

If you work on alignment, eval design, or domain-specific adaptation of a Qwen model, pull the SAEs off HuggingFace and start mapping. Look at feature activations on your eval set to find redundancy and contamination; a minimal overlap check is sketched below. For safety teams, the inference-steering path is now usable with a real toolchain. For research, the MoE SAEs at 128K width are the most interesting artifact: there isn't another open release at this expansion ratio on a frontier-scale MoE.
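
The redundancy check reduces to comparing which features fire on each eval set. A minimal sketch, assuming you already have per-token SAE activations for two benchmarks; the activation threshold and the Jaccard metric are illustrative choices, not a method documented in the release.

```python
import torch

def active_features(acts: torch.Tensor, min_activation: float = 1e-3) -> set[int]:
    # acts: [n_tokens, n_features] SAE activations for one eval set.
    fired = (acts.max(dim=0).values > min_activation).nonzero().flatten()
    return set(fired.tolist())

def feature_overlap(acts_a: torch.Tensor, acts_b: torch.Tensor) -> float:
    a, b = active_features(acts_a), active_features(acts_b)
    return len(a & b) / max(len(a | b), 1)    # Jaccard similarity over the active-feature sets

# Two benchmarks whose prompts light up nearly the same feature set are strong
# redundancy candidates; near-duplicate overlap against a training corpus hints at contamination.
```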