Alibaba's Qwen team launched Qwen3.7-Plus on Bailian (Alibaba Cloud's Model Studio for international developers) today, the multimodal sibling of the text-only Qwen3.7-Max that landed in May. Capabilities listed: image and video understanding (reads, does not generate), deep reasoning, tool invocation, self-programming, verification and testing, and "autonomous iteration" (Alibaba's framing for sustained agent loops). 1M token context window. The concrete agentic claim worth flagging: "35-hour autonomous run without measurable degradation, chaining over 1,000 tool calls in a single session." API-only at release; Plus is committed to open weights (Max stays proprietary), no specific timeline yet, no HuggingFace presence at publication.
Parameter count not disclosed. Architecture (dense vs MoE) not disclosed. The "deep reasoning" mechanism isn't detailed: no mention of a thinking-mode toggle like the previous Qwen3-Max-Thinking line, no cost multiplier disclosed. Tool invocation = function calling at the basic level; MCP support not confirmed. Vision Arena ranking is #16 overall (Alibaba #5 lab globally), solid but not frontier-SOTA. The sibling Qwen3.7-Max scored 56.6 on Artificial Analysis Intelligence Index v4.0 (5th overall, #1 Chinese model), 50.8% on Terminal-Bench Hard, 92.4 on GPQA Diamond (edging Claude Opus-4.6's 91.3), with the lowest hallucination rate among frontier models at 22.9%. Those are Max numbers, not Plus. Bailian adds an "Agentic RL" layer that uses real-world execution feedback to refine accuracy over time, a platform-level continual-learning feature that operates above the base model. The 35-hour-1000-tool-call demo is vendor-published with no harness disclosure and no third-party reproduction yet.
Two threads worth tracking. First, the open-weights tier split. Alibaba is making Plus open and keeping Max proprietary, mirroring the pattern DeepSeek established and that MiniMax M3 just doubled down on (open weights promised within 10 days). The Chinese-lab open-weights versus Western-lab proprietary-frontier dynamic continues to sharpen, with each release pushing the "fully open frontier" line a bit further. Second, the agentic framing. "Autonomous iteration" is Alibaba's rebrand of what is functionally a ReAct-style multi-turn tool-use loop, but the duration claim (35 hours, 1000+ tool calls) is the operational frontier number. If reproducible, it changes what kinds of long-running agents are economically viable. Independent verification is the gap: no harness disclosure, no third-party reproduction in published material. The Bailian Agentic RL platform feature (execution-feedback fine-tuning during deployment) is the substantive platform-level claim that goes beyond model capabilities, continual learning from real production traces, which most agent platforms talk about and almost none actually ship.
Monday morning, if you're shipping long-running agents and have access to Bailian: Qwen3.7-Plus is worth integrating today specifically to test the long-tool-run durability claim. Run your own multi-hour task with concrete tool counts and measure where degradation actually sets in versus the vendor 35-hour number. If you're not on Bailian and don't want a cloud-API agent dependency, the open-weights drop is the event to wait for; until then, this is a vendor-platform story. If you're evaluating Chinese-lab open weights for your stack, watch the Plus open release alongside MiniMax M3's promised 10-day weights drop, both will likely land in the same window and the comparison will matter for which one belongs in your inference fleet. And if you're building a continual-learning platform yourself, the Bailian Agentic RL claim is the design pattern to study, vendor description is thin but the framing (real-world execution feedback as RL signal) is the right shape.
