Poolside AI released its Laguna model family on April 28, with two flagship models: Laguna M.1 (225B total / 23B activated MoE, closed-weight) and Laguna XS.2 (33B total / 3B activated, open-weight). The headline numbers are SWE-bench Verified scores of 72.5% for M.1 and 68.2% for XS.2, putting both in the same league as closed frontier coding models. The release also includes "pool", Poolside's internal terminal-based coding agent, which acts as both an Agent Client Protocol (ACP) client and server, available as a research preview. The killer detail for builders: XS.2 is compact enough to run on a Mac with 36 GB of RAM via Ollama.
The architecture choices in XS.2 are worth a close look. It is a Mixture-of-Experts model with 256 routed experts plus 1 shared expert; only 3B parameters are activated per token despite 33B total, and expert selection uses sigmoid gating. The attention layout is 30 Sliding Window Attention (SWA) layers with a 512-token window interleaved with 10 global-attention layers, a 3:1 ratio across 40 total layers; that drops KV cache memory dramatically without losing long-range dependencies, and the KV cache is FP8-quantized for further memory reduction. Per-layer rotary scales accommodate the mixed SWA/global layout. Context window is 131,072 tokens, with native interleaved thinking between tool calls and per-request enable/disable of reasoning. Laguna M.1, the parent model, was trained from scratch on 30 trillion tokens using 6,144 NVIDIA Hopper GPUs, finishing pre-training at the end of last year. Poolside is also releasing Laguna XS.2-base for practitioners who want to fine-tune.
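The "drops KV cache memory dramatically" claim is easy to sanity-check with back-of-envelope arithmetic. The layer counts, window size, context length, and FP8 cache are from the release; the KV head count and head dimension below are illustrative assumptions, since those dimensions are not stated here.

```python
# Back-of-envelope KV cache sizing for an XS.2-style attention layout.
# Layer counts, 512-token window, and 131,072-token context come from the
# release notes; num_kv_heads and head_dim are assumed values for illustration.

def kv_cache_bytes(num_layers, seq_len, num_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):
    """Bytes for the K and V caches across `num_layers` attention layers."""
    return 2 * num_layers * seq_len * num_kv_heads * head_dim * bytes_per_elem

CONTEXT = 131_072   # full context length in tokens
WINDOW = 512        # an SWA layer only caches the most recent 512 tokens

# Baseline: all 40 layers global, FP16 cache (2 bytes per element).
baseline = kv_cache_bytes(40, CONTEXT, bytes_per_elem=2)

# XS.2-style: 30 SWA layers cache only WINDOW tokens, 10 global layers cache
# the full context, everything stored in FP8 (1 byte per element).
xs2 = (kv_cache_bytes(30, WINDOW, bytes_per_elem=1)
       + kv_cache_bytes(10, CONTEXT, bytes_per_elem=1))

print(f"baseline (40 global, FP16): {baseline / 2**30:.1f} GiB")
print(f"XS.2-style (3:1 SWA, FP8):  {xs2 / 2**30:.1f} GiB")
print(f"reduction: {baseline / xs2:.1f}x")
```

With these assumed dimensions the all-global FP16 baseline is 20.0 GiB at full context versus roughly 2.5 GiB for the mixed layout, a ~7.9x reduction. The exact factor shifts with head count and head dimension, but the shape of the saving holds: the 10 global layers dominate, and the 30 windowed layers become nearly free.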
Two patterns matter. First, the gap between open-weight and closed-weight coding models just got meaningfully smaller. 68.2% on SWE-bench Verified for an open-weight 33B/3B-active model is competitive with closed-weight models at equivalent scale, and local-Mac runnability removes a main drawback of closed APIs for coding tasks: network latency. Builders who want their agent to run inside their development environment without a network round-trip now have a benchmark-competitive option. Second, the architecture of XS.2 looks like the consolidated 2026 efficient-inference playbook: MoE for headroom without cost, mixed SWA plus global attention for long context, an FP8 KV cache for memory, native interleaved reasoning. Anyone shipping their own efficient inference stack should treat this configuration as the current reference target.
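To make "MoE for headroom without cost" concrete, here is a minimal numpy sketch of sigmoid-gated top-k routing, the mechanism that lets a 33B-parameter model activate only a few billion parameters per token. The 256-expert count matches the release; the top-k value, hidden size, and weights are illustrative assumptions, not Poolside's published configuration.

```python
import numpy as np

# Minimal sigmoid-gated top-k MoE router, in the spirit of XS.2's design.
# NUM_EXPERTS=256 matches the release; TOP_K and D_MODEL are assumed toy values.

rng = np.random.default_rng(0)

NUM_EXPERTS = 256
TOP_K = 8      # assumed number of routed experts per token
D_MODEL = 64   # toy hidden size

def route(x, w_router):
    """Sigmoid gating: score every expert, keep the top-k, renormalize."""
    logits = x @ w_router                  # (tokens, NUM_EXPERTS)
    gates = 1.0 / (1.0 + np.exp(-logits))  # sigmoid scores, not a softmax
    topk_idx = np.argsort(gates, axis=-1)[:, -TOP_K:]   # (tokens, TOP_K)
    topk_gate = np.take_along_axis(gates, topk_idx, axis=-1)
    topk_gate /= topk_gate.sum(axis=-1, keepdims=True)  # mixture weights
    return topk_idx, topk_gate

x = rng.standard_normal((4, D_MODEL))                     # 4 tokens
w_router = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.02
idx, gate = route(x, w_router)

# Only TOP_K of 256 experts fire per token: that is the "activated" fraction.
print(idx.shape, gate.shape)   # (4, 8) (4, 8)
```

Because only the selected experts' FFNs run, total parameter count (capacity) scales independently of per-token compute, which is exactly the headroom-without-cost trade the paragraph describes.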
For builders, three concrete things. First, XS.2 plus Ollama on a 36 GB Mac is the benchmark to actually run before committing to a closed coding API for your use case; the latency, privacy, and cost picture is different enough that the comparison is no longer trivially in favor of frontier closed models. Second, the "pool" agent and Agent Client Protocol release is worth studying if you build your own agent harness. ACP as a name is generic enough that other vendors may converge around it; whether or not Poolside's specific protocol becomes a standard, the pattern of separating the agent driver from the model is the right architecture. Third, the SWA/global 3:1 ratio with 512-token windows in XS.2 is a tunable choice other open-weight teams will likely copy. Watch for similar configurations in Mistral and Qwen successors over the next several months; the design space for efficient long-context attention is converging fast.
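The driver/model separation that a protocol like ACP formalizes can be sketched as a narrow interface boundary. None of the names below come from Poolside's "pool" agent or the actual ACP spec; this is a hypothetical illustration of the pattern: the driver owns the loop and the tools, and the model backend is swappable behind one method.

```python
from typing import Protocol

# Hypothetical sketch of the agent-driver / model split. The class and method
# names are invented for illustration, not taken from "pool" or the ACP spec.

class ModelBackend(Protocol):
    def complete(self, prompt: str) -> str: ...

class LocalModel:
    """Stand-in for a locally served model (e.g. XS.2 behind Ollama)."""
    def complete(self, prompt: str) -> str:
        # Canned response for the demo: "call the echo tool on the last word".
        return f"TOOL:echo {prompt.split()[-1]}"

class AgentDriver:
    """Owns the agent loop and tool dispatch; knows nothing model-specific."""
    def __init__(self, backend: ModelBackend):
        self.backend = backend
        self.tools = {"echo": lambda arg: arg}

    def step(self, user_msg: str) -> str:
        reply = self.backend.complete(user_msg)
        if reply.startswith("TOOL:"):          # trivial tool-call convention
            name, _, arg = reply[5:].partition(" ")
            return self.tools[name](arg)
        return reply

driver = AgentDriver(LocalModel())
print(driver.step("please repeat hello"))   # -> hello
```

The payoff of the split is that `LocalModel` can be replaced by a closed-API backend, or vice versa, without touching the driver, tools, or loop, which is precisely why the architecture survives regardless of which protocol wins.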
