Poolside releases Laguna XS.2: 33B parameter MoE coding model with 131K context window
Poolside has released Laguna XS.2, a Mixture-of-Experts model with 33B total parameters and 3B activated parameters per token, designed for agentic coding. The model features a 131,072-token context window, scores 68.2% on SWE-bench Verified, and is available under the Apache 2.0 license with free API access for a limited time.
Poolside positions the model for agentic coding and long-horizon work on local machines.
Model specifications
Laguna XS.2 uses Sliding Window Attention with per-head gating in 30 of its 40 layers. The architecture includes:
- Total parameters: 33B with 3B activated per token
- Context window: 131,072 tokens
- Experts: 256 experts with 1 shared expert
- Architecture: 40 layers (10 global attention, 30 sliding window attention)
- Sliding window: 512 tokens
- Training: Muon optimizer, with pre-training, post-training, and reinforcement learning stages
- License: Apache 2.0
The model uses FP8-quantized KV cache to reduce memory requirements and supports native reasoning with interleaved thinking between tool calls.
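To see why the mix of global and sliding window layers matters for memory, the sketch below counts how many token positions the KV cache has to hold at the full 131,072-token context. It uses only the layer counts and window size listed above, and it assumes every layer stores the same number of bytes per cached position, which the source does not state. The function name is made up for illustration.

```python
def cached_positions(context_len, n_global, n_sliding, window):
    """Token positions held in the KV cache, summed over all layers.

    Assumption: each layer caches the same number of bytes per position,
    so position counts are proportional to memory.
    """
    # Global attention layers keep every position in the context.
    global_positions = n_global * context_len
    # Sliding window layers keep at most `window` recent positions.
    sliding_positions = n_sliding * min(window, context_len)
    return global_positions + sliding_positions


CONTEXT = 131_072

# Layout from the spec list: 10 global layers, 30 sliding layers, window of 512.
hybrid = cached_positions(CONTEXT, n_global=10, n_sliding=30, window=512)
# Hypothetical baseline: the same 40 layers, all using global attention.
full = cached_positions(CONTEXT, n_global=40, n_sliding=0, window=512)

ratio = hybrid / full
print(hybrid, full, round(ratio, 3))  # 1326080 5242880 0.253
```

Under that assumption, the hybrid layout caches roughly a quarter of the positions an all-global design would. Storing the cache in FP8 (1 byte per value) rather than a 16-bit format would halve the remaining footprint again.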
Benchmark performance
According to Poolside, Laguna XS.2 achieves the following scores:
- SWE-bench Verified: 68.2% (mean pass@1 over 4 runs)
- SWE-bench Multilingual: 62.4% (mean pass@1 over 7 runs)
- SWE-bench Pro: 44.5% (mean pass@1 over 3 runs)
- Terminal-Bench 2.0: 30.1% (mean pass@1 over 5 runs)
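For readers unfamiliar with the "mean pass@1 over N runs" notation in the scores above, here is a minimal sketch of how such a figure is typically computed: each run attempts every task once, the per-run pass rate is the fraction of tasks solved, and the reported number is the average across runs. The task outcomes below are invented for illustration and are not Poolside's data.

```python
from statistics import mean


def mean_pass_at_1(runs):
    """Average single-attempt pass rate across independent runs.

    runs: list of runs; each run is a list of booleans, one per task,
    where True means the task was solved on its single attempt.
    """
    per_run = [sum(run) / len(run) for run in runs]
    return mean(per_run), per_run


# Hypothetical outcomes: 5 tasks, 4 runs (illustrative values only).
runs = [
    [True, False, True, True, False],
    [True, True, True, False, False],
    [False, False, True, True, False],
    [True, False, True, True, True],
]

score, per_run = mean_pass_at_1(runs)
print(f"mean pass@1: {score:.1%}")
```

Averaging over several runs smooths out the variance introduced by sampling, which is why the number of runs is reported alongside each score.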
Poolside claims these scores make the model competitive with Devstral Small 2 (24B dense, 68.0% on SWE-bench Verified) while using fewer activated parameters. Qwen3.6-35B-A3B leads the comparison group at 73.4% on SWE-bench Verified.
All benchmarking was completed using the Laude Institute's Harbor Framework with temperature=0.7 and top_k=20. The company notes some task images and verifiers were patched to fix infrastructure reliability issues.
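The sampling settings above (temperature=0.7, top_k=20) can be illustrated with a generic sketch of temperature-scaled top-k sampling. This is the textbook technique written with the Python standard library, not Harbor's or Poolside's implementation, and the function name is an assumption.

```python
import math
import random


def sample_top_k(logits, temperature=0.7, top_k=20, rng=random):
    """Sample one token id using top-k filtering and temperature scaling."""
    # Keep only the indices of the top_k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Divide by temperature: values below 1.0 sharpen the distribution.
    scaled = [logits[i] / temperature for i in top]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]


# Toy vocabulary of 50 tokens where token i has logit i.
logits = [float(i) for i in range(50)]
rng = random.Random(0)
samples = {sample_top_k(logits, rng=rng) for _ in range(200)}
print(sorted(samples))
```

With top_k=20, only the 20 highest-scoring tokens can ever be drawn. Because the temperature is nonzero, repeated runs on the same task can differ, which is consistent with reporting a mean over several runs.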
Availability and deployment
Poolside is offering free API access to Laguna XS.2 and its larger 225B model, Laguna M.1, for a limited time. Pricing after the free period has not been disclosed.
The model has launch-day support in:
- vLLM (pending PR merge)
- Transformers (support merged, shipping in release after v5.6.2)
- TRT-LLM (pending upstream PR)
- Ollama with MLX support
The company has also released pool, a lightweight terminal-based coding agent that works with the model. According to Poolside, Laguna XS.2 can run on a Mac with 36 GB of RAM.
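A rough back-of-the-envelope calculation shows how a 33B-parameter model could fit on a 36 GB machine. The sketch assumes all expert weights stay resident in memory even though only 3B parameters are activated per token, and it ignores the KV cache and runtime overhead. The bit widths are illustrative assumptions; the source does not say which quantization the Mac build uses.

```python
def weight_gb(n_params, bits_per_param):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 33e9  # total parameter count from the article

# Assumed precisions for illustration: 16-bit, 8-bit, and 4-bit weights.
estimates = {bits: weight_gb(TOTAL_PARAMS, bits) for bits in (16, 8, 4)}
for bits, gb in estimates.items():
    print(f"{bits}-bit weights: {gb:.1f} GB")
```

By this estimate, 16-bit weights (66 GB) would not fit in 36 GB, 8-bit weights (33 GB) would leave very little headroom, and 4-bit weights (16.5 GB) would fit with room for the cache and the rest of the system.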
What this means
Laguna XS.2 represents a push toward locally runnable coding models with competitive benchmark performance. With 33B total parameters and only 3B activated per token, it is within reach of developers with high-end consumer hardware, while the Apache 2.0 license removes commercial restrictions. The 131K context window matches or exceeds that of many commercial models, potentially enabling longer coding sessions without context truncation. However, its 68.2% SWE-bench Verified score trails the latest Qwen and Claude models, suggesting it may be best suited to local development workflows where privacy and control outweigh absolute performance.