Moonshot AI Releases Kimi K2.6: 1T-Parameter MoE Model with 256K Context and Agent Swarm Capabilities
Moonshot AI has released Kimi K2.6, an open-source multimodal model with 1 trillion total parameters (32B activated) and 256K context window. The model achieves 80.2% on SWE-Bench Verified, 58.6% on SWE-Bench Pro, and supports horizontal scaling to 300 sub-agents executing 4,000 coordinated steps.
Moonshot AI Releases Kimi K2.6: 1T-Parameter MoE Model with 256K Context and Agent Swarm Capabilities
Moonshot AI has released Kimi K2.6, an open-source multimodal model with 1 trillion total parameters and 32 billion activated parameters per forward pass. The model supports a 256K token context window and is designed for long-horizon coding, autonomous agent orchestration, and coding-driven design tasks.
Architecture and Specifications
Kimi K2.6 uses a Mixture-of-Experts (MoE) architecture with 384 total experts, selecting 8 experts per token plus 1 shared expert. The model features:
- 61 total layers (including 1 dense layer)
- 7,168 attention hidden dimension
- 2,048 MoE hidden dimension per expert
- 64 attention heads
- 160K vocabulary size
- Multi-Latent Attention (MLA) mechanism
- SwiGLU activation function
- MoonViT vision encoder with 400M parameters
The model is available with native INT4 quantization and can be deployed on vLLM, SGLang, and KTransformers inference engines.
Benchmark Performance
On coding benchmarks, Kimi K2.6 achieves 80.2% on SWE-Bench Verified (averaged over 10 runs), 58.6% on SWE-Bench Pro, and 76.7% on SWE-Bench Multilingual. The model scores 66.7% on Terminal-Bench 2.0 and 89.6% on LiveCodeBench v6.
For agentic tasks with tool use, the model reaches 54.0% on HLE-Full (compared to 52.1% for GPT-5.4 and 53.0% for Claude Opus 4.6). On BrowseComp, it scores 83.2% in single-agent mode and 86.3% using agent swarm capabilities. For deep research tasks, Kimi K2.6 achieves 92.5% F1-score and 83.0% accuracy on DeepSearchQA.
On reasoning benchmarks, the model scores 96.4% on AIME 2026, 92.7% on HMMT 2026, and 90.5% on GPQA-Diamond. Vision-language performance includes 79.4% on MMMU-Pro (80.1% with Python tool use) and 87.4% on MathVision (93.2% with Python).
Agent Swarm Architecture
According to Moonshot AI, Kimi K2.6 can scale horizontally to 300 sub-agents executing 4,000 coordinated steps. The system dynamically decomposes tasks into parallel, domain-specialized subtasks and can generate end-to-end outputs including documents, websites, and spreadsheets in autonomous runs. The company claims the model supports persistent, 24/7 background agents for proactive task management.
Availability and API
Kimi K2.6 is available through Moonshot AI's API platform at platform.moonshot.ai with OpenAI and Anthropic-compatible APIs. Pricing has not been disclosed. The model supports two modes: Thinking mode (recommended temperature 1.0) and Instant mode (recommended temperature 0.6), both with top_p of 0.95.
The model requires transformers version >=4.57.1, <5.0.0 for deployment. Video content chat is currently an experimental feature available only through the official API.
What This Means
Kimi K2.6 represents a significant architectural approach to scaling agent capabilities through horizontal swarm orchestration rather than just vertical reasoning depth. The 80.2% SWE-Bench Verified score places it competitively with frontier models, though its real differentiation appears in multi-agent coordination benchmarks where it shows gains of 8-10 percentage points in swarm mode versus single-agent operation. The 256K context window and native support for 4,000-step execution traces suggest the model is optimized for complex, long-running autonomous workflows rather than single-shot inference tasks.
Related Articles
NVIDIA Releases Nemotron 3.5 Content Safety: 4B-Parameter Multimodal Model with Custom Policy Enforcement and 140-Langua
NVIDIA has released Nemotron 3.5 Content Safety, a 4B-parameter model built on Google Gemma 3 4B IT that provides multimodal safety classification across approximately 140 languages. The model includes a 128K context window, custom enterprise policy enforcement, auditable reasoning traces, and is releasing its training dataset.
NVIDIA releases Nemotron-3-Ultra: 550B parameter model with 1M token context and configurable reasoning
NVIDIA released Nemotron-3-Ultra-550B, a frontier-scale model with 550B total parameters (55B active) and up to 1M token context window. The model uses a hybrid LatentMoE architecture combining Mamba-2, MoE, and attention layers with Multi-Token Prediction, trained with NVFP4 quantization-aware methods from December 2025 to April 2026.
NVIDIA Nemotron 3 Ultra launches on AWS SageMaker with 550B parameters, 1M token context window
NVIDIA Nemotron 3 Ultra is now available on Amazon SageMaker JumpStart with 550 billion total parameters and 55 billion active parameters. The model features a hybrid Transformer-Mamba Mixture-of-Experts architecture and supports context windows up to 1 million tokens, targeting agentic AI workloads.
Nvidia Releases Free 4B-Parameter Nemotron 3.5 Content Safety Model with 128K Context
Nvidia has released Nemotron 3.5 Content Safety, a 4-billion parameter multimodal guardrail model fine-tuned from Google Gemma-3-4B. The model is available for free, supports 128K token context windows, and moderates content across 12 languages.
Comments
Loading...