Moonshot AI releases Kimi K2.7 Code with 1T parameters, 256K context window, 30% lower thinking token usage

TL;DR

Moonshot AI has released Kimi K2.7 Code, a 1 trillion parameter Mixture-of-Experts model designed for long-horizon coding tasks. The model features a 256K context window and reduces thinking token usage by approximately 30% compared to its predecessor K2.6.

June 12, 2026 · 12:05 PM2 min read

Kimi K2.7 Code — Quick Specs

Context window256K tokens

Input$0.82/1M tokens

Output$3.75/1M tokens

Compare Kimi K2.7 Code with other models →

Moonshot AI Releases Kimi K2.7 Code with 1T Parameters

Moonshot AI has released Kimi K2.7 Code, a 1 trillion parameter Mixture-of-Experts (MoE) model built on the K2.6 architecture. The model features 32 billion activated parameters, a 256K token context window, and reduces thinking token usage by approximately 30% compared to K2.6.

Architecture and Specifications

Kimi K2.7 Code uses a 384-expert MoE architecture with 8 experts selected per token, plus 1 shared expert. The model includes 61 total layers (including 1 dense layer), 64 attention heads, and a 160K vocabulary size. It employs Multi-head Latent Attention (MLA) with SwiGLU activation and integrates MoonViT, a 400M parameter vision encoder, for multimodal capabilities supporting image and video input.

The model is available in native INT4 quantization using the same method as Kimi-K2-Thinking.

Benchmark Performance

According to Moonshot AI, K2.7 Code shows substantial improvements across coding and agentic benchmarks:

Coding benchmarks:

Kimi Code Bench v2: 62.0 (vs 50.9 for K2.6)
Program Bench: 53.6 (vs 48.3 for K2.6)
MLS Bench Lite: 35.1 (vs 26.7 for K2.6)

Agentic benchmarks:

Kimi Claw 24/7 Bench: 46.9 (vs 42.9 for K2.6)
MCP Atlas: 76.0 (vs 69.4 for K2.6)
MCP Mark Verified: 81.1 (vs 72.8 for K2.6)

Moonshot AI compared K2.7 Code against GPT-5.5 and Claude Opus 4.8, though these comparison scores cannot be independently verified. In Moonshot's testing, GPT-5.5 led most benchmarks, with K2.7 Code placing second or third depending on the task.

Deployment and Availability

The model API is available at platform.moonshot.ai with OpenAI and Anthropic-compatible interfaces. Pricing per million tokens has not been disclosed. The model requires transformers version 4.57.1 or higher (but below 5.0.0) and can be deployed using vLLM, SGLang, or KTransformers inference engines.

Kimi K2.7 Code operates exclusively in thinking mode with preserve_thinking forced to True. Moonshot AI recommends a temperature of 1.0 and top_p of 0.95 for inference. Video content chat is currently experimental and only supported through the official API.

What This Means

Kimi K2.7 Code represents Moonshot AI's push into specialized coding models with extended reasoning capabilities. The 30% reduction in thinking token usage addresses a practical efficiency concern for reasoning models in production environments. The 256K context window positions it competitively for repository-level code understanding tasks, though it remains shorter than some competitors offering 1M+ token windows. The model's MoE architecture with 384 experts and INT4 quantization support suggests Moonshot is optimizing for deployment efficiency alongside raw capability.

Source: huggingface.co ↗

Moonshot AI Kimi K2.7 Code model release coding model MoE reasoning model multimodal Mixture-of-Experts

model releaseJuly 27, 2026

Moonshot AI Releases Kimi K3: Open-Weight 2.8T-Parameter Model With 1M-Token Context and Native Multimodality

Moonshot AI has released Kimi K3, an open-weight 2.8-trillion-parameter mixture-of-experts model with 104B activated parameters, a 1,048,576-token context window, and native multimodal support. The company describes it as the world's first open 3T-class model, built on a new Kimi Delta Attention architecture.

model releaseJuly 25, 2026

Microsoft Releases Fara1.5-27B, a 27B Vision-Only Web Browsing Agent with 262K Context

Microsoft Research AI Frontiers has released Fara1.5-27B, a 27-billion-parameter multimodal agent that completes web tasks by reading screenshots and emitting click/type/scroll commands. The model, fine-tuned from Qwen3.5-27B, ships under MIT license with a 262K-token context window and is designed to run alongside Microsoft's MagenticLite sandbox.

model releaseJuly 25, 2026

Anthropic Ships Claude Opus 5, Claims Near-Fable Performance at Half the Price

Anthropic released Claude Opus 5 on July 24, 2026, positioning it as a lower-cost alternative to its more expensive Claude Fable 5 model. Independent evaluators Epoch AI and Artificial Analysis report mixed but largely favorable results, with Opus 5 nearly matching Fable 5 on coding benchmarks while cutting cost-per-task by roughly 20%.

model releaseJuly 24, 2026

Anthropic Ships Claude Opus 5, Claims It Matches Flagship Fable 5 on Coding at Half the Cost

Anthropic released Claude Opus 5 on July 24, its fourth model launch in under two months, priced at $5 per million input tokens and $25 per million output tokens. The company claims the model matches or beats its flagship Fable 5 on most coding and knowledge-work benchmarks while posting the lowest deception rate of any model it has shipped.