Cohere Releases North Mini Code 1.0: 30B-Parameter MoE Model With 256K Context for Agentic Coding
Cohere Labs has released North Mini Code 1.0, a 30B-parameter sparse Mixture-of-Experts model with 3B active parameters and a 256K context window. The Apache 2.0-licensed model is optimized for agentic software engineering, featuring 128 experts with 8 activated per token, and trained specifically for tool use in coding tasks.
North Mini Code 1.0 — Quick Specs
Cohere Releases North Mini Code 1.0: 30B-Parameter MoE Model With 256K Context for Agentic Coding
Cohere Labs has released North Mini Code 1.0, a 30B-parameter sparse Mixture-of-Experts (MoE) model with 3B active parameters designed for code generation and agentic software engineering tasks.
Model Architecture and Specifications
North Mini Code 1.0 uses a decoder-only Transformer architecture with 128 experts, activating 8 per token. The model features:
- Total parameters: 30B (3B active)
- Context window: 256K tokens with 64K max output
- License: Apache 2.0
- Architecture: Sparse MoE with interleaved attention (3:1 ratio of sliding-window with RoPE to global attention without positional embeddings)
- Training: Two-stage post-training with supervised fine-tuning (SFT) followed by reinforcement learning with verifiable rewards (RLVR)
Performance on Agentic Coding Benchmarks
Cohere evaluated the model on SWE-Bench Verified, SWE-Bench Pro, Terminal-Bench v2, and Terminal-Bench Hard using the Swe-Agent harness v1.1.0. The company also tested on SciCode and LiveCodeBench v6 for complex code generation. All benchmarks used temperature=1.0 and top_p=0.95 across 3 seeds. Specific benchmark scores were not disclosed in the model card.
Tool Use and Integration
The model supports native tool-use capabilities through chat templates in Transformers. According to Cohere, North Mini Code 1.0 features "interleaved thinking" where the model generates reasoning content alongside tool calls. The company recommends passing all model-generated thinking content to future agentic steps for optimal performance.
Integration requires installing Transformers from source and, for vLLM deployment, using the main branch with Cohere's melody library (version 0.9.0+). The model uses a bash tool for terminal command execution.
Availability
North Mini Code 1.0 is available on Hugging Face and can be tested in OpenCode and Cohere's hosted Hugging Face Space. The model requires tensor parallelism (recommended -tp 2) for vLLM serving with a max model length of 320,000 tokens.
What This Means
This release targets the growing market for AI-powered software engineering tools, competing with models like GitHub Copilot and Amazon CodeWhisperer. The 30B-parameter count with 3B active parameters via sparse MoE suggests Cohere is prioritizing inference efficiency over raw model size—a practical choice for deployment in development environments. The Apache 2.0 license and open weights make this accessible for commercial use, though the lack of disclosed pricing and specific benchmark comparisons to competing models leaves performance questions open. The emphasis on agentic capabilities and tool use reflects the industry shift from simple code completion to multi-step reasoning and execution workflows.
Related Articles
Microsoft Releases FastContext-1.0: 4B-Parameter Repository Explorer Cuts Coding Agent Token Use by 60%
Microsoft released FastContext-1.0, a lightweight repository-exploration subagent for LLM coding agents spanning 4B to 30B parameters. The model reduced main-agent token consumption by up to 60% while improving end-to-end resolution rates by up to 5.5% on SWE-bench Pro when integrated with agents like GPT-5.4 and GLM-5.1.
Moonshot AI releases Kimi K2.7 Code with 1T parameters, 256K context window, 30% lower thinking token usage
Moonshot AI has released Kimi K2.7 Code, a 1 trillion parameter Mixture-of-Experts model designed for long-horizon coding tasks. The model features a 256K context window and reduces thinking token usage by approximately 30% compared to its predecessor K2.6.
GLM-5.2 Released with 1M Token Context and 753B Parameters Under MIT License
Zhipu AI has released GLM-5.2, a 753 billion parameter model featuring a 1 million token context window and MIT open-source license. The model scores 62.1% on SWE-bench Pro and 91.2% on GPQA-Diamond, with flexible reasoning effort levels for coding tasks.
Z.ai Releases GLM-5.2 with 1M Token Context Window at $1.40/$4.40 per Million
Z.ai has released GLM-5.2, a model designed for long-horizon engineering tasks with a 1 million token context window. The model is priced at $1.40 per million input tokens and $4.40 per million output tokens, and was released on June 16, 2025.
Comments
Loading...