Cohere Releases Command A+ Open Source Model with 25B Active Parameters, 128K Context

TL;DR

Cohere has released Command A+ as an open-source model under the Apache 2.0 license. The sparse mixture-of-experts architecture has 25 billion active parameters out of 218 billion total, supports a 128K-token input context, and includes vision capabilities alongside tool use and reasoning features.

Cohere has released Command A+ (command-a-plus-05-2026) as an open-source model under the Apache 2.0 license. The sparse mixture-of-experts architecture has 25 billion active parameters out of 218 billion total, supports a 128K-token input context with up to 64K tokens of output, and accepts image inputs.

Architecture and Specifications

Command A+ uses a decoder-only sparse mixture-of-experts Transformer architecture with 128 experts, activating 8 per token plus one shared expert applied to all tokens. According to Cohere, the model interleaves sliding-window attention layers that use Rotary Positional Embeddings (RoPE) with global attention layers that use no positional embeddings, at a 3:1 ratio, a design first introduced in the earlier Command A model.
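
The 3:1 interleaving can be pictured with a short sketch. The article states only the ratio, so the layer count of 8 and the choice to place the global layer last in each group of four are assumptions made for illustration, as are the function and label names.

```python
def attention_layout(num_layers: int, local_per_global: int = 3) -> list[str]:
    """Return one attention type per layer: `local_per_global` sliding-window
    layers followed by one global layer, repeated down the stack."""
    period = local_per_global + 1
    layout = []
    for i in range(num_layers):
        if (i + 1) % period == 0:
            # Global attention, no positional embeddings (per the article).
            layout.append("global_no_pos_emb")
        else:
            # Sliding-window attention with rotary embeddings.
            layout.append("sliding_window_rope")
    return layout


# Assumed depth of 8 layers, purely for display.
layout = attention_layout(8)
for index, kind in enumerate(layout):
    print(index, kind)
```

Under this assumed placement, only every fourth layer attends over the full context, which is the usual motivation for such a mix: most layers pay a cost bounded by the window size rather than the sequence length.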

The sparse MoE layer is trained in a "fully dropless manner" using a token-choice router, with additive-bias-based load balancing to spread token load across experts. In place of the standard softmax router activation, the architecture applies a normalized sigmoid over the top-k expert logits for each token.
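
A toy version of this routing step might look like the sketch below. It is not Cohere's implementation: the function names, the bias step size, and in particular the assumption that the bias affects only which experts are selected (not the gate weights) are illustrative choices, and the example uses 4 experts with top-2 routing rather than 128 experts with top-8. Because no capacity limit is applied, every token keeps all of its chosen experts, which is what "dropless" refers to.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route_token(logits, bias, k):
    """Pick k experts for one token and return {expert_index: gate_weight}.

    Assumption: the bias influences only which experts are selected,
    while the gate weights come from the unbiased logits.
    """
    # Token-choice: the token ranks experts by biased score and takes its top k.
    ranked = sorted(range(len(logits)), key=lambda e: logits[e] + bias[e], reverse=True)
    chosen = ranked[:k]
    # Sigmoid over the chosen logits, then normalize so the gates sum to 1.
    scores = [sigmoid(logits[e]) for e in chosen]
    total = sum(scores)
    return {e: s / total for e, s in zip(chosen, scores)}


def update_bias(bias, load, step=0.01):
    """Nudge the bias down for overloaded experts and up for underloaded ones."""
    mean_load = sum(load) / len(load)
    updated = []
    for b, n in zip(bias, load):
        if n > mean_load:
            updated.append(b - step)
        elif n < mean_load:
            updated.append(b + step)
        else:
            updated.append(b)
    return updated


# Toy example: 4 experts, top-2 routing.
logits = [2.0, 1.0, 0.5, -1.0]
bias = [0.0, 0.0, 0.0, 0.0]
gates = route_token(logits, bias, k=2)
print(gates)

# Expert 0 received far more tokens than the others, so its bias drops.
new_bias = update_bias(bias, load=[10, 2, 2, 2])
print(new_bias)
```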

Deployment and Hardware Requirements

Cohere provides three precision options and reports minimal quality differences among them:

  • BF16 (16-bit): Requires 4x B200 or 8x H100 GPUs
  • FP8 (8-bit): Requires 2x B200 or 4x H100 GPUs
  • W4A4 (4-bit): Requires 1x B200 or 2x H100 GPUs

Cohere recommends the W4A4 quantization for most use cases, claiming "superior speed and latency characteristics alongside a smaller hardware footprint."
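
A rough weights-only calculation helps explain why the GPU counts halve with each step down in precision. The sketch below uses the 218B total parameter count from the article and assumes every weight is stored at the nominal bit width; it ignores the KV cache, activations, quantization scales, and runtime overhead, so real requirements are higher.

```python
TOTAL_PARAMS = 218e9  # total parameter count stated in the article

# Nominal bits per weight for each precision option.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "W4A4": 4}


def weight_gigabytes(total_params: float, bits: int) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return total_params * bits / 8 / 1e9


footprints = {
    name: weight_gigabytes(TOTAL_PARAMS, bits)
    for name, bits in BITS_PER_WEIGHT.items()
}
for name, gb in footprints.items():
    print(f"{name}: about {gb:.0f} GB of weights")
```

Assuming the 80 GB variant of the H100, the listed configurations offer 640 GB, 320 GB, and 160 GB of total memory, which leaves headroom above the roughly 436 GB, 218 GB, and 109 GB of weights in each case.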

Capabilities

The model supports 48 languages including English, Chinese, Japanese, Arabic, Spanish, and various European and Asian languages. It includes native tool use capabilities trained for conversational API interactions, with support for JSON schema tool descriptions and citation generation to ground responses in specific tool results.
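
For readers unfamiliar with JSON schema tool descriptions, the sketch below shows the general shape of one. The article does not give the exact format Cohere expects, so the layout follows the common function-description convention, and the tool name, fields, and helper function are all hypothetical.

```python
import json

# Hypothetical tool description; every name here is illustrative.
weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}


def missing_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return required parameter names absent from a proposed tool call."""
    required = tool["parameters"].get("required", [])
    return [name for name in required if name not in arguments]


print(json.dumps(weather_tool, indent=2))
# A call that omits the required "city" argument is flagged before execution.
print(missing_arguments(weather_tool, {"unit": "celsius"}))
```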

Command A+ includes a reasoning mode that generates explicit thinking steps between <START_THINKING> and <END_THINKING> tags before producing final outputs. The model also accepts image inputs for multimodal processing.
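
Separating the thinking steps from the final answer can be sketched with plain string handling. This is an illustration rather than the official parser: the function name and the sample output are hypothetical, and only the two tag strings come from the article.

```python
START_TAG = "<START_THINKING>"
END_TAG = "<END_THINKING>"


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (thinking, answer). Thinking is empty if the tags are absent."""
    start = text.find(START_TAG)
    end = text.find(END_TAG)
    if start == -1 or end == -1 or end < start:
        # No well-formed thinking span: treat everything as the answer.
        return "", text.strip()
    thinking = text[start + len(START_TAG):end].strip()
    answer = (text[:start] + text[end + len(END_TAG):]).strip()
    return thinking, answer


# Hypothetical model output used only to exercise the function.
sample = "<START_THINKING>2 + 2 is 4.<END_THINKING>The answer is 4."
thinking, answer = split_reasoning(sample)
print(thinking)
print(answer)
```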

Integration

The model requires installing transformers from source and is compatible with vLLM 0.21.0 or higher. Tool calling and reasoning parsing require Cohere's melody library (version 0.9.0+). The model is available on Hugging Face with a hosted demo space for testing before deployment.

What This Means

Command A+ enters the competitive open-source model space with a sparse MoE design similar to Mixtral's and DeepSeek's architectures, but with more total parameters than Mixtral 8x22B (218B vs 141B). The 128K context window matches GPT-4 Turbo's, though it is smaller than the 200K window of the Claude 3 models, and the Apache 2.0 license permits commercial use subject to its attribution and notice terms. The model's combination of vision, reasoning, and tool use in a single open-source package targets enterprise deployments that previously required closed-source API providers.
