Cohere Releases Command A+ Open Source Model with 25B Active Parameters, 128K Context

TL;DR

Cohere has released Command A+ as an open-source model under the Apache 2.0 license. The sparse mixture-of-experts model has 25 billion active parameters out of 218 billion total, supports a 128K-token input context, and combines vision capabilities with tool use and reasoning.

Cohere has released Command A+ (command-a-plus-05-2026) as an open-source model under the Apache 2.0 license. The sparse mixture-of-experts model has 25 billion active parameters out of 218 billion total, supports a 128K-token input context with up to 64K output tokens, and accepts image inputs.

Architecture and Specifications

Command A+ uses a decoder-only sparse mixture-of-experts (MoE) Transformer architecture with 128 experts. Each token activates 8 of them, plus one shared expert that is applied to all tokens. According to Cohere, the model interleaves three sliding-window attention layers that use Rotary Positional Embeddings (RoPE) with every one global attention layer that uses no positional embeddings, a design first introduced in the earlier Command A model.
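
As a rough illustration of the 3:1 interleaving, the sketch below builds a per-layer attention schedule. The layer count of 8, the function and label names, and the choice to place the global layer at the end of each group of four are assumptions made for illustration; the article states only the ratio.

```python
def attention_schedule(num_layers: int, local_per_global: int = 3) -> list[str]:
    """Return one attention-type label per layer at a local:global ratio."""
    period = local_per_global + 1
    schedule = []
    for i in range(num_layers):
        # Assumption: the global layer closes each group of `period` layers.
        if (i + 1) % period == 0:
            schedule.append("global_no_pos")  # global attention, no positional embedding
        else:
            schedule.append("sliding_rope")   # sliding-window attention with RoPE
    return schedule


layers = attention_schedule(8)
print(layers)
```

With these assumptions, 8 layers yield 6 sliding-window layers and 2 global layers.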

The sparse MoE layer is trained in a "fully dropless manner" using a token-choice router, with additive-bias-based load balancing to distribute token load across experts. In place of the standard softmax router activation, the architecture applies a normalized sigmoid over the top-k expert logits for each token.
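
The routing step can be sketched for a single token as follows. This is a minimal illustration, not Cohere's implementation: the function name is hypothetical, the shared expert is not modeled, and the article does not say whether the balancing bias affects only expert selection or also the gate weights, so the code assumes it affects selection only.

```python
import math


def route_token(logits, bias, k=8):
    """Pick k experts for one token and return (expert_ids, gate_weights)."""
    # Assumption: the additive balancing bias only influences which experts win.
    biased = [logit + b for logit, b in zip(logits, bias)]
    top = sorted(range(len(logits)), key=lambda e: biased[e], reverse=True)[:k]
    # Sigmoid over the unbiased logits of the selected experts.
    scores = [1.0 / (1.0 + math.exp(-logits[e])) for e in top]
    total = sum(scores)
    # Normalize so this token's gate weights sum to 1.
    return top, [s / total for s in scores]


# Deterministic toy logits for 128 experts, with no balancing bias applied.
toy_logits = [math.sin(e) for e in range(128)]
expert_ids, gate_weights = route_token(toy_logits, [0.0] * 128)
print(expert_ids, gate_weights)
```

Unlike a softmax over all 128 experts, each sigmoid score here depends only on its own logit; the final division rescales the 8 selected scores relative to one another.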

Deployment and Hardware Requirements

Cohere provides the model at three precision levels, which it says differ minimally in quality:

  • BF16 (16-bit): Requires 4x B200 or 8x H100 GPUs
  • FP8 (8-bit): Requires 2x B200 or 4x H100 GPUs
  • W4A4 (4-bit): Requires 1x B200 or 2x H100 GPUs

Cohere recommends the W4A4 quantization for most use cases, claiming "superior speed and latency characteristics alongside a smaller hardware footprint."
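
The listed GPU counts are roughly consistent with weight-only memory arithmetic, which the sketch below works through. The per-GPU memory figures (192 GB for a B200, 80 GB for an H100) are assumptions rather than numbers from the article, and the estimate ignores the KV cache, activations, and any quantization scale overhead.

```python
TOTAL_PARAMS = 218e9  # total parameter count reported in the article

# Bytes per weight at each precision.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "W4A4": 0.5}

# Assumed per-GPU memory in GB; usable capacity varies by SKU.
GPU_MEMORY_GB = {"B200": 192, "H100": 80}

# GPU counts as listed in the article.
LISTED_CONFIGS = {
    "BF16": {"B200": 4, "H100": 8},
    "FP8": {"B200": 2, "H100": 4},
    "W4A4": {"B200": 1, "H100": 2},
}


def weight_gb(fmt: str) -> float:
    """Approximate size of the weights alone, in GB."""
    return TOTAL_PARAMS * BYTES_PER_PARAM[fmt] / 1e9


def headroom_gb(fmt: str, gpu: str) -> float:
    """Memory left after loading weights on the listed configuration."""
    return LISTED_CONFIGS[fmt][gpu] * GPU_MEMORY_GB[gpu] - weight_gb(fmt)


for fmt in LISTED_CONFIGS:
    for gpu in GPU_MEMORY_GB:
        print(f"{fmt} on {LISTED_CONFIGS[fmt][gpu]}x {gpu}: "
              f"~{weight_gb(fmt):.0f} GB weights, ~{headroom_gb(fmt, gpu):.0f} GB headroom")
```

Under these assumptions the weights occupy about 436 GB in BF16, 218 GB in FP8, and 109 GB in W4A4, and every listed configuration leaves some memory free for the KV cache and activations.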

Capabilities

The model supports 48 languages, among them English, Chinese, Japanese, Arabic, Spanish, and other European and Asian languages. It includes native tool use trained for conversational API interactions, accepts tool descriptions written as JSON schemas, and can generate citations that ground a response in specific tool results.
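
For readers unfamiliar with JSON schema tool descriptions, here is a generic sketch of what one can look like, along with a small check that a tool call supplies the required arguments. The tool name, its fields, the outer "function" wrapper, and the helper function are all hypothetical; the article does not specify the exact format Command A+ expects.

```python
import json

# Hypothetical tool description; the name and fields are invented for illustration.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def missing_arguments(tool: dict, arguments_json: str) -> list[str]:
    """Return required parameter names absent from a JSON-encoded tool call."""
    args = json.loads(arguments_json)
    required = tool["function"]["parameters"].get("required", [])
    return [name for name in required if name not in args]


complete = missing_arguments(get_weather_tool, '{"city": "Toronto"}')
incomplete = missing_arguments(get_weather_tool, '{}')
print(complete, incomplete)
```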

Command A+ includes a reasoning mode that generates explicit thinking steps between <START_THINKING> and <END_THINKING> tags before producing final outputs. The model also accepts image inputs for multimodal processing.
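
To show what the tag format implies for downstream code, the sketch below separates the thinking span from the final answer with a regular expression. It is only an illustration of the tags named above, not the API of Cohere's own parsing library; the function name and sample string are made up.

```python
import re

# Non-greedy match so only the first thinking span is captured.
THINKING = re.compile(r"<START_THINKING>(.*?)<END_THINKING>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (thinking, answer); thinking is empty if no tags are present."""
    match = THINKING.search(text)
    if match is None:
        return "", text.strip()
    # Everything outside the tagged span is treated as the answer.
    answer = text[:match.start()] + text[match.end():]
    return match.group(1).strip(), answer.strip()


sample = "<START_THINKING>2 + 2 is 4.<END_THINKING>The answer is 4."
thinking, answer = split_reasoning(sample)
print(thinking)
print(answer)
```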

Integration

Running the model requires installing transformers from source, and it is compatible with vLLM 0.21.0 or higher. Tool calling and reasoning parsing require Cohere's melody library (version 0.9.0+). The model is available on Hugging Face, along with a hosted demo space for testing before deployment.

What This Means

Command A+ enters the competitive open-source model space with a sparse MoE design in the same family as Mixtral's and DeepSeek's architectures, and with more total parameters than Mixtral 8x22B (218B vs 141B). The 128K context window matches that of GPT-4 Turbo, while the Apache 2.0 license permits commercial use. By combining vision, reasoning, and tool use in a single open-source package, the model targets enterprise deployments that previously relied on closed-source API providers.
