model release

InclusionAI releases Ling-2.6-flash: 104B parameter model with 7.4B active parameters, free on OpenRouter

TL;DR

InclusionAI has released Ling-2.6-flash, an instruction-tuned model with 104 billion total parameters and 7.4 billion active parameters, available free through OpenRouter. The model features a 262,144-token context window and is designed for agent workflows requiring fast responses and high token efficiency.

2 min read
0

InclusionAI releases Ling-2.6-flash: 104B parameter model with 7.4B active parameters, free on OpenRouter

InclusionAI has released Ling-2.6-flash, an instruction-tuned model with 104 billion total parameters and 7.4 billion active parameters. The model is available at no cost through OpenRouter as of April 21, 2026.

Model Specifications

Ling-2.6-flash features a 262,144-token context window (approximately 262K tokens) and is offered with $0 per million tokens for both input and output. The model uses a sparse architecture, activating only 7.4B of its 104B total parameters during inference.

According to inclusionAI, the model is designed for "real-world agents that require fast responses, strong execution, and high token efficiency." The company claims it delivers performance comparable to state-of-the-art models at similar scale while reducing token usage across coding, document processing, and lightweight agent workflows.

Technical Architecture

The model's sparse activation approach—using only 7.1% of its total parameters per inference—enables faster response times compared to fully-activated models of similar total parameter count. This design pattern follows recent trends in mixture-of-experts and sparse architectures.

The model is accessible through OpenRouter's unified API, which provides OpenAI-compatible endpoints. OpenRouter routes requests to available providers with automatic fallbacks for uptime optimization.

Availability

Ling-2.6-flash is currently available exclusively through OpenRouter's platform. The company has not disclosed whether the model will be released through other providers or made available for self-hosting. No benchmark scores have been published at this time.

InclusionAI is not among the previously established AI model providers tracked in industry databases, suggesting this is either a new entrant or an independent research team making their first public model release.

What This Means

The release of a free, high-parameter-count model with sparse activation represents competitive pressure on existing model providers. If performance claims are verified through independent benchmarks, the 262K context window at zero cost could make this attractive for agent applications and document processing tasks. However, without published benchmark scores or information about training data and capabilities, adoption will likely depend on real-world testing by developers. The sparse activation design (7.4B active from 104B total) suggests this is optimized for cost-efficient inference rather than maximum capability.

Related Articles

model release

Alibaba's Qwen Releases Qwen3.7 Plus: 1M Context Window at $0.40 Per Million Input Tokens

Alibaba's Qwen has released Qwen3.7 Plus, a multimodal model with a 1 million token context window. The model accepts text and image input with text output, priced at $0.40 per million input tokens and $1.60 per million output tokens through OpenRouter's API.

model release

Nvidia releases Nemotron 3 Ultra: 550B-parameter MoE model with 1M context window for agentic workflows

Nvidia has released Nemotron 3 Ultra, a 550-billion parameter mixture-of-experts model with 55 billion active parameters and support for up to 1 million token context windows. The model uses a hybrid Transformer-Mamba architecture and is designed specifically for long-running agentic workflows including agent orchestration, coding agents, and complex enterprise tasks.

model release

NVIDIA Releases Nemotron-3-Ultra: 550B Parameter Model with 1M Token Context and Configurable Reasoning

NVIDIA released Nemotron-3-Ultra-550B-A55B-NVFP4, a 550B parameter model with 55B active parameters, featuring a 1M token context window and configurable reasoning mode. The model uses a hybrid LatentMoE architecture combining Mamba-2, Mixture-of-Experts, and Attention layers with Multi-Token Prediction, trained with NVIDIA's NVFP4 quantization-aware approach.

model release

NVIDIA releases Nemotron-3-Ultra: 550B parameter model with 1M token context and configurable reasoning

NVIDIA released Nemotron-3-Ultra-550B, a frontier-scale model with 550B total parameters (55B active) and up to 1M token context window. The model uses a hybrid LatentMoE architecture combining Mamba-2, MoE, and attention layers with Multi-Token Prediction, trained with NVFP4 quantization-aware methods from December 2025 to April 2026.

Comments

Loading...