efficiency

3 articles tagged with efficiency

April 24, 2026
model release, DeepSeek

DeepSeek Releases V4-Flash: 284B-Parameter MoE Model With 1M Token Context at 27% Inference Cost

DeepSeek released two Mixture-of-Experts models: V4-Flash, with 284B total parameters (13B activated), and V4-Pro, with 1.6T total parameters (49B activated). Both models support one-million-token context windows and use a hybrid attention architecture that, at 1M-token context, requires only 27% of the inference FLOPs of DeepSeek-V3.2.

March 17, 2026
model release

Mistral AI releases Mistral Small 4, claims improved performance on reasoning tasks

Mistral AI has released Mistral Small 4, the latest iteration of its small-scale language model. The company claims improvements in reasoning and coding capabilities, though specific benchmark scores and pricing details have not been publicly disclosed.

March 3, 2026
model release

Google releases Gemini 3.1 Flash-Lite, fastest model in the Gemini 3 series

Google DeepMind has released Gemini 3.1 Flash-Lite, positioning it as the fastest and most cost-efficient model in the Gemini 3 series. The release targets applications requiring high-speed inference at scale, continuing Google's multi-tier model strategy across the Gemini family.