efficiency
3 articles tagged with efficiency
DeepSeek Releases V4-Flash: 284B-Parameter MoE Model With 1M-Token Context at 27% of V3.2's Inference FLOPs
DeepSeek released two Mixture-of-Experts models: V4-Flash, with 284B total parameters (13B activated), and V4-Pro, with 1.6T total parameters (49B activated). Both models support one-million-token context windows and use a hybrid attention architecture that requires only 27% of the inference FLOPs of DeepSeek-V3.2 at a 1M-token context.
Mistral AI releases Mistral Small 4, claims improved performance on reasoning tasks
Mistral AI has released Mistral Small 4, the latest iteration of its small-scale language model. The company claims improvements in reasoning and coding capabilities, though specific benchmark scores and pricing details have not been publicly disclosed.
Google releases Gemini 3.1 Flash-Lite, fastest model in the Gemini 3 series
Google DeepMind has released Gemini 3.1 Flash-Lite, positioning it as the fastest and most cost-efficient model in the Gemini 3 series. The release targets applications requiring high-speed inference at scale, continuing Google's multi-tier model strategy across the Gemini family.