LLM News

Every LLM release, update, and milestone.

research

Researchers develop controllable full-duplex speech model trainable on 2,000 hours of data

Researchers have developed F-Actor, an instruction-following full-duplex conversational speech model that can be trained efficiently on 2,000 hours of data without large-scale pretraining. The model enables explicit control over speaker voice, conversation topic, backchanneling, interruptions, and dialogue initiation, addressing naturalness limitations in current spoken conversational systems.

research

MeanFlowSE enables single-step speech enhancement by learning mean velocity fields instead of instantaneous flows

Researchers introduced MeanFlowSE, a generative speech enhancement model that removes the computational bottleneck of multistep inference by learning the average velocity over a finite interval rather than the instantaneous velocity field. The single-step approach matches multistep baselines in quality on VoiceBank-DEMAND at substantially lower computational cost, with no knowledge distillation required.
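The average-velocity idea can be sketched as follows. This is a generic formulation of mean-velocity (mean-flow) training, with illustrative symbols rather than the paper's own notation: the quantity learned is the time average of the instantaneous velocity field $v$ over an interval $[r, t]$,

```latex
u(x_t, r, t) \;=\; \frac{1}{t - r} \int_r^t v(x_\tau, \tau)\, d\tau ,
```

so a model trained to predict $u$ directly can traverse the whole interval in one jump, $x_r = x_t - (t - r)\, u(x_t, r, t)$, instead of numerically integrating $v$ over many small steps.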

research

xLLM: Open-source inference framework claims 2.2x vLLM throughput on Ascend accelerators

Researchers have released xLLM, an open-source inference framework for large language models designed for enterprise-scale serving. The framework claims up to 2.2x higher throughput than vLLM-Ascend when serving Qwen-series models under identical latency constraints, using a novel decoupled architecture that separates service scheduling from engine optimization.

2 min read · via arxiv.org
research

DeepXiv-SDK releases three-layer agentic interface for scientific literature access

DeepXiv-SDK introduces a three-layer agentic data interface designed to give LLM agents efficient, cost-aware access to scientific literature. The system transforms unstructured data into normalized JSON, offers retrieval tools via CLI, MCP, and Python SDK, and currently covers the complete arXiv corpus with daily synchronization.

2 min read · via arxiv.org
model release

Guide Labs open-sources Steerling-8B, an interpretable 8B parameter LLM

Guide Labs has open-sourced Steerling-8B, an 8 billion parameter language model built with a new architecture specifically designed to make the model's reasoning and actions easily interpretable. The release addresses a persistent challenge in AI development: understanding how large language models arrive at their outputs.

model release

Segmind releases SegMoE, a mixture-of-experts diffusion model for faster image generation

Segmind has released SegMoE, a mixture-of-experts (MoE) diffusion model designed to accelerate image generation while reducing computational overhead. The model applies MoE techniques traditionally used in large language models to diffusion architectures, enabling selective expert activation during inference.
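Selective expert activation can be sketched as generic top-k MoE routing: a gate scores every expert per input, and only the highest-scoring experts actually run. This is a minimal illustrative sketch of the general technique, not SegMoE's implementation; all names (`moe_forward`, `gate_weights`) and the toy experts are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, k=2):
    """Route input x through the top-k experts chosen by a (toy) gate.

    experts:      list of callables, each a stand-in for an expert sub-network
    gate_weights: one scalar per expert, a stand-in for a learned gating layer
    """
    scores = softmax([w * x for w in gate_weights])
    # Keep only the k highest-scoring experts; the rest are never evaluated,
    # which is where the inference-time compute saving comes from.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    norm = sum(scores[i] for i in top)
    # Output is a convex combination of the selected experts' outputs.
    return sum(scores[i] / norm * experts[i](x) for i in top)

# Toy usage: four "experts", only two of which run per input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x ** 2, lambda x: -x]
out = moe_forward(3.0, experts, gate_weights=[0.1, 0.9, 0.3, 0.2], k=2)
```

The key design point is that the unselected experts are skipped entirely, so per-input cost scales with `k` rather than with the total number of experts.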