research
xLLM: Open-source inference framework claims up to 2.2x vLLM throughput on Ascend accelerators
Researchers have released xLLM, an open-source large language model (LLM) inference framework designed for enterprise-scale serving. The framework claims up to 2.2x higher throughput than vLLM-Ascend when serving Qwen-series models under identical latency constraints, achieved through a decoupled architecture that separates service-level scheduling from engine-level optimization.
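
To make the decoupling concrete, the sketch below separates a service layer (request admission and batching under a latency budget) from an execution engine (where kernel- and cache-level optimizations would live), connected only by a narrow batch interface. This is a minimal illustration of the general pattern; all class names, methods, and the queue-based design are assumptions for exposition, not xLLM's actual API.

```python
import queue
import threading
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    """A single inference request; ordered by priority for scheduling."""
    priority: int
    prompt: str = field(compare=False)
    result: queue.Queue = field(compare=False, default_factory=queue.Queue)

class Scheduler:
    """Service layer: admits requests and forms batches.

    Knows nothing about kernels or KV caches; only about priorities
    and batch-size limits.
    """
    def __init__(self, max_batch: int):
        self.pending: queue.PriorityQueue[Request] = queue.PriorityQueue()
        self.max_batch = max_batch

    def submit(self, req: Request) -> None:
        self.pending.put(req)

    def next_batch(self) -> list[Request]:
        batch = [self.pending.get()]  # block until at least one request
        while len(batch) < self.max_batch:
            try:
                batch.append(self.pending.get_nowait())
            except queue.Empty:
                break
        return batch

class Engine:
    """Execution layer: runs batches on the accelerator.

    Hardware-specific optimizations would live here, invisible to
    the scheduler. The model call is stubbed out.
    """
    def run(self, batch: list[Request]) -> None:
        for req in batch:
            req.result.put(f"completion for: {req.prompt!r}")

def serve(scheduler: Scheduler, engine: Engine) -> None:
    """Serving loop: the only coupling point between the two layers."""
    while True:
        engine.run(scheduler.next_batch())

if __name__ == "__main__":
    sched, eng = Scheduler(max_batch=8), Engine()
    threading.Thread(target=serve, args=(sched, eng), daemon=True).start()
    req = Request(priority=0, prompt="hello")
    sched.submit(req)
    print(req.result.get(timeout=5))
```

Under this split, either layer can evolve independently: the scheduler can change batching or priority policy without touching engine code, and the engine can swap in hardware-specific kernels without changing the serving logic.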