LLM News

Every LLM release, update, and milestone.


xLLM: Open-source inference framework claims 2.2x vLLM throughput on Ascend accelerators

Researchers have released xLLM, an open-source large language model (LLM) inference framework designed for enterprise-scale serving. The framework claims up to 2.2x higher throughput than vLLM-Ascend when serving Qwen-series models under identical latency constraints, which the authors attribute to a decoupled architecture that separates service scheduling from engine optimization.
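The decoupling idea can be illustrated with a minimal sketch: the scheduler (request admission and batching) and the engine (model execution) interact only through a queue, so each layer can be tuned independently. All names and structure here are hypothetical illustrations, not xLLM's actual API.

```python
import queue


class Engine:
    """Stand-in for the inference engine; here it just echoes requests."""

    def execute(self, batch):
        # A real engine would run a model forward pass over the batch.
        return [f"output:{req}" for req in batch]


class Scheduler:
    """Admits requests and forms batches without knowing engine internals."""

    def __init__(self, max_batch=4):
        self.pending = queue.Queue()
        self.max_batch = max_batch

    def submit(self, request):
        self.pending.put(request)

    def next_batch(self):
        # Drain up to max_batch pending requests into one batch.
        batch = []
        while len(batch) < self.max_batch and not self.pending.empty():
            batch.append(self.pending.get())
        return batch


# Usage: the serving loop wires the two layers together via batches only.
scheduler = Scheduler(max_batch=2)
engine = Engine()
for r in ["req1", "req2", "req3"]:
    scheduler.submit(r)

results = []
while True:
    batch = scheduler.next_batch()
    if not batch:
        break
    results.extend(engine.execute(batch))
```

Because the scheduler sees only opaque requests and the engine sees only batches, either side can be swapped or optimized (e.g. for a different accelerator backend) without touching the other.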

2 min read · via arxiv.org