LLM News | TPS

research

OSCAR: New RAG compression method achieves 2-5x speedup with minimal accuracy loss

Researchers have introduced OSCAR, a query-dependent compression method for Retrieval-Augmented Generation that speeds up inference 2-5x while preserving accuracy. Unlike traditional approaches, OSCAR compresses retrieved information dynamically at inference time rather than offline, eliminating storage overhead and enabling higher compression rates.

March 5, 2026 · 5:25 AM1 min read

rag retrieval-augmented-generation compression

via arxiv.org ↗