LLM News

Every LLM release, update, and milestone.

research

OSCAR: New RAG compression method achieves 2-5x speedup with minimal accuracy loss

Researchers have introduced OSCAR, a query-dependent compression method for Retrieval-Augmented Generation (RAG) that speeds up inference by 2-5x with minimal accuracy loss. Unlike approaches that compress documents offline, OSCAR compresses the retrieved information dynamically at inference time, conditioned on the query, which eliminates the storage overhead of offline compression and enables higher compression rates.
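To make "query-dependent compression at inference time" concrete, here is a minimal toy sketch in Python. It is not OSCAR's actual method, which this summary does not detail: it swaps in simple word-overlap scoring as a stand-in, and the names `tokenize`, `compress_for_query`, and `keep_ratio` are hypothetical. It only illustrates the general idea that the retrieved text is shortened per query when the query arrives, so nothing compressed has to be stored ahead of time.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def compress_for_query(query, documents, keep_ratio=0.5):
    """Toy query-dependent compressor (illustrative assumption, not OSCAR).

    Splits the retrieved documents into sentences, scores each sentence by
    how many query words it shares, and keeps the top `keep_ratio` fraction.
    """
    query_terms = tokenize(query)
    sentences = [
        s.strip()
        for doc in documents
        for s in re.split(r"(?<=[.!?])\s+", doc)
        if s.strip()
    ]
    if not sentences:
        return ""

    # (overlap score, original position, sentence)
    scored = [(len(query_terms & tokenize(s)), i, s) for i, s in enumerate(sentences)]

    # Keep at least one sentence; rank by score, breaking ties by position.
    budget = max(1, int(len(sentences) * keep_ratio))
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:budget]

    # Restore the original sentence order so the context reads naturally.
    return " ".join(s for _, _, s in sorted(top, key=lambda t: t[1]))


retrieved = [
    "Paris is the capital of France. It hosts many museums. The Seine runs through it.",
    "Berlin is the capital of Germany.",
]

# The same retrieved documents compress differently for different queries.
print(compress_for_query("capital of France", retrieved))
print(compress_for_query("museums", retrieved))
```

Because the compression runs per query, the same retrieved documents yield a different shortened context for each question, and the shortened context is what would be passed to the generator.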