iclr-2026

1 article tagged with iclr-2026

March 25, 2026
research

Google's TurboQuant cuts AI inference memory by 6x using lossless compression

Google Research unveiled TurboQuant, a lossless memory compression algorithm that reduces AI inference working memory (KV cache) by at least 6x without impacting model performance. The technology uses vector quantization methods called PolarQuant and an optimization technique called QJL. Findings will be presented at ICLR 2026.