ai-efficiency
1 article tagged with ai-efficiency
March 25, 2026
research
Google's TurboQuant cuts AI inference memory by 6x using lossless compression
Google Research unveiled TurboQuant, a lossless memory compression algorithm that reduces AI inference working memory (the KV cache) by at least 6x without impacting model performance. The technology combines a vector quantization method called PolarQuant with a quantization technique called QJL. The findings will be presented at ICLR 2026.
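To give a rough sense of how quantizing the KV cache translates into memory savings, here is a minimal sketch using simple per-token 2-bit scalar quantization on a mock cache. This is an illustrative assumption only, not Google's TurboQuant, PolarQuant, or QJL; the tensor shapes and bit width are hypothetical.

```python
import numpy as np

# Illustrative sketch: per-token 2-bit scalar quantization of a mock KV cache,
# showing how low-bit codes can shrink inference memory by roughly 6-7x
# relative to fp16. NOT the TurboQuant / PolarQuant / QJL algorithms.

num_tokens, head_dim = 1024, 128                      # hypothetical cache shape
kv_cache = np.random.randn(num_tokens, head_dim).astype(np.float16)

bits = 2                                              # 2-bit codes per value
levels = 2 ** bits

# Per-token (per-row) min/max quantization parameters.
lo = kv_cache.min(axis=1, keepdims=True).astype(np.float32)
hi = kv_cache.max(axis=1, keepdims=True).astype(np.float32)
scale = (hi - lo) / (levels - 1)

# Quantize to integer codes, then dequantize to estimate the reconstruction error.
codes = np.clip(np.round((kv_cache.astype(np.float32) - lo) / scale), 0, levels - 1)
dequant = codes * scale + lo

orig_bytes = kv_cache.size * 2                        # fp16 = 2 bytes per value
quant_bytes = kv_cache.size * bits / 8 + num_tokens * 2 * 2  # codes + per-row scale/offset
print(f"compression ratio: {orig_bytes / quant_bytes:.1f}x")
print(f"mean abs error: {np.abs(kv_cache.astype(np.float32) - dequant).mean():.4f}")
```

With these assumed shapes the sketch reports a compression ratio in the 6-7x range, which is the same order as the reduction the article describes, though the real methods use more sophisticated vector quantization to avoid accuracy loss.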