
Google's TurboQuant compression cuts LLM memory needs by 6x, sparks memory chip stock selloff

TL;DR

Google unveiled TurboQuant, a compression technique that reduces the memory required to run large language models by six times by optimizing key-value cache storage. Memory chipmakers Samsung, SK Hynix, and Micron fell 5-6% on concern the efficiency breakthrough could reduce future chip demand. Analysts say the decline reflects profit-taking rather than a fundamental shift, as more powerful models will eventually require more advanced hardware.



Google's new compression method claims a six-fold reduction in memory requirements for large language models, triggering sharp selloffs in shares of major memory chip manufacturers on concerns about reduced demand.

On Tuesday, Google unveiled TurboQuant, a compression technique targeting the key-value cache—the component that stores past calculations so AI models don't recompute them. The company claims the method reduces total memory footprint by up to six times, directly addressing inference efficiency.
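To see why the key-value cache dominates inference memory, it helps to run the arithmetic. The sketch below estimates KV-cache size for a hypothetical large model; the model dimensions and the formula's assumptions (fp16 storage, one cached key and value vector per head per layer per token) are illustrative, not figures from Google's announcement.

```python
# Back-of-the-envelope KV-cache sizing. All model dimensions below are
# hypothetical assumptions for illustration, not Google's numbers.

def kv_cache_bytes(n_layers, n_heads, head_dim, seq_len, bytes_per_elem):
    """Memory for cached keys + values across all layers, one sequence.

    The factor of 2 covers the key tensor and the value tensor.
    """
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_per_elem

# A hypothetical 70B-class model serving a 32k-token context in fp16.
fp16 = kv_cache_bytes(n_layers=80, n_heads=64, head_dim=128,
                      seq_len=32_768, bytes_per_elem=2)

# A six-fold compression claim, applied to that footprint:
compressed = fp16 / 6

print(f"fp16 KV cache:        {fp16 / 2**30:.1f} GiB")
print(f"after 6x compression: {compressed / 2**30:.1f} GiB")
```

At these assumed dimensions the fp16 cache alone is 80 GiB per 32k-token sequence, which is why cutting it by six times changes how many concurrent requests a single accelerator can serve.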

The announcement prompted an immediate market reaction: shares of SK Hynix and Samsung dropped 6% and nearly 5% respectively in South Korean trading on Thursday. Kioxia, Japan's third-largest memory maker, fell nearly 6%. In the U.S., SanDisk and Micron declined on Wednesday and continued lower in premarket trading Thursday.

Market Context

Memory stocks had experienced extraordinary gains prior to the announcement. Samsung shares rose nearly 200% over the preceding year, while Micron and SK Hynix gained more than 300%—driven by sustained demand for AI training and inference infrastructure alongside constrained supply.

Matthew Prince, CEO of Cloudflare, characterized the development as "Google's DeepSeek," referencing Chinese AI firm DeepSeek's efficiency breakthroughs last year that triggered a broader tech market correction. Prince noted significant optimization potential across "speed, memory usage, power consumption, and multi-tenant utilization."

Analyst Pushback

However, skepticism tempered immediate concerns. Ray Wang, memory analyst at SemiAnalysis, argued that eliminating key-value cache bottlenecks would enable more capable hardware and models, not less. "When you address a bottleneck, you help AI hardware be more capable. When the model becomes more powerful, you require better hardware to support it," Wang told CNBC.

Ben Barringer, head of technology research at Quilter Cheviot, characterized the selloff as profit-taking in a sector already primed to de-risk. "Memory stocks have had a very strong run and this is a highly cyclical sector. The Google TurboQuant innovation has added to the pressure, but this is evolutionary, not revolutionary. It does not alter the industry's long-term demand picture."

Analysts noted that the key-value cache had become a recognized bottleneck for model performance and hardware efficiency, making TurboQuant's optimization a natural engineering problem for researchers to tackle.

What This Means

TurboQuant represents genuine progress on AI efficiency but likely accelerates rather than constrains memory demand. Each efficiency improvement creates capacity for more complex models, longer context windows, and scaled inference deployments—all memory-intensive operations. The near-term market reaction reflects profit-taking in overheated memory stocks rather than fundamental demand destruction. Long-term, supply constraints and sequential model improvements will likely dominate memory demand dynamics.

Source: cnbc.com

