LLM News

Every LLM release, update, and milestone.

product update

Gemini now imports chats and memory from ChatGPT, Claude, and other AI apps

Google is rolling out chat and memory import functionality to Gemini, allowing users to transfer conversation history from ChatGPT, Claude, and other AI apps. The feature supports zip file uploads up to 5 GB, with users able to upload up to 5 files per day. A companion memory import tool lets users generate context summaries from other chatbots to paste into Gemini.

2 min read · via 9to5google.com
product update

Google expands Search Live to 200+ countries with multilingual Gemini 3.1 Flash Live

Google is expanding Search Live, its voice and camera-based AI search assistant, to more than 200 countries and territories with support for dozens of languages. The expansion is powered by Gemini 3.1 Flash Live, a new audio-focused model that Google claims offers faster response times and more natural conversations.

model release

Gemini 3.1 Flash Live scores 95.9% on Big Bench Audio, Google's fastest voice model

Google has released Gemini 3.1 Flash Live, its new voice and audio AI model, scoring 95.9% on the Big Bench Audio Benchmark at high thinking levels—second only to Step-Audio R1.1 Realtime at 97.0%. Response times range from 0.96 seconds at minimal thinking to 2.98 seconds at high thinking, with pricing held at $0.35 per hour of audio input and $1.40 per hour of audio output.
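At those rates, session cost is straightforward arithmetic. A quick sketch (the per-hour prices are from the item above; the session lengths are illustrative assumptions):

```python
# Estimated audio cost for Gemini 3.1 Flash Live at the listed rates.
INPUT_RATE = 0.35   # USD per hour of audio input
OUTPUT_RATE = 1.40  # USD per hour of audio output

def session_cost(input_minutes: float, output_minutes: float) -> float:
    """Cost in USD for a voice session with the given audio durations."""
    return (input_minutes / 60) * INPUT_RATE + (output_minutes / 60) * OUTPUT_RATE

# A 30-minute conversation where user and model each speak half the time:
print(f"${session_cost(15, 15):.4f}")  # 15 min in + 15 min out
```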

product update · Amazon Web Services

Amazon Bedrock Guardrails now supports age-responsive, context-aware safety policies

Amazon has released a serverless architecture solution using Bedrock Guardrails that dynamically selects safety policies based on user age, role, and industry. The solution enforces five specialized guardrails—including COPPA-compliant child protection and healthcare-specific policies—at inference time to prevent prompt injection attacks and ensure context-appropriate responses.

2 min read · via aws.amazon.com
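The core idea of the solution is a routing step that maps user context to a guardrail before invocation. A hypothetical sketch of that selection logic (the guardrail names and rules below are illustrative, not the actual AWS solution's policies or APIs):

```python
# Hypothetical sketch of context-aware guardrail selection.
# Guardrail IDs and routing rules are illustrative only.
from dataclasses import dataclass

@dataclass
class UserContext:
    age: int
    role: str       # e.g. "patient", "clinician", "student"
    industry: str   # e.g. "healthcare", "education", "general"

def select_guardrail(ctx: UserContext) -> str:
    """Pick a guardrail policy ID at inference time from user context."""
    if ctx.age < 13:
        return "coppa-child-protection"   # strictest policy for children
    if ctx.industry == "healthcare":
        return "healthcare-policy"        # domain-specific medical policy
    if ctx.industry == "education" and ctx.age < 18:
        return "minor-education"
    if ctx.role == "admin":
        return "internal-elevated"
    return "general-default"

# The chosen ID would then accompany the model invocation request.
print(select_guardrail(UserContext(age=10, role="student", industry="education")))
```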
product update · Amazon Web Services

Amazon Polly adds bidirectional streaming API for real-time speech synthesis in conversational AI

Amazon has released a new Bidirectional Streaming API for Amazon Polly that enables simultaneous text input and audio output over a single HTTP/2 connection. The API reduces end-to-end latency by 39% compared to traditional request-response TTS by allowing text to be sent word-by-word as LLMs generate tokens, rather than waiting for complete sentences. The feature is available in the Java, JavaScript, .NET, C++, Go, Kotlin, PHP, Ruby, Rust, and Swift SDKs.

2 min read · via aws.amazon.com
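The latency win comes from pipelining: text flows in as the LLM emits tokens while audio flows back concurrently, instead of buffering full sentences. A language-agnostic sketch of that pattern (Python here for brevity; all names below are stand-ins, not the actual Polly SDK interface):

```python
import asyncio

async def llm_tokens():
    """Stand-in for an LLM emitting tokens one at a time."""
    for word in "Hello there , how can I help ?".split():
        await asyncio.sleep(0.01)  # simulated generation delay
        yield word

async def synthesize(word_queue: asyncio.Queue, audio_out: list) -> None:
    """Stand-in for the TTS side: consumes words as they arrive and
    emits audio chunks immediately, not once per complete sentence."""
    while True:
        word = await word_queue.get()
        if word is None:                     # end-of-stream sentinel
            break
        audio_out.append(f"<audio:{word}>")  # placeholder audio chunk

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    audio: list = []
    tts = asyncio.create_task(synthesize(queue, audio))
    async for token in llm_tokens():  # each word is forwarded as soon as
        await queue.put(token)        # it is generated, never buffered
    await queue.put(None)
    await tts
    return audio

print(asyncio.run(main()))
```

Because synthesis starts on the first word, the time-to-first-audio is bounded by one token's generation time rather than a whole sentence's.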
product update

Google launches Search Live globally with real-time camera and voice search

Google is expanding Search Live globally to users in more than 200 countries, enabling real-time voice and camera search through the Google app and Lens. The feature, powered by Gemini 3.1 Flash Live—a new multilingual audio and video model—allows users to point their phone camera at objects and ask questions with instant spoken responses.

model release

Google releases Gemini 3.1 Flash Live, its highest-quality audio model for real-time voice AI

Google has released Gemini 3.1 Flash Live, its highest-quality audio model designed for natural and reliable real-time voice interactions. The model scores 90.8% on ComplexFuncBench Audio and 36.1% on Scale AI's Audio MultiChallenge with thinking enabled. It's now available to developers via the Gemini Live API, enterprises through Gemini Enterprise for Customer Experience, and consumers in Search Live and Gemini Live across 200+ countries.

2 min read · via deepmind.google
product update · ByteDance

ByteDance rolls out Dreamina Seedance 2.0 video generation to CapCut with IP safeguards

ByteDance confirmed Thursday that Dreamina Seedance 2.0, its audio and video generation model, is rolling out in CapCut across seven initial markets. The model generates videos up to 15 seconds with realistic textures and motion, but includes safety restrictions blocking generation from real faces and unauthorized IP use.

2 min read · via techcrunch.com
model release

Google releases Gemini 3.1 Flash Live, its highest-quality audio model for real-time voice AI

Google has released Gemini 3.1 Flash Live, its highest-quality audio and voice model designed for real-time dialogue. The model scores 90.8% on ComplexFuncBench Audio and 36.1% on Scale AI's Audio MultiChallenge with reasoning enabled, with improved tonal understanding and lower latency compared to previous versions.

2 min read · via blog.google
product update · GitHub

GitHub will train Copilot models on user interaction data starting April 2026

GitHub will use Copilot interaction data from Free, Pro, and Pro+ plan users to train AI models starting April 24, 2026, unless users actively opt out. The policy does not affect Copilot Business and Enterprise customers. Data shared will include prompts, outputs, code snippets, filenames, and repository structures.

2 min read · via the-decoder.com
research

Google's TurboQuant compression cuts LLM memory needs by 6x, sparks memory chip stock selloff

Google unveiled TurboQuant, a compression technique that reduces the memory required to run large language models sixfold by optimizing key-value cache storage. Memory chipmakers Samsung, SK Hynix, and Micron fell 5-6% on concern the efficiency breakthrough could reduce future chip demand. Analysts say the decline likely reflects profit-taking rather than a fundamental shift, since more powerful models will eventually require more advanced hardware.

benchmark · OpenAI

ARC-AGI-3 benchmark: frontier AI models score below 1%, humans solve all 135 tasks

The ARC Prize Foundation released ARC-AGI-3, an interactive benchmark requiring AI agents to explore environments, form hypotheses, and execute plans without instructions. All 135 environments were solved by untrained humans, yet frontier models—including Gemini 3.1 Pro Preview (0.37%), GPT 5.4 (0.26%), Opus 4.6 (0.25%), and Grok-4.20 (0.00%)—scored below 1%.

research · Apple

Apple's RubiCap model generates better image captions with 3-7B parameters than 72B competitors

Apple researchers developed RubiCap, a framework for training dense image captioning models that achieve state-of-the-art results at 2B, 3B, and 7B parameter scales. The 7B model outperforms models up to 72 billion parameters on multiple benchmarks including CapArena and CaptionQA, while the 3B variant matches larger 32B models, suggesting efficient dense captioning doesn't require massive scale.

2 min read · via 9to5mac.com
research

Google's TurboQuant cuts AI inference memory by 6x using lossless compression

Google Research unveiled TurboQuant, a lossless memory compression algorithm that reduces AI inference working memory (the KV cache) by at least 6x without impacting model performance. The technology uses a vector quantization method called PolarQuant and an optimization technique called QJL. Findings will be presented at ICLR 2026.
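The details of TurboQuant aren't described here, but the memory math behind a 6x KV-cache reduction is easy to illustrate. A rough sketch (the model dimensions below are illustrative 7B-class figures, not any specific model):

```python
def kv_cache_bytes(layers: int, heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int) -> int:
    """Size of the key-value cache: two tensors (K and V) per layer,
    each of shape [heads, seq_len, head_dim]."""
    return 2 * layers * heads * head_dim * seq_len * bytes_per_value

# Illustrative 7B-class dimensions with an fp16 baseline (2 bytes/value).
base = kv_cache_bytes(layers=32, heads=32, head_dim=128,
                      seq_len=8192, bytes_per_value=2)
print(f"fp16 KV cache:        {base / 2**30:.1f} GiB")
print(f"after a 6x reduction: {base / 6 / 2**30:.2f} GiB")
```

At these dimensions the fp16 cache for a single 8K-token context is 4 GiB; a 6x reduction brings it under 0.7 GiB, which is why KV-cache compression translates directly into serving longer contexts or more concurrent requests per accelerator.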

model release

Google launches Lyria 3 Pro music generator, claims training data is rights-cleared

Google has released Lyria 3 Pro, its latest AI music generation model capable of creating tracks up to three minutes long with improved understanding of musical structure. The model is available through Gemini, Google Vids, Vertex AI, and Google AI Studio. Google claims the training data comes from sources it has contractual and legal rights to use.

2 min read · via the-decoder.com