LLM News

Every LLM release, update, and milestone.

model release

AI2 releases robotics models trained entirely in simulation, achieving zero-shot real-world transfer

AI2 has released MolmoSpaces and MolmoBot, robotics models trained exclusively in simulation that transfer directly to real robots without manual real-world data collection or fine-tuning. The approach eliminates the months of teleoperated demonstrations typically needed to adapt such systems to real hardware. Both systems are open source.

model release

Hume AI open-sources TADA: speech model 5x faster than rivals with zero hallucination

Hume AI has open-sourced TADA, a speech generation model that maps exactly one audio signal to each text token, making it 5x faster than comparable systems. The model produced zero transcription hallucinations across more than 1,000 test samples, runs on smartphones, and is available in 1B and 3B parameter versions under the MIT license.
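The one-signal-per-token mapping can be read as a length invariant: if the decoder emits exactly one audio code per text token, the output length is pinned to the input length, so the model cannot insert or repeat speech the text never contained. A toy sketch of that invariant (the codec stand-in is invented for illustration, not Hume AI's actual model):

```python
def one_to_one_decode(text_tokens, codec):
    # Emit exactly one audio code per text token: output length
    # is structurally tied to input length, ruling out inserted
    # or repeated content by construction.
    return [codec(tok) for tok in text_tokens]

# Stand-in for a learned audio codebook lookup (hypothetical).
codec = lambda tok: sum(map(ord, tok)) % 1024

tokens = ["hel", "lo", "_wor", "ld"]
codes = one_to_one_decode(tokens, codec)
assert len(codes) == len(tokens)  # invariant holds for any input
```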

product update · Perplexity AI

Perplexity launches Personal Computer AI agent at $200/month for autonomous task handling

Perplexity AI has launched Personal Computer, an AI agent service priced at $200 per month that autonomously handles email, builds presentations, and controls applications. The service aims to provide continuous AI assistance for routine digital tasks without human intervention.

product update · NVIDIA

Nvidia to spend $26B on open-weight AI models, targeting Chinese competition and developer lock-in

An SEC filing reveals Nvidia plans to spend $26 billion on open-weight AI models over the next five years. The investment targets the open-source gap left by OpenAI, Meta, and Anthropic while countering the rise of Chinese open-source models and deepening developer dependence on Nvidia hardware.

model release · NVIDIA

NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture

NVIDIA has released Nemotron-3-Super-120B-A12B, a 120-billion-parameter text generation model with a latent Mixture-of-Experts (MoE) architecture, in NVFP4 and BF16 variants. The model supports 8 languages, including English, French, Spanish, Italian, German, Japanese, and Chinese, and is available on Hugging Face with 8-bit quantization support through NVIDIA's ModelOpt toolkit.
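The "A12B" suffix conventionally denotes the active parameter count: an MoE layer routes each token through only a few experts, so far fewer than the full 120B parameters run per token. A toy top-k router sketched under that assumption (shapes and gating are illustrative, not NVIDIA's implementation):

```python
import numpy as np

def topk_moe(x, gate_w, experts, k=2):
    """Toy top-k mixture-of-experts layer: only k experts run per
    token, which is how a large total parameter count can pair with
    a much smaller active parameter count."""
    logits = x @ gate_w                # router scores, one per expert
    top = np.argsort(logits)[-k:]      # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_exp = 8, 4
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_exp))
# Each "expert" is just a dense projection in this sketch.
experts = [(lambda W: (lambda v: v @ W))(rng.standard_normal((d, d)))
           for _ in range(n_exp)]
y = topk_moe(x, gate_w, experts, k=2)
print(y.shape)  # (8,)
```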

research

Half of AI code passing SWE-bench would be rejected by real developers, METR study finds

A study by the research organization METR found that approximately 50% of AI-generated code solutions that pass the widely used SWE-bench benchmark would be rejected by actual project maintainers. The finding exposes a significant gap between industry-standard code generation benchmarks and real-world code review standards.

research

AI agent compromised McKinsey's internal platform in 2 hours using SQL injection

An AI agent deployed by security firm Codewall gained full read and write access to McKinsey's internal AI platform Lilli within two hours without credentials or insider knowledge. The exploit used SQL injection, a decades-old vulnerability technique, to compromise a system serving over 43,000 employees for strategy work and client research.
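SQL injection works by smuggling query syntax into a string the application concatenates into SQL. A minimal illustration of the flaw class and its standard fix, using an invented schema rather than anything from the actual Lilli exploit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, title TEXT, secret INTEGER)")
conn.execute("INSERT INTO docs VALUES (1, 'public report', 0), "
             "(2, 'client strategy', 1)")

def search_vulnerable(term):
    # Concatenating user input into SQL: the attacker controls query text.
    q = f"SELECT title FROM docs WHERE title LIKE '%{term}%' AND secret = 0"
    return [r[0] for r in conn.execute(q)]

def search_safe(term):
    # Parameterized query: input is bound as data, never parsed as SQL.
    q = "SELECT title FROM docs WHERE title LIKE ? AND secret = 0"
    return [r[0] for r in conn.execute(q, (f"%{term}%",))]

# A classic payload closes the string and comments out the secrecy filter.
payload = "' OR 1=1 --"
print(search_vulnerable(payload))  # leaks 'client strategy' despite secret=1
print(search_safe(payload))        # []: the payload is a literal string
```

The parameterized variant is the decades-old standard mitigation; the point of the finding is that such well-understood flaws still surface in production AI platforms.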

product update

Qualcomm and Wayve partner to integrate physical AI into production vehicles

Qualcomm and Wayve announced a technical partnership to integrate Wayve's AI driving layer with Qualcomm's hardware platform for production-ready advanced driver assistance systems. The collaboration aims to accelerate autonomous vehicle innovation by combining hardware and software expertise.