LLM News

Every LLM release, update, and milestone.

1
model releaseNVIDIA

NVIDIA releases Nemotron-3-Nano-4B, a 4B parameter model for edge AI with 262K context window

NVIDIA released Nemotron-3-Nano-4B-GGUF on March 16, 2026, a 4-billion parameter small language model (SLM) designed for edge deployment on devices like Jetson Thor and GeForce RTX. The model features a hybrid Mamba-2 and Transformer architecture with a 262K token context window and supports both reasoning and non-reasoning modes via system prompts.

1
model releaseNVIDIA

Nvidia releases Nemotron 3 Super: 120B MoE model with 1M token context

Nvidia has released Nemotron 3 Super, a 120-billion parameter hybrid Mamba-Transformer Mixture-of-Experts model that activates only 12 billion parameters during inference. The open-weight model features a 1-million token context window, multi-token prediction capabilities, and pricing at $0.10 per million input tokens and $0.50 per million output tokens.

2 min readvia openrouter.ai
0
product update

Multiverse Computing launches API portal for compressed AI models to reduce cloud dependence

Multiverse Computing, a Spanish startup, has launched a self-serve API portal giving developers direct access to compressed versions of models from OpenAI, Meta, DeepSeek, and Mistral AI. The move targets enterprises seeking to reduce cloud infrastructure dependence and lower compute costs through edge deployment. The company claims its HyperNova 60B 2602 model delivers faster responses at lower cost than the original OpenAI model it was derived from.

0
product update

Adobe Firefly now learns custom visual styles from user-uploaded images

Adobe is rolling out custom models for Firefly, allowing creators to train the generative model on 10-30 of their own images to generate new content matching their specific visual style. The feature costs 500 credits per training session and supports three methods: photography style, illustration style, and character consistency.

0
product updateNVIDIA

NVIDIA Nemotron 3 Super now available on Amazon Bedrock with 256K context window

NVIDIA Nemotron 3 Super, a hybrid Mixture of Experts model with 120B parameters and 12B active parameters, is now available as a fully managed model on Amazon Bedrock. The model supports up to 256K token context length and claims 5x higher throughput efficiency over the previous Nemotron Super and 2x higher accuracy on reasoning tasks.

0
product update

Google AI Studio adds real-time multiplayer game coding with Gemini 3.1 Pro

Google has launched a vibe coding feature in Google AI Studio that converts natural language descriptions into working applications using Gemini 3.1 Pro. The platform now supports real-time multiplayer games and automatically configures databases, authentication, and third-party service integrations through an "Antigravity Agent."

2 min readvia the-decoder.com
0
model releaseMicrosoft

Microsoft's superintelligence team releases MAI-Image-2, ranks third in text-to-image generation

Microsoft's superintelligence team, led by Mustafa Suleyman, has released MAI-Image-2, a text-to-image generator that currently ranks third on the Arena.ai leaderboard for text-to-image models, behind OpenAI's GPT-Image-1.5 and Google's Nano Banana 2. The model is now available for testing in the MAI Playground and will roll out to Copilot and Bing Image Creator, with API access opening to all developers through Microsoft Foundry.

0
product updateAnthropic

Anthropic adds always-on channels to Claude Code, enabling async AI agent capabilities

Anthropic has added "channels" to Claude Code, enabling Claude to respond to incoming messages, webhooks, and notifications asynchronously without user intervention. The research preview supports Telegram and Discord with custom channel support, running through MCP servers with two-way communication.

2 min readvia the-decoder.com
0
model releaseNVIDIA

NVIDIA releases Nemotron 3 Content Safety 4B for multimodal, multilingual moderation

NVIDIA released Nemotron 3 Content Safety 4B, an open-source multimodal safety model designed to moderate content across text, images, and multiple languages. Built on Gemma-3 4B-IT with a 128K context window, the model achieved 84% average accuracy on multimodal safety benchmarks and supports over 140 languages through culturally-aware training data.

0
product updateOpenAI

OpenAI acquires Astral to integrate Python's most-used dev tools into Codex platform

OpenAI has acquired Astral, the company behind widely-used Python development tools Ruff, uv, and ty, which see hundreds of millions of downloads monthly. The tools will integrate with OpenAI's Codex platform to enable agentic AI systems that can plan changes, modify codebases, run tools, and verify results. OpenAI commits to keeping the tools open source under permissive licenses post-acquisition.

2 min readvia the-decoder.com
0
product update

Google Gemini task automation now works on phones, taking 9 minutes to order dinner

Google has launched task automation for Gemini on Pixel 10 Pro and Galaxy S26 Ultra, allowing the AI to autonomously use apps for food delivery and rideshare services. The feature works but is slow—taking approximately nine minutes to complete an order—and remains limited to a small beta subset of apps. Despite performance limitations, it represents the first practical demonstration of an AI assistant actually controlling a phone outside of controlled demos.

2 min readvia theverge.com
0
product updateOpenAI

OpenAI releases prompting playbook for GPT-5.4 frontend design with specific UI/UX guidelines

OpenAI has published a prompting playbook for frontend designers using GPT-5.4 to generate websites and UI designs. The guide establishes hard rules for composition, branding, typography, and layout to prevent generic outputs. Key recommendations include one-composition viewports, expressive typography, full-bleed heroes, and lower reasoning levels for faster, more focused results.

3 min readvia the-decoder.com
0
model releaseXiaomi

Xiaomi launches MiMo-V2-Pro with 1T parameters, matches Claude Opus on coding at 80% lower cost

Xiaomi shipped three AI models simultaneously designed to form a complete agent platform. MiMo-V2-Pro, a 1-trillion-parameter Mixture-of-Experts model with 42 billion active parameters per request, scores 78% on SWE-bench Verified and 81 points on ClawEval—nearly matching Claude Opus 4.6 while costing $1 per million input tokens versus $5 for Opus.

0
product update

Cursor's Composer 2 coding model built on Moonshot AI's Kimi, not developed from scratch

Cursor acknowledged that its newly launched Composer 2 coding model was built on Moonshot AI's open-source Kimi 2.5 model, not developed independently. The admission came after user scrutiny revealed code references to Kimi. Cursor claims only 25% of compute came from the base model, with 75% from its own training and reinforcement learning.

3 min readvia techcrunch.com
0
product updateOpenAI

OpenAI consolidating ChatGPT, Codex, and Atlas into single macOS superapp

OpenAI is consolidating its fragmented macOS app ecosystem by merging ChatGPT, Codex coding platform, and Atlas browser into a single "superapp" led by Chief of Applications Fidji Simo. The unified app will feature agentic AI capabilities for autonomous task execution and team collaboration, with rollout expected over coming months starting with Codex enhancements.

2 min readvia 9to5mac.com
0
product update

Google expands Universal Commerce Protocol with cart, catalog, and loyalty features for AI agents

Google has expanded the Universal Commerce Protocol (UCP) with shopping cart, catalog, and identity features designed for AI agents. The new capabilities enable agents to add multiple items to carts, access real-time product data including prices and availability, and preserve shopper loyalty benefits across retailers.

2 min readvia the-decoder.com
0
product updateOpenAI

OpenAI consolidating ChatGPT, browser, and Codex into single desktop app

OpenAI is developing a unified desktop application that combines ChatGPT, its browser, and Codex code generator into a single product. The move is intended to streamline user experience and focus resources on one integrated platform, with emphasis on agentic AI capabilities that can autonomously handle tasks like software development and data analysis.

1 min readvia engadget.com