nvidia

44 articles tagged with nvidia

June 18, 2026
model release · Mistral AI

Mistral Releases Mistral 3 Family: 675B-Parameter Large 3 MoE and Three Edge Models Under Apache 2.0

Mistral has released Mistral 3, which includes Mistral Large 3, a sparse mixture-of-experts model with 41B active and 675B total parameters, and three Ministral 3 edge models (3B, 8B, 14B). All models are released under the Apache 2.0 license with multimodal capabilities and are available today on multiple platforms.

June 10, 2026
model release

Google releases DiffusionGemma 26B, open-weight model generates 500+ tokens/second

Google has released DiffusionGemma 26B, an open-weight text generation model under the Apache 2.0 license. The model generates over 500 tokens/second according to testing on NVIDIA's free NIM API, where it produced 2,409 tokens in 4.4 seconds.
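The throughput claim can be checked from the two quoted test figures. A minimal sketch of that arithmetic, assuming the 4.4 seconds covers the entire generation; the function name is illustrative and not taken from the source:

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average generation throughput over a single run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


# Figures quoted in the summary: 2,409 tokens in 4.4 seconds.
rate = tokens_per_second(2409, 4.4)
print(f"{rate:.1f} tokens/second")
```

The result is roughly 547 tokens/second, consistent with the "500+" figure in the headline.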

June 5, 2026
model release · NVIDIA

NVIDIA Releases Nemotron-3-Ultra: 550B Parameter Model with 1M Token Context and Configurable Reasoning

NVIDIA released Nemotron-3-Ultra-550B-A55B-NVFP4, a 550B parameter model with 55B active parameters, featuring a 1M token context window and configurable reasoning mode. The model uses a hybrid LatentMoE architecture combining Mamba-2, Mixture-of-Experts, and Attention layers with Multi-Token Prediction, trained with NVIDIA's NVFP4 quantization-aware approach.

model release · NVIDIA

NVIDIA releases Nemotron-3-Ultra: 550B parameter model with 1M token context and configurable reasoning

NVIDIA released Nemotron-3-Ultra-550B, a frontier-scale model with 550B total parameters (55B active) and up to 1M token context window. The model uses a hybrid LatentMoE architecture combining Mamba-2, MoE, and attention layers with Multi-Token Prediction, trained with NVFP4 quantization-aware methods from December 2025 to April 2026.

June 4, 2026
model release · NVIDIA

Nvidia Releases Free 4B-Parameter Nemotron 3.5 Content Safety Model with 128K Context

Nvidia has released Nemotron 3.5 Content Safety, a 4-billion parameter multimodal guardrail model fine-tuned from Google Gemma-3-4B. The model is available for free, supports 128K token context windows, and moderates content across 12 languages.

model release · NVIDIA

Nvidia Releases Nemotron 3 Ultra: 550B Parameter MoE Model with 1M Token Context Window

Nvidia has released Nemotron 3 Ultra, a 550B parameter mixture-of-experts model with 55B active parameters and a 1M token context window. The model uses a hybrid Transformer-Mamba architecture and is available for free through OpenRouter, targeting agentic workflows and multi-step reasoning tasks.

research · NVIDIA

NVIDIA Shows Task-Seeded Synthetic Data Boosts Nemotron-3 Nano by +11.1 on GPQA

NVIDIA demonstrated that task-seeded synthetic Q&A data improves model performance across multiple benchmarks in a 100B-token continuation experiment on Nemotron-3 Nano. The approach improved GPQA scores by +11.1 points, MMLU-Pro by +1.8, average code by +1.9, and commonsense understanding by +1.6.

June 2, 2026
analysis

Nvidia Releases Cosmos 3 Video Generation Models in Three Sizes: Nano, Super, and Super-Image2Video

Nvidia has released three variants of its Cosmos 3 video generation model family on Hugging Face: Cosmos3-Nano, Cosmos3-Super, and Cosmos3-Super-Image2Video. The release includes models for both standard video generation and specialized image-to-video conversion, though detailed specifications including parameter counts and benchmark scores have not yet been disclosed.

model release · NVIDIA

NVIDIA Releases Cosmos 3: 64B-Parameter Omnimodal World Model for Physical AI

NVIDIA released Cosmos 3, an omnimodal world foundation model platform for Physical AI spanning robotics, autonomous driving, and industrial environments. The flagship Cosmos3-Super variant contains 64 billion parameters and generates video, images, audio, and action commands from text, image, video, and action trajectory inputs using a Mixture-of-Transformers architecture.

model release · NVIDIA

NVIDIA Releases Cosmos3-Super: 64B-Parameter Omnimodal World Model for Physical AI

NVIDIA released Cosmos3-Super, a 64-billion parameter omnimodal foundation model that generates video, images, audio, and action commands from combinations of text, image, video, and action trajectory inputs. The model, part of the Cosmos3 collection, targets Physical AI applications including robotics, autonomous vehicles, and industrial automation.

model release · NVIDIA

NVIDIA Releases Cosmos3-Nano: 16B-Parameter Omnimodal World Model for Physical AI with 256K Token Context

NVIDIA has released Cosmos3-Nano, a 16-billion parameter omnimodal world model capable of generating video, audio, images, and robot action commands from combinations of text, image, video, and action trajectory inputs. The model supports a 256K token context window and is designed for Physical AI applications including robotics, autonomous vehicles, and smart manufacturing environments.

June 1, 2026
model release · NVIDIA +1

NVIDIA Releases Cosmos 3: 8B and 32B Omni-Models Combining Video Generation, Reasoning, and Action in Single Architecture

NVIDIA has released Cosmos 3, a unified omni-model that combines world generation, physical reasoning, and action generation in a single architecture. Available in 8B (Nano) and 32B (Super) parameter versions on Hugging Face, Cosmos 3 uses a Mixture-of-Transformers architecture to process text, image, video, audio, and action modalities without switching between separate models.

May 28, 2026
model release · Mistral AI

Mistral AI Releases Small 4: 119B Parameter Open-Source Model with 256K Context Under Apache 2.0

Mistral AI has released Mistral Small 4, a 119B total parameter mixture-of-experts model with 256K context window and native multimodal capabilities. The model uses 128 experts with 4 active per token (6B active parameters) and is released under the Apache 2.0 license, marking Mistral's first unified model combining reasoning, multimodal, and coding capabilities.
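The sparsity figures in this summary can be compared directly. A minimal sketch, assuming the quoted counts (4 of 128 experts, 6B of 119B parameters) are exact; the helper name is illustrative:

```python
def active_fraction(active: float, total: float) -> float:
    """Share of a model that is used for each token."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


# Figures quoted in the summary for Mistral Small 4.
expert_share = active_fraction(4, 128)     # experts routed per token
param_share = active_fraction(6e9, 119e9)  # parameters active per token

print(f"experts active:    {expert_share:.2%}")
print(f"parameters active: {param_share:.2%}")
```

The parameter share comes out somewhat higher than the expert share, which is what one would expect if some layers outside the experts run for every token.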

model release · Mistral AI

Mistral Releases Mistral Large 3 with 675B Parameters and Three Ministral 3 Models Under Apache 2.0

Mistral AI has released Mistral 3, consisting of Mistral Large 3—a sparse mixture-of-experts model with 675B total parameters and 41B active parameters—and three Ministral 3 models at 3B, 8B, and 14B parameters. All models are released under the Apache 2.0 license with multimodal capabilities including image understanding.

product update · Mistral AI

Mistral AI Launches Compute Infrastructure Service with Tens of Thousands of NVIDIA GPUs

Mistral AI has launched Mistral Compute, an AI infrastructure service offering private, integrated stacks including GPUs, orchestration, and APIs. The service will provide access to tens of thousands of NVIDIA GPUs, targeting European, Middle Eastern, and Asian customers seeking alternatives to US or China-based cloud providers.

model release · NVIDIA

NVIDIA releases LocateAnything-3B vision-language model with 2.5× faster object detection via parallel box decoding

NVIDIA released LocateAnything-3B, a 3-billion parameter vision-language model that predicts bounding boxes in parallel rather than token-by-token, achieving up to 2.5× higher throughput compared to autoregressive approaches. The model, trained on 12M images with 138M+ queries and 785M bounding boxes, supports object detection, GUI element grounding, and robotics perception.

May 23, 2026
research · NVIDIA

NVIDIA Releases Nemotron-Labs Diffusion Models With 6.4× Faster Token Generation Than Autoregressive Decoding

NVIDIA has released Nemotron-Labs Diffusion, a family of diffusion language models at 3B, 8B, and 14B scales that generate multiple tokens in parallel rather than one at a time. The 8B model achieves 6.4× higher tokens per forward pass than autoregressive models in self-speculation mode while maintaining comparable accuracy.

May 18, 2026
research · NVIDIA

NVIDIA releases LoRA/DoRA fine-tuning guide for Cosmos Predict 2.5 to generate synthetic robot training data

NVIDIA published a technical guide for parameter-efficient fine-tuning of its Cosmos Predict 2.5 world model using LoRA and DoRA adapters. The method allows teams to adapt the 2B-parameter model to robot manipulation tasks on a single 80GB GPU, generating synthetic training trajectories from just 92 demonstration videos.

May 6, 2026
changelog · Anthropic

Anthropic doubles Claude Code rate limits, secures 220,000 Nvidia GPUs via SpaceX Colossus 1 deal

Anthropic doubled Claude Code's five-hour rate limits across Pro, Max, Team, and Enterprise plans effective Tuesday, removing peak-hours throttling for Pro and Max users. The capacity expansion comes from an exclusive agreement securing all compute at SpaceX's Colossus 1 data center, which provides over 300 megawatts and more than 220,000 Nvidia GPUs.

April 28, 2026
model release · NVIDIA

Nvidia releases Nemotron 3 Nano Omni: 30B-parameter multimodal model with 256K context, free on OpenRouter

Nvidia has released Nemotron 3 Nano Omni, a 30-billion-parameter multimodal model available free on OpenRouter. The model features a 256,000-token context window, accepts text, image, video, and audio inputs, and claims 2× higher throughput for video reasoning compared to separate vision and speech pipelines.

April 4, 2026
analysis

Tencent releases HY-OmniWeaving multimodal model as Gemma-4 variants emerge

Tencent has released HY-OmniWeaving, a new multimodal model available on Hugging Face. Concurrently, NVIDIA and Unsloth have published optimized variants of Gemma-4, including a 31B instruction-tuned version and a quantized version in GGUF format.

model release · Google DeepMind

NVIDIA releases Gemma 4 31B quantized model with 256K context, multimodal capabilities

NVIDIA has released a quantized version of Google DeepMind's Gemma 4 31B IT model, compressed to NVFP4 format for efficient inference on consumer GPUs. The 30.7B-parameter multimodal model supports 256K token context windows, handles text and image inputs with video frame processing, and maintains near-baseline performance across reasoning and coding benchmarks.

April 2, 2026
model release · NVIDIA

NVIDIA Optimizes Google Gemma 4 for Local Agentic AI on RTX and Spark

NVIDIA has optimized Google's Gemma 4 models for local deployment on RTX and Spark platforms, targeting the emerging wave of on-device agentic AI. The optimization enables small, efficient models to access real-time local context for autonomous decision-making without cloud dependency.

benchmark · NVIDIA

Nvidia claims 291 MLPerf wins with 288-GPU setup; AMD MI355X crosses 1M tokens/sec

MLCommons published MLPerf Inference v6.0 results on April 1, 2026, with Nvidia, AMD, and Intel each claiming top spots in different configurations. Nvidia's 288-GPU GB300-NVL72 system achieved 2.49 million tokens per second on DeepSeek-R1, while AMD's MI355X crossed one million tokens per second for the first time. Direct comparisons remain difficult as each chipmaker targets different market segments and benchmarks.

March 28, 2026
model release · NVIDIA

NVIDIA releases gpt-oss-puzzle-88B, 88B-parameter reasoning model with 1.63× throughput gains

NVIDIA released gpt-oss-puzzle-88B on March 26, 2026, an 88-billion-parameter mixture-of-experts model optimized for inference efficiency on H100 hardware. Built using the Puzzle post-training neural architecture search framework, the model achieves a 1.63× throughput improvement in long-context (64K/64K) scenarios and up to a 2.82× improvement on single H100 GPUs compared to its parent gpt-oss-120B, while matching or exceeding accuracy across reasoning effort levels.

March 24, 2026
model release · Stability AI

Stability AI and NVIDIA launch Stable Diffusion 3.5 NIM for faster image generation

Stability AI and NVIDIA have launched Stable Diffusion 3.5 NIM, a microservice designed to accelerate image generation performance and simplify enterprise deployment. The collaboration packages Stable Diffusion 3.5 as an NVIDIA NIM (NVIDIA Inference Microservice) for optimized inference.

changelog · Stability AI

Stable Diffusion 3.5 TensorRT optimization delivers 2x faster generation, 40% less VRAM on RTX GPUs

Stability AI has released TensorRT-optimized versions of the Stable Diffusion 3.5 model family in collaboration with NVIDIA. The optimization uses FP8 quantization to achieve 2x faster generation speed and 40% lower VRAM requirements on supported RTX GPUs.

March 23, 2026
model release · NVIDIA +1

Nvidia releases Nemotron 3 Super: 120B MoE model with 1M token context

Nvidia has released Nemotron 3 Super, a 120-billion parameter hybrid Mamba-Transformer Mixture-of-Experts model that activates only 12 billion parameters during inference. The open-weight model features a 1-million-token context window and multi-token prediction, and is priced at $0.10 per million input tokens and $0.50 per million output tokens.
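The quoted per-token prices translate into a per-request cost as follows. A minimal sketch, assuming flat pricing with no caching discounts or minimum charges; the constant and function names and the example request sizes are hypothetical:

```python
INPUT_USD_PER_MILLION = 0.10   # quoted price per million input tokens
OUTPUT_USD_PER_MILLION = 0.50  # quoted price per million output tokens


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at the quoted flat rates."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical request: a full 1M-token prompt and a 2,000-token reply.
cost = request_cost_usd(1_000_000, 2_000)
print(f"${cost:.4f}")
```

At these rates, even a prompt that fills the entire 1M-token context costs about ten cents, with output tokens contributing comparatively little for short replies.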

product update · NVIDIA

NVIDIA Nemotron 3 Super now available on Amazon Bedrock with 256K context window

NVIDIA Nemotron 3 Super, a hybrid Mixture of Experts model with 120B parameters and 12B active parameters, is now available as a fully managed model on Amazon Bedrock. The model supports up to 256K token context length and claims 5x higher throughput efficiency over the previous Nemotron Super and 2x higher accuracy on reasoning tasks.

model release · NVIDIA

NVIDIA releases Nemotron 3 Content Safety 4B for multimodal, multilingual moderation

NVIDIA released Nemotron 3 Content Safety 4B, an open-source multimodal safety model designed to moderate content across text, images, and multiple languages. Built on Gemma-3 4B-IT with a 128K context window, the model achieved 84% average accuracy on multimodal safety benchmarks and supports over 140 languages through culturally aware training data.

March 14, 2026
funding · NVIDIA

Nvidia to spend $26B on open-weight AI models, filing reveals

Nvidia will invest $26 billion over the next five years to build open-weight AI models, according to a 2025 financial filing confirmed by executives. The move signals a strategic shift from chipmaker to AI frontier lab, with the company releasing Nemotron 3 Super (120B parameters) and claiming it outperforms GPT-OSS on multiple benchmarks.

March 12, 2026
product update · NVIDIA

Nvidia to spend $26B on open-weight AI models, targeting Chinese competition and developer lock-in

An SEC filing reveals Nvidia plans to spend $26 billion on open-weight AI models over the next five years. The investment targets the open-source gap left by OpenAI, Meta, and Anthropic while countering the rise of Chinese open-source models and deepening developer dependence on Nvidia hardware.

product update

Meta unveils four custom AI inference chips to cut costs and reduce Nvidia dependency

Meta has unveiled four generations of custom-designed AI chips focused on inference workloads, aiming to reduce inference costs across its platforms serving billions of users. The move represents a significant step toward reducing Meta's dependence on GPU manufacturers like Nvidia and AMD.

model release · NVIDIA

NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture

NVIDIA has released Nemotron-3-Super-120B-A12B-NVFP4, a 120-billion parameter text generation model featuring a latent Mixture-of-Experts (MoE) architecture. The model supports 8 languages including English, French, Spanish, Italian, German, Japanese, and Chinese, and is available on Hugging Face with 8-bit quantization support through NVIDIA's ModelOpt toolkit.

March 11, 2026
model release · NVIDIA

NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture

NVIDIA has released Nemotron-3-Super-120B-A12B-BF16, a 120 billion parameter model designed for text generation and conversational tasks. The model employs a latent mixture-of-experts (MoE) architecture and supports multiple languages including English, French, Spanish, Italian, German, Japanese, and Chinese.

March 10, 2026
product update · NVIDIA

Nvidia partners with Mira Murati's Thinking Machines Lab in long-term deal

Nvidia and Thinking Machines Lab, founded by former OpenAI executive Mira Murati, have announced a long-term partnership. Details on the scope and terms of the collaboration remain limited.

funding

Thinking Machines Lab secures Nvidia compute deal with 1+ gigawatt power allocation

Thinking Machines Lab has secured a multi-year compute deal with Nvidia involving at least 1 gigawatt of processing power, according to the company. The agreement also includes a strategic investment from Nvidia, marking a significant infrastructure commitment for the AI research organization.

March 9, 2026
product update · NVIDIA

Nvidia planning open-source AI agent platform ahead of developer conference

Nvidia is preparing to launch an open-source AI agent platform, according to reports ahead of the company's annual developer conference. The move mirrors approaches by competitors like OpenAI in building agent-based AI systems.

product update · NVIDIA

NVIDIA Nemotron 3 Nano now available on Amazon Bedrock as serverless model

Amazon Bedrock now offers NVIDIA's Nemotron 3 Nano as a fully managed serverless model, expanding its Nemotron portfolio alongside previously available Nemotron 2 Nano 9B and Nemotron 2 Nano VL 12B variants. The addition enables developers to deploy NVIDIA's smallest inference-optimized model without managing infrastructure.

funding

Nvidia-backed Nscale raises $2B, hits $14.6B valuation with Sandberg and Clegg joining board

Nvidia-backed British AI infrastructure startup Nscale has raised $2 billion in a new funding round, bringing its valuation to $14.6 billion. The round marks a significant milestone for the infrastructure-focused startup, with Meta's former COO Sheryl Sandberg and Meta's former VP of Global Affairs Nick Clegg joining the board.

March 2, 2026
product update · NVIDIA

Nvidia invests $4 billion in photonics companies Lumentum and Coherent

Nvidia announced Monday it is investing $2 billion each into photonics companies Lumentum and Coherent to develop optical transceivers, circuit switches, and lasers for next-generation AI data centers. The technology aims to improve energy efficiency, data transfer speeds, and bandwidth in data center infrastructure.

February 27, 2026
product update

Meta signs multi-billion dollar TPU rental deal with Google, challenging Nvidia's chip dominance

Meta has signed a multi-billion dollar deal to rent Google's TPU (Tensor Processing Unit) chips for training its AI models, marking a significant shift away from Nvidia's dominance in AI infrastructure. The arrangement provides Meta with alternative compute capacity while signaling growing competition in the specialized AI chip market.

funding · OpenAI

OpenAI closes $110B funding round from Amazon, Nvidia, SoftBank at $730B valuation

OpenAI has closed a $110 billion funding round with Amazon committing $50 billion, Nvidia $30 billion, and SoftBank $30 billion. The company is now valued at $730 billion, following a previous $40 billion round in 2025. The funding includes custom model development agreements between OpenAI and Amazon Web Services.

February 20, 2026
funding · NVIDIA

Nvidia reportedly planning $30 billion investment in OpenAI

Nvidia is reportedly planning a $30 billion investment in OpenAI, according to Reuters citing sources familiar with the matter. The deal would represent one of the largest funding commitments in the AI sector to date. Terms and timeline have not been officially confirmed by either company.