nvidia
26 articles tagged with nvidia
Anthropic doubles Claude Code rate limits, secures 220,000 Nvidia GPUs via SpaceX Colossus 1 deal
Anthropic doubled Claude Code's five-hour rate limits across Pro, Max, Team, and Enterprise plans effective Tuesday, removing peak-hours throttling for Pro and Max users. The capacity expansion comes from an exclusive agreement securing all compute at SpaceX's Colossus 1 data center, which provides over 300 megawatts and more than 220,000 Nvidia GPUs.
Nvidia releases Nemotron 3 Nano Omni: 30B-parameter multimodal model with 256K context, free on OpenRouter
Nvidia has released Nemotron 3 Nano Omni, a 30-billion-parameter multimodal model available free on OpenRouter. The model features a 256,000-token context window, accepts text, image, video, and audio inputs, and claims 2× higher throughput for video reasoning compared to separate vision and speech pipelines.
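Free OpenRouter listings are reachable through OpenRouter's OpenAI-compatible chat completions endpoint. A minimal request-building sketch; the model slug is an assumption based on OpenRouter's usual vendor/model naming, not confirmed by the announcement:

```python
import json
import urllib.request

def build_openrouter_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for OpenRouter.

    The model slug below is hypothetical; check openrouter.ai for the
    actual identifier of the Nemotron 3 Nano Omni listing.
    """
    payload = {
        "model": "nvidia/nemotron-3-nano-omni",  # assumed slug
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Image, video, and audio inputs would go in as structured content parts rather than a plain string, following the OpenAI content-parts convention OpenRouter uses for multimodal models.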
Tencent releases HY-OmniWeaving multimodal model as Gemma-4 variants emerge
Tencent has released HY-OmniWeaving, a new multimodal model available on Hugging Face. Concurrently, NVIDIA and Unsloth have published optimized variants of Gemma-4, including a 31B instruction-tuned version and quantized GGUF format.
NVIDIA releases Gemma 4 31B quantized model with 256K context, multimodal capabilities
NVIDIA has released a quantized version of Google DeepMind's Gemma 4 31B IT model, compressed to NVFP4 format for efficient inference on consumer GPUs. The 30.7B-parameter multimodal model supports 256K token context windows, handles text and image inputs with video frame processing, and maintains near-baseline performance across reasoning and coding benchmarks.
NVIDIA Optimizes Google Gemma 4 for Local Agentic AI on RTX and Spark
NVIDIA has optimized Google's Gemma 4 models for local deployment on RTX and Spark platforms, targeting the emerging wave of on-device agentic AI. The optimization enables small, efficient models to access real-time local context for autonomous decision-making without cloud dependency.
Nvidia claims 291 MLPerf wins with 288-GPU setup; AMD MI355X crosses 1M tokens/sec
MLCommons published MLPerf Inference v6.0 results on April 1, 2026, with Nvidia, AMD, and Intel each claiming top spots in different configurations. Nvidia's 288-GPU GB300-NVL72 system achieved 2.49 million tokens per second on DeepSeek-R1, while AMD's MI355X crossed one million tokens per second for the first time. Direct comparisons remain difficult as each chipmaker targets different market segments and benchmarks.
NVIDIA releases gpt-oss-puzzle-88B, 88B-parameter reasoning model with 1.63× throughput gains
NVIDIA released gpt-oss-puzzle-88B on March 26, 2026, an 88-billion-parameter mixture-of-experts model optimized for inference efficiency on H100 hardware. Built using the Puzzle post-training neural architecture search framework, the model achieves 1.63× throughput improvement in long-context (64K/64K) scenarios and up to 2.82× improvement on single H100 GPUs compared to its parent gpt-oss-120B, while matching or exceeding accuracy across reasoning effort levels.
Stability AI and NVIDIA launch Stable Diffusion 3.5 NIM for faster image generation
Stability AI and NVIDIA have launched Stable Diffusion 3.5 NIM, a microservice designed to accelerate image generation performance and simplify enterprise deployment. The collaboration packages Stable Diffusion 3.5 as an NVIDIA NIM (NVIDIA Inference Microservice) for optimized inference.
Stable Diffusion 3.5 TensorRT optimization delivers 2x faster generation, 40% less VRAM on RTX GPUs
Stability AI has released TensorRT-optimized versions of the Stable Diffusion 3.5 model family in collaboration with NVIDIA. The optimization uses FP8 quantization to achieve 2x faster generation speed and 40% lower VRAM requirements on supported RTX GPUs.
Nvidia releases Nemotron 3 Super: 120B MoE model with 1M token context
Nvidia has released Nemotron 3 Super, a 120-billion parameter hybrid Mamba-Transformer Mixture-of-Experts model that activates only 12 billion parameters during inference. The open-weight model features a 1-million token context window, multi-token prediction capabilities, and pricing at $0.10 per million input tokens and $0.50 per million output tokens.
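At the listed rates, per-request cost is easy to estimate; a quick sketch using the announced pricing:

```python
# Estimate Nemotron 3 Super API cost from the listed per-million-token rates.
INPUT_RATE_USD = 0.10 / 1_000_000   # $0.10 per million input tokens
OUTPUT_RATE_USD = 0.50 / 1_000_000  # $0.50 per million output tokens

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at the listed rates."""
    return input_tokens * INPUT_RATE_USD + output_tokens * OUTPUT_RATE_USD

# Example: a long-context request with 800K input tokens and 4K output tokens.
cost = request_cost_usd(800_000, 4_000)  # ≈ $0.082
```

The asymmetric pricing means output-heavy workloads (long generations) dominate cost, while stuffing the 1M-token context stays comparatively cheap.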
NVIDIA Nemotron 3 Super now available on Amazon Bedrock with 256K context window
NVIDIA Nemotron 3 Super, a hybrid Mixture of Experts model with 120B parameters and 12B active parameters, is now available as a fully managed model on Amazon Bedrock. The model supports up to 256K token context length and claims 5x higher throughput efficiency over the previous Nemotron Super and 2x higher accuracy on reasoning tasks.
NVIDIA releases Nemotron 3 Content Safety 4B for multimodal, multilingual moderation
NVIDIA released Nemotron 3 Content Safety 4B, an open-source multimodal safety model designed to moderate content across text, images, and multiple languages. Built on Gemma-3 4B-IT with a 128K context window, the model achieved 84% average accuracy on multimodal safety benchmarks and supports over 140 languages through culturally aware training data.
Nvidia to spend $26B on open-weight AI models, filing reveals
Nvidia will invest $26 billion over the next five years to build open-weight AI models, according to a 2025 financial filing confirmed by executives. The move signals a strategic shift from chipmaker to AI frontier lab, with the company releasing Nemotron 3 Super (120B parameters) and claiming it outperforms GPT-OSS on multiple benchmarks.
Nvidia to spend $26B on open-weight AI models, targeting Chinese competition and developer lock-in
An SEC filing reveals Nvidia plans to spend $26 billion on open-weight AI models over the next five years. The investment targets the open-source gap left by OpenAI, Meta, and Anthropic while countering the rise of Chinese open-source models and deepening developer dependence on Nvidia hardware.
Meta unveils four custom AI inference chips to cut costs and reduce Nvidia dependency
Meta has unveiled four generations of custom-designed AI chips focused on inference workloads, aiming to reduce inference costs across its platforms serving billions of users. The move represents a significant step toward reducing Meta's dependence on GPU manufacturers like Nvidia and AMD.
NVIDIA releases Nemotron-3-Super-120B, a 120B parameter model with latent MoE architecture
NVIDIA has released Nemotron-3-Super-120B-A12B-NVFP4, a 120-billion parameter text generation model featuring a latent Mixture-of-Experts (MoE) architecture. The model supports 8 languages including English, French, Spanish, Italian, German, Japanese, and Chinese, and is available on Hugging Face with 8-bit quantization support through NVIDIA's ModelOpt toolkit.
NVIDIA releases Nemotron-3-Super-120B in BF16, a 120B parameter model with latent MoE architecture
NVIDIA has released Nemotron-3-Super-120B-A12B-BF16, a 120 billion parameter model designed for text generation and conversational tasks. The model employs a latent mixture-of-experts (MoE) architecture and supports multiple languages including English, French, Spanish, Italian, German, Japanese, and Chinese.
Nvidia partners with Mira Murati's Thinking Machines Lab in long-term deal
Nvidia and Thinking Machines Lab, founded by former OpenAI executive Mira Murati, have announced a long-term partnership. Details on the scope and terms of the collaboration remain limited.
Thinking Machines Lab secures Nvidia compute deal with 1+ gigawatt power allocation
Thinking Machines Lab has secured a multi-year compute deal with Nvidia backed by at least 1 gigawatt of power capacity, according to the company. The agreement also includes a strategic investment from Nvidia, marking a significant infrastructure commitment for the AI research organization.
Nvidia planning open-source AI agent platform ahead of developer conference
Nvidia is preparing to launch an open-source AI agent platform, according to reports ahead of the company's annual developer conference. The move mirrors approaches by competitors like OpenAI in building agent-based AI systems.
NVIDIA Nemotron 3 Nano now available on Amazon Bedrock as serverless model
Amazon Bedrock now offers NVIDIA's Nemotron 3 Nano as a fully managed serverless model, expanding its Nemotron portfolio alongside previously available Nemotron 2 Nano 9B and Nemotron 2 Nano VL 12B variants. The addition enables developers to deploy NVIDIA's smallest inference-optimized model without managing infrastructure.
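Serverless Bedrock models are typically invoked through the bedrock-runtime Converse API. A sketch of the request shape; the model ID string is an assumption, so check the Bedrock model catalog for the real identifier:

```python
def build_converse_kwargs(prompt: str, max_tokens: int = 512) -> dict:
    """Build keyword arguments for bedrock-runtime's converse() call.

    The modelId below is hypothetical; look up the actual Nemotron 3 Nano
    identifier in the Amazon Bedrock model catalog.
    """
    return {
        "modelId": "nvidia.nemotron-3-nano-v1:0",  # assumed ID
        "messages": [
            {"role": "user", "content": [{"text": prompt}]},
        ],
        "inferenceConfig": {"maxTokens": max_tokens},
    }

# Usage (requires AWS credentials and boto3):
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**build_converse_kwargs("Summarize this log."))
#   print(response["output"]["message"]["content"][0]["text"])
```

Because the model is fully managed, there is no endpoint to provision; the same `converse()` call works across Bedrock's serverless models with only the `modelId` changing.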
Nvidia-backed Nscale raises $2B, hits $14.6B valuation with Sandberg and Clegg joining board
Nvidia-backed British AI infrastructure startup Nscale has raised $2 billion in a new funding round, bringing its valuation to $14.6 billion. The round marks a significant milestone for the infrastructure-focused startup, with Meta's former COO Sheryl Sandberg and Meta's former VP of Global Affairs Nick Clegg joining the board.
Nvidia invests $4 billion in photonics companies Lumentum and Coherent
Nvidia announced Monday it is investing $2 billion each into photonics companies Lumentum and Coherent to develop optical transceivers, circuit switches, and lasers for next-generation AI data centers. The technology aims to improve energy efficiency, data transfer speeds, and bandwidth in data center infrastructure.
Meta signs multi-billion dollar TPU rental deal with Google, challenging Nvidia's chip dominance
Meta has signed a multi-billion dollar deal to rent Google's TPU (Tensor Processing Unit) chips for training its AI models, marking a significant shift away from Nvidia's dominance in AI infrastructure. The arrangement provides Meta with alternative compute capacity while signaling growing competition in the specialized AI chip market.
OpenAI closes $110B funding round from Amazon, Nvidia, SoftBank at $730B valuation
OpenAI has closed a $110 billion funding round with Amazon committing $50 billion, Nvidia $30 billion, and SoftBank $30 billion. The company is now valued at $730 billion, following a previous $40 billion round in 2025. The funding includes custom model development agreements between OpenAI and Amazon Web Services.
Nvidia reportedly planning $30 billion investment in OpenAI
Nvidia is reportedly planning a $30 billion investment in OpenAI, according to Reuters citing sources familiar with the matter. The deal would represent one of the largest funding commitments in the AI sector to date. Terms and timeline have not been officially confirmed by either company.