NVIDIA Nemotron-3-Nano-4B-GGUF

Name: NVIDIA Nemotron-3-Nano-4B-GGUF
Author: NVIDIA

NVIDIA🇺🇸 United States

active

Compare with other models →

Context window262K tokens

Version History

v1.0majorMarch 16, 2026

Initial release of Nemotron-3-Nano-4B-GGUF, a quantized (Q4_K_M) 4B parameter edge model with hybrid Mamba-2 architecture. Supports controllable reasoning modes and 262K context window for edge AI applications including gaming NPCs and local voice assistants.

Coverage

model releaseNVIDIA

NVIDIA releases Nemotron-3-Nano-4B, a 4B parameter model for edge AI with 262K context window

NVIDIA released Nemotron-3-Nano-4B-GGUF on March 16, 2026, a 4-billion parameter small language model (SLM) designed for edge deployment on devices like Jetson Thor and GeForce RTX. The model features a hybrid Mamba-2 and Transformer architecture with a 262K token context window and supports both reasoning and non-reasoning modes via system prompts.

March 23, 2026 · 3:36 PM2 min read

NVIDIA Nemotron small-language-model