NVIDIA Nemotron-3-Nano-4B-GGUF

NVIDIA🇺🇸 United States
active
Context window262K tokens

Version History

v1.0major

Initial release of Nemotron-3-Nano-4B-GGUF, a quantized (Q4_K_M) 4B parameter edge model with hybrid Mamba-2 architecture. Supports controllable reasoning modes and 262K context window for edge AI applications including gaming NPCs and local voice assistants.

Coverage

model releaseNVIDIA

NVIDIA releases Nemotron-3-Nano-4B, a 4B parameter model for edge AI with 262K context window

NVIDIA released Nemotron-3-Nano-4B-GGUF on March 16, 2026, a 4-billion parameter small language model (SLM) designed for edge deployment on devices like Jetson Thor and GeForce RTX. The model features a hybrid Mamba-2 and Transformer architecture with a 262K token context window and supports both reasoning and non-reasoning modes via system prompts.

2 min read