Name: gpt-oss-puzzle-88B
Author: NVIDIA

Version History

v1.0 · major · March 26, 2026

NVIDIA released gpt-oss-puzzle-88B, an inference-optimized 88B-parameter mixture-of-experts model derived from gpt-oss-120B using the Puzzle NAS framework. It achieves a 1.63× throughput improvement in long-context scenarios and up to 2.82× on a single H100 GPU while maintaining the parent model's accuracy, using heterogeneous expert pruning, selective window attention, and knowledge distillation with RL optimization.

Coverage

model release · NVIDIA

NVIDIA releases gpt-oss-puzzle-88B, 88B-parameter reasoning model with 1.63× throughput gains

On March 26, 2026, NVIDIA released gpt-oss-puzzle-88B, an 88-billion-parameter mixture-of-experts model optimized for inference efficiency on H100 hardware. Built with the Puzzle post-training neural architecture search framework, the model achieves a 1.63× throughput improvement in long-context (64K/64K) scenarios and up to a 2.82× improvement on a single H100 GPU compared with its parent, gpt-oss-120B, while matching or exceeding the parent's accuracy across reasoning effort levels.

March 28, 2026 · 2:20 PM · 2 min read

nvidia model-release mixture-of-experts