Granite 4.1 8B

Context window: 512K tokens

Version History

4.1 (major)

Granite 4.1 introduces an 8B dense model that matches the performance of the previous 32B MoE, trained on 15T tokens with a five-phase pipeline and extended to a 512K context.

Coverage

Model release · IBM

IBM's Granite 4.1: 8B Dense Model Matches 32B MoE Performance on 15T Tokens

IBM released Granite 4.1, a family of dense decoder-only LLMs (3B, 8B, 30B parameters) trained on approximately 15 trillion tokens using a five-phase pre-training pipeline. The 8B instruct model matches or surpasses the previous Granite 4.0-H-Small (a 32B-A9B MoE) despite using fewer parameters and a simpler dense architecture. All models support context windows of up to 512K tokens and are released under the Apache 2.0 license.
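The headline figures (8B dense parameters, ~15T training tokens) can be turned into a rough training-compute estimate with the widely used 6·N·D FLOPs heuristic. This is a back-of-the-envelope sketch, not a figure IBM has published:

```python
# Rough training-compute estimate for the Granite 4.1 8B model using the
# common 6 * N * D approximation (N = parameters, D = training tokens).
# The parameter and token counts come from the release; the 6ND rule is
# a heuristic for dense decoder-only transformers, not an IBM-reported number.

def train_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs as 6 * N * D."""
    return 6.0 * params * tokens

N = 8e9    # 8B dense parameters
D = 15e12  # ~15 trillion training tokens

print(f"~{train_flops(N, D):.1e} FLOPs")  # on the order of 7e23 FLOPs
```

By this estimate, the 8B run lands around 7×10²³ FLOPs; a dense model trained this long sits far past the Chinchilla-optimal token count, which is consistent with the release's emphasis on small-model quality.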
