IBM releases Granite 4.1-8B with 131K context window and enhanced tool-calling capabilities
IBM has released Granite 4.1-8B, an 8-billion-parameter long-context model with a 131,072-token context window. The model achieves 85.37% on HumanEval and 73.84% on MMLU 5-shot, and its enhanced tool-calling capabilities reach 68.27% on BFCL v3. Released under the Apache 2.0 license, it supports 12 languages.
IBM released Granite 4.1-8B, an 8-billion-parameter instruction-following model, on April 29, 2025.
Performance benchmarks
According to IBM, Granite 4.1-8B achieves the following scores:
- Code tasks: 85.37% on HumanEval pass@1, 87.30% on MBPP pass@1, 79.88% on HumanEval+ pass@1
- General tasks: 73.84% on MMLU 5-shot, 80.51% on BBH 3-shot with chain-of-thought
- Math tasks: 92.49% on GSM8K 8-shot, 80.10% on Minerva Math 0-shot with CoT
- Tool-calling: 68.27% on BFCL v3
- Alignment: 87.06% on IFEval average
The model belongs to a three-model family that also includes 3B and 30B parameter versions.
Technical specifications
Granite 4.1-8B uses a decoder-only dense transformer architecture with:
- 4,096 embedding size
- 40 layers
- 32 attention heads with 8 key-value heads
- Grouped Query Attention (GQA)
- RoPE positional embeddings
- SwiGLU activation in MLP layers
- 12,800 MLP hidden size
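As a rough illustration of what these figures imply, the sketch below estimates the key-value cache size at the full 131,072-token context and the number of weights in the transformer layers. It rests on assumptions that IBM's listing does not state: that the head dimension equals the embedding size divided by the number of attention heads, that the cache stores 16-bit values, and that normalization weights, biases, and the embedding table (whose vocabulary size is not given) can be ignored. The function names are illustrative only.

```python
# Figures taken from the specification list above.
EMBED_SIZE = 4096
NUM_LAYERS = 40
NUM_HEADS = 32
NUM_KV_HEADS = 8
MLP_HIDDEN = 12_800
CONTEXT_TOKENS = 131_072

# Assumption: head dimension = embedding size / number of attention heads.
HEAD_DIM = EMBED_SIZE // NUM_HEADS
# With GQA, several query heads share one key-value head.
QUERIES_PER_KV_HEAD = NUM_HEADS // NUM_KV_HEADS


def kv_cache_bytes(tokens, kv_heads=NUM_KV_HEADS, bytes_per_value=2):
    """Bytes needed to cache keys and values for `tokens` positions.

    The leading factor of 2 counts keys plus values; bytes_per_value=2
    assumes a 16-bit cache (an assumption, not a published figure).
    """
    return 2 * NUM_LAYERS * kv_heads * HEAD_DIM * bytes_per_value * tokens


def non_embedding_params():
    """Approximate weight count of the attention and MLP blocks only."""
    kv_width = NUM_KV_HEADS * HEAD_DIM
    # Query and output projections are full width; key and value
    # projections are narrower because of the 8 key-value heads.
    attention = 2 * EMBED_SIZE * EMBED_SIZE + 2 * EMBED_SIZE * kv_width
    # SwiGLU uses three matrices: gate, up, and down projections.
    mlp = 3 * EMBED_SIZE * MLP_HIDDEN
    return NUM_LAYERS * (attention + mlp)


if __name__ == "__main__":
    gib = 1024 ** 3
    print("head dimension:", HEAD_DIM)
    print("query heads per KV head:", QUERIES_PER_KV_HEAD)
    print("KV cache at full context (GQA):",
          kv_cache_bytes(CONTEXT_TOKENS) / gib, "GiB")
    print("KV cache if every head had its own KV:",
          kv_cache_bytes(CONTEXT_TOKENS, kv_heads=NUM_HEADS) / gib, "GiB")
    print("non-embedding parameters:", f"{non_embedding_params():,}")
```

Under these assumptions, the cache for one full-length sequence comes to 20 GiB with 8 key-value heads, versus 80 GiB if all 32 heads kept their own keys and values, and the layer weights total about 7.97 billion before embeddings are counted.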
Training and capabilities
IBM trained the model on a combination of permissively licensed open-source instruction datasets and internally generated synthetic data. The post-training pipeline included supervised fine-tuning and reinforcement learning alignment.
The model supports 12 languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. According to IBM, it can be fine-tuned for additional languages.
Key capabilities include:
- Text summarization and classification
- Question-answering and RAG
- Code generation and completion
- Function calling with OpenAI-compatible tool definitions
- Fill-in-the-middle code completions
- Multilingual dialog
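For readers unfamiliar with the function-calling item above, here is a minimal sketch of what an OpenAI-style tool definition looks like and how an application might route a tool call back to local code. The tool itself (`get_current_weather`), the registry, and the dispatcher are hypothetical and do not come from IBM's examples. The shape of the model's tool-call output (a JSON object with `name` and `arguments`) is also an assumption; the actual output format should be checked against the model card.

```python
import json

# Hypothetical tool description in the OpenAI-style "function" format:
# a name, a description, and a JSON Schema for the arguments.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}


def get_current_weather(city: str) -> dict:
    """Stub implementation; a real tool would query a weather service."""
    return {"city": city, "temperature_c": 21}


# Maps tool names to the local functions that implement them.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def dispatch_tool_call(call_json: str) -> dict:
    """Run a tool call given as JSON with 'name' and 'arguments' keys.

    Assumption: the model emits calls in this shape. Unknown tool names
    raise KeyError rather than being silently ignored.
    """
    call = json.loads(call_json)
    func = TOOL_REGISTRY[call["name"]]
    return func(**call["arguments"])


if __name__ == "__main__":
    # The list of definitions is what would be passed to the model.
    print(json.dumps([get_weather_tool], indent=2))
    # A call the model might produce, handled by the application.
    example_call = '{"name": "get_current_weather", "arguments": {"city": "Boston"}}'
    print(dispatch_tool_call(example_call))
```

The definition is plain data, so the same list can be serialized and handed to any runtime that accepts OpenAI-compatible tool schemas; only the dispatcher is application-specific.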
The model achieves 64.84% on MMMLU 5-shot across 11 languages and 58.89% on INCLUDE 5-shot across 14 languages.
Safety benchmarks
IBM reports safety scores of 95.80% on SALAD-Bench and 81.19% on AttaQ for the 8B model.
Availability
The model is available on Hugging Face under the Apache 2.0 license. Pricing for API access has not been disclosed. IBM provides code examples for both basic text generation and tool-calling use cases using the Transformers library.
What this means
Granite 4.1-8B is IBM's entry in the competitive 8B-parameter class, pairing strong reported code scores (85.37% on HumanEval) with a 131,072-token context window. The Apache 2.0 license and 12-language support position it as an alternative to models such as Llama 3.1 8B and Mistral 7B for enterprises that require permissive licensing. The emphasis on tool calling suggests IBM is targeting production AI assistant deployments, though inference costs and API availability remain unclear.