IBM Releases Granite 4.1 8B with 131K Context Window at $0.05/M Input Tokens
IBM has released Granite 4.1 8B, an 8-billion-parameter decoder-only language model with a 131,072-token context window. The model supports 12 languages and costs $0.05 per million input tokens and $0.10 per million output tokens, available under the Apache 2.0 license.
Model Specifications
Granite 4.1 8B is a dense transformer model with 8 billion parameters, released on April 30, 2026. Its context window of 131,072 tokens (128K) places it in the long-context category otherwise occupied by much larger models such as GPT-4 and Claude.
The model is distributed under the Apache 2.0 license, making it available for both commercial and research use under permissive terms.
Capabilities
Granite 4.1 8B targets enterprise use cases with several specific features:
- Tool calling: Implements OpenAI-compatible function calling for integration with external systems (a request sketch appears in the Deployment section below)
- Code generation: Includes fill-in-the-middle support for code completion tasks (see the first sketch after this section)
- RAG support: Designed for retrieval-augmented generation workflows (see the second sketch after this section)
- Text processing: Handles summarization, classification, and extraction tasks
The model supports 12 languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese.
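IBM has not published the fill-in-the-middle prompt format for this release, but its earlier Granite code models used StarCoder-style sentinel tokens. A minimal sketch under that assumption; the sentinels, base URL, and model id are unverified and should be checked against the tokenizer config and the provider's model list:

```python
from openai import OpenAI

# Any OpenAI-compatible /completions endpoint can serve a FIM prompt;
# the base URL and model id below are illustrative, not confirmed.
client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")

prefix = "def average(values):\n    "
suffix = "\n    return total / len(values)"

# StarCoder-style sentinels, as used by earlier Granite code models.
# Assumption: verify against this model's tokenizer config before use.
fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

resp = client.completions.create(
    model="ibm/granite-4.1-8b",
    prompt=fim_prompt,
    max_tokens=64,
)
print(resp.choices[0].text)  # expected: the middle, e.g. code computing `total`
```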
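For RAG, the long context window means retrieved passages can be placed directly into the prompt. A minimal sketch, assuming an OpenAI-compatible chat endpoint and leaving retrieval itself abstract; the model id is illustrative:

```python
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")

def answer_with_rag(question: str, passages: list[str]) -> str:
    """Answer a question grounded in pre-retrieved passages."""
    # Concatenate retrieved passages into the prompt; a 131K window
    # leaves room for hundreds of pages of context.
    context = "\n\n".join(passages)
    resp = client.chat.completions.create(
        model="ibm/granite-4.1-8b",  # illustrative model id
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```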
Deployment
IBM is distributing Granite 4.1 8B through OpenRouter, which provides routing to multiple infrastructure providers. Model weights are publicly available for self-hosting.
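Because OpenRouter exposes an OpenAI-compatible API, the tool-calling support noted under Capabilities can be exercised with a standard chat-completions request. A minimal sketch; the model slug and the example tool are assumptions, not confirmed identifiers:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

# One tool definition in the standard OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical enterprise tool
        "description": "Look up the status of an order by its id.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="ibm/granite-4.1-8b",  # illustrative slug, verify on OpenRouter
    messages=[{"role": "user", "content": "Where is order 8842?"}],
    tools=tools,
)

# If the model chooses to call the tool (tool_calls may be None if it
# answers directly), the arguments arrive as a JSON string.
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
```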
The pricing structure of $0.05 per million input tokens and $0.10 per million output tokens places it in the mid-range pricing tier for models of this size, comparable to other 8B-parameter models from companies like Meta and Mistral.
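At these rates cost scales linearly with token counts, so a back-of-envelope estimate is one line of arithmetic; filling the entire 131,072-token window and generating a 1,000-token reply costs well under a cent:

```python
def request_cost(input_tokens: int, output_tokens: int) -> float:
    # $0.05 per million input tokens, $0.10 per million output tokens.
    return input_tokens * 0.05e-6 + output_tokens * 0.10e-6

# Full context plus a 1,000-token reply: ~$0.00665.
print(f"${request_cost(131_072, 1_000):.5f}")
```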
What This Means
Granite 4.1 8B represents IBM's continued investment in open-source enterprise AI, offering a permissively licensed alternative to proprietary models. The 131K context window is notably large for an 8B-parameter model, though actual performance at that context length remains to be independently verified. The Apache 2.0 license and multilingual support make it a viable option for enterprises that require on-premises deployment or specific regulatory compliance, particularly in markets covered by the 12 supported languages.
Related Articles
IBM releases Granite 4.1-8B with 131K context window and enhanced tool-calling capabilities
IBM has released Granite 4.1-8B, an 8-billion-parameter long-context model with a 131,072-token context window. The model achieves 85.37% on HumanEval and 73.84% on MMLU 5-shot, with enhanced tool-calling capabilities reaching 68.27% on BFCL v3. Released under the Apache 2.0 license, it supports 12 languages.
IBM's Granite 4.1: 8B Dense Model Matches 32B MoE Performance on 15T Tokens
IBM released Granite 4.1, a family of dense decoder-only LLMs (3B, 8B, 30B parameters) trained on approximately 15 trillion tokens using a five-phase pre-training pipeline. The 8B instruct model matches or surpasses the previous Granite 4.0-H-Small (32B-A9B MoE) despite using fewer parameters and a simpler dense architecture. All models support up to 512K context windows and are released under Apache 2.0 license.
IBM releases Bob AI coding assistant after testing on 80,000 employees, claims 45% productivity gains
IBM has launched Bob, its AI coding assistant, following internal testing with 80,000 employees. The company claims teams saw average productivity gains of 45% across complex workflows. Pricing ranges from $20 to $200 per month using a "Bobcoin" credit system.
Xiaomi releases MiMo-V2.5: 310B parameter omnimodal model with 1M token context window
Xiaomi released MiMo-V2.5, a 310B total parameter sparse mixture-of-experts model that activates 15B parameters per token. The omnimodal model supports text, image, video, and audio understanding with a 1M token context window and was trained on 48T tokens using FP8 mixed precision.