
IBM Releases Granite Speech 4.1 2B: 2-Billion-Parameter Multilingual Speech Model with Non-Autoregressive Variant

TL;DR

IBM has released Granite Speech 4.1 2B, a 2-billion-parameter speech-language model trained on 174,000 hours of audio for automatic speech recognition and translation across English, French, German, Spanish, Portuguese, and Japanese. The model introduces a dual-head CTC encoder and includes variants for speaker attribution and a novel non-autoregressive architecture for higher throughput.


IBM has released Granite Speech 4.1 2B, a 2-billion-parameter speech-language model designed for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST). The model supports English, French, German, Spanish, Portuguese, and Japanese, and was trained on 174,000 hours of audio from public corpora and synthetic datasets.

Technical Architecture

The model was built by modality-aligning an intermediate checkpoint of granite-4.0-1b-base to speech. According to IBM, the new naming convention reflects actual parameter count rather than base LLM size. Key architectural improvements over the predecessor include:

  • Dual-head CTC encoder with both graphemic and BPE outputs
  • Frame importance sampling to focus on informative audio segments
  • Punctuation and truecasing across all supported languages, including German noun capitalization

IBM offers two additional variants: granite-speech-4.1-2b-plus adds speaker-attributed ASR and word-level timestamps, while granite-speech-4.1-2b-nar introduces a non-autoregressive architecture designed for higher throughput.

Benchmark Performance

IBM evaluated the model against other speech-language models under 8 billion parameters. On the Open ASR leaderboard (as of April 2026), the model demonstrates competitive performance across standard benchmarks.

For punctuation accuracy, the model achieved a punctuation error rate (PER) ranging from 3.66 on German (CV-DE) to 25.70 on LibriSpeech-clean. Capitalization F1 scores ranged from 89.71 to 99.50, with the highest score on German where noun capitalization is required.

The model's keyword list biasing capability was evaluated using F1 scores of transcribed keywords during ASR tasks, excluding common words. IBM reports improved recognition of names, acronyms, and technical jargon compared to inference without keyword biasing.
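The evaluation described above can be sketched as a keyword-level F1 computation: score only occurrences of listed keywords, after excluding common words. The keyword list and stopword set below are made up for illustration and are not IBM's evaluation code.

```python
# Keyword-level F1 for biased ASR: compare keyword occurrences in the
# reference vs. the hypothesis, ignoring stopwords and all other words.
from collections import Counter

STOPWORDS = {"the", "a", "of", "and"}  # illustrative common-word filter

def keyword_f1(reference, hypothesis, keywords):
    kws = {k.lower() for k in keywords} - STOPWORDS
    ref = Counter(w for w in reference.lower().split() if w in kws)
    hyp = Counter(w for w in hypothesis.lower().split() if w in kws)
    tp = sum((ref & hyp).values())  # matched keyword occurrences
    if tp == 0:
        return 0.0
    precision = tp / sum(hyp.values())
    recall = tp / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = keyword_f1(
    reference="granite speech handles the acronym asr well",
    hypothesis="granite speech handles the acronym a s r well",
    keywords=["Granite", "ASR", "the"],
)  # ~0.667: "granite" recovered, "asr" missed in the hypothesis
```

Fragmenting an acronym ("a s r") costs recall on that keyword, which is exactly the failure mode keyword biasing is meant to reduce.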

Training and Capabilities

The 174,000-hour training dataset included:

  • Public corpora for ASR and AST
  • Synthetic datasets for Japanese ASR
  • Data tailored for keyword-biased ASR and speech translation

Beyond ASR and AST across the six primary languages, IBM reports additional support for English-to-Italian and English-to-Mandarin translation.

Integration and Licensing

The model is released under the Apache 2.0 license and is supported natively in transformers>=4.52.1. IBM provides integration examples for both transformers and vLLM, covering online and offline inference modes.
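A hedged sketch of transformers inference is below. The Hugging Face model id and the `<|audio|>` prompt marker are assumptions carried over from earlier Granite Speech release cards, not a verified 4.1 recipe; check the official model card before use.

```python
# Sketch of transcription with Granite Speech via transformers.
# MODEL_ID and the audio placeholder token are ASSUMED, based on the
# pattern of prior Granite Speech releases.

MODEL_ID = "ibm-granite/granite-speech-4.1-2b"  # assumed Hugging Face id

def build_prompt(instruction):
    """Prefix the audio placeholder the processor expands into audio
    features (placeholder string is an assumption)."""
    return f"<|audio|>{instruction}"

def transcribe(wav, sampling_rate=16_000):
    # Heavy deps imported lazily so the sketch can be read without them.
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForSpeechSeq2Seq.from_pretrained(MODEL_ID)
    chat = [{"role": "user",
             "content": build_prompt("can you transcribe the speech into written format?")}]
    text = processor.tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=True)
    inputs = processor(text, wav, sampling_rate=sampling_rate,
                       return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=200)
    return processor.batch_decode(out, skip_special_tokens=True)[0]
```

For higher-throughput serving, the same prompt construction would apply under vLLM's multimodal interface; the non-autoregressive variant targets that deployment path.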

What This Means

IBM's release of three model variants—standard, speaker-attributed, and non-autoregressive—addresses different deployment scenarios from accuracy-focused to throughput-optimized applications. The dual-head CTC encoder and frame importance sampling represent architectural refinements aimed at improving multilingual ASR accuracy. The non-autoregressive variant is particularly notable as an alternative to standard autoregressive decoding for speech tasks. At 2 billion parameters, the model targets enterprise applications requiring on-premise deployment with moderate computational resources.
