Granite Speech 4.1 2B

Status: active
Context window: 2K tokens

Version History

4.1 (minor)

Granite Speech 4.1 2B introduces a dual-head CTC encoder and frame importance sampling, improves multilingual ASR accuracy, and adds punctuation and truecasing across all languages. Two new variants add speaker attribution with timestamps and a non-autoregressive architecture, respectively.

Coverage

Model release: IBM

IBM Releases Granite Speech 4.1 2B: 2-Billion-Parameter Multilingual Speech Model with Non-Autoregressive Variant

IBM has released Granite Speech 4.1 2B, a 2-billion-parameter speech-language model trained on 174,000 hours of audio for automatic speech recognition and translation across English, French, German, Spanish, Portuguese, and Japanese. The model introduces a dual-head CTC encoder and comes with two additional variants: one that adds speaker attribution, and one that uses a novel non-autoregressive architecture for higher throughput.
