model releaseCohere

Cohere releases 2B open-source speech model with 5.42% word error rate

TL;DR

Cohere has released Transcribe, a 2 billion parameter open-source automatic speech recognition model that the company claims tops the Hugging Face Open ASR Leaderboard with a 5.42% word error rate. The model supports 14 languages and is available under Apache 2.0 license, outperforming OpenAI's Whisper Large v3 and competing models on both accuracy and throughput metrics.

1 min read
0

Cohere releases 2B open-source speech model with 5.42% word error rate

Cohere has released Transcribe, a 2 billion parameter open-source automatic speech recognition (ASR) model. According to the company, it achieves a 5.42% average word error rate on the Hugging Face Open ASR Leaderboard, claiming the top position ahead of OpenAI's Whisper Large v3, ElevenLabs Scribe v2, and Qwen3-ASR-1.7B.

Model specifications and performance

Transcribe supports 14 languages including English, German, French, and Japanese. Cohere claims the model also delivers the best throughput among similarly-sized competitors, a critical metric for production deployments where both latency and accuracy matter.

The model is available for download under the Apache 2.0 open-source license from Hugging Face, making it freely usable for commercial and non-commercial applications. It can also be accessed through Cohere's API and the company's Model Vault platform for users preferring cloud-based inference.

Deployment and integration plans

Cohere plans to integrate Transcribe into its North AI agent platform in the future. The release positions Transcribe as a viable alternative to proprietary speech models for developers and organizations seeking open-source ASR capabilities.

The 2B parameter size represents a middle ground: larger than many mobile-optimized models but smaller than massive academic benchmarks, suggesting practical hardware requirements for deployment.

What this means

Cohere's Transcribe release adds competition to the speech recognition market where Whisper has dominated open-source discussions. The claimed 5.42% WER and Apache 2.0 licensing remove licensing friction for commercial applications. However, independent verification of leaderboard results is essential—published benchmarks sometimes reflect narrow test conditions rather than real-world performance across diverse audio conditions, accents, and domains. The model's multilingual support and throughput claims warrant direct comparison testing before large-scale adoption decisions.

Related Articles

model release

OpenAI's ChatGPT 5.6 release restricted to government-approved customers initially

OpenAI will release ChatGPT 5.6 first to customers approved by the federal government, according to a staff memo from CEO Sam Altman. The company plans a broader release "a couple of weeks later," marking a significant departure from typical model rollouts.

model release

China's Z.ai releases GLM-5.2, open-source model matching Claude and GPT-5.5 in cybersecurity tasks

Z.ai's GLM-5.2 performs on par with Claude Opus 4.8 and OpenAI's GPT-5.5 in cybersecurity benchmarks while costing roughly half as much to run. Security evaluations from Graphistry and Semgrep confirm the open-weight model's capabilities in vulnerability discovery and cyber investigation, raising concerns about accessibility of advanced hacking tools.

model release

DeepSeek-V4-Fable: Offensive Security Model Trained on 80,000 CTF Trajectories Achieves 58.7% Solve Rate

Chunjiang Intelligence has released DeepSeek-V4-Fable, an autonomous agent model designed for offensive security research and CTF challenges. The model, distilled from Claude-5-Fable and built on DeepSeek-V4-Flash, was trained on 80,000 verified CTF trajectories and achieves a 58.7% solve rate across held-out security challenges.

model release

Baidu Releases Unlimited-OCR, a 3B Parameter Document Parsing Model Based on Deepseek-OCR

Baidu has released Unlimited-OCR, a 3 billion parameter model for optical character recognition and document parsing. The model supports single-page and multi-page document processing with a 32,768 token context window and runs on NVIDIA GPUs using bfloat16 precision.

Comments

Loading...