ElevenLabs and Google lead Artificial Analysis speech-to-text benchmark

TL;DR

Artificial Analysis has released an updated speech-to-text benchmark showing ElevenLabs and Google as top performers. The benchmark provides comparative analysis of current speech recognition systems across multiple models.

Artificial Analysis has released an updated speech-to-text benchmark, with ElevenLabs and Google ranking as the top performers in speech recognition.

The benchmark evaluates current speech-to-text systems across multiple dimensions, comparing how well different models convert spoken audio into written text. Both ElevenLabs and Google place in the top tier of the rankings.

Benchmark Details

The updated benchmark from Artificial Analysis provides comparative metrics across speech recognition providers. However, specific accuracy scores, model names tested, dataset composition, and detailed performance differentials between ElevenLabs and Google have not been disclosed in available sources.

ElevenLabs, primarily known for text-to-speech synthesis, has expanded into speech recognition capabilities. Google's speech-to-text service has been a standard offering through Google Cloud and native Android/web integration for years.

What This Means

The benchmark suggests that speech recognition quality remains concentrated among well-resourced companies with large training datasets. ElevenLabs' strong showing in both text-to-speech and speech-to-text suggests the company is building comprehensive speech processing capabilities. Google's continued strength likely reflects its extensive access to audio data through YouTube, Google Assistant, and cloud service users.

For developers and enterprises selecting speech-to-text providers, this benchmark offers independent evaluation data to guide integration decisions. The full benchmark details are needed to understand which system performs better for a specific use case, such as noisy audio, language support, latency requirements, or accuracy thresholds.
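
Teams that want to check an accuracy threshold on their own audio can compute a score themselves. The sketch below uses word error rate (WER), a common speech recognition metric; the article does not state which metrics Artificial Analysis uses, so this is an assumption for illustration only. The function name `word_error_rate` and the sample transcripts are hypothetical, and the normalization (lowercasing and whitespace splitting) is deliberately simplistic.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: real evaluations usually normalize punctuation,
    numbers, and casing more carefully before scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substituted word out of six.
score = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"WER: {score:.3f}")
```

Running the same scoring over transcripts from each candidate provider, on audio that matches the target conditions, gives a like-for-like comparison that a published ranking alone cannot.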

Access the full Artificial Analysis benchmark for detailed scoring metrics and comparative analysis across all tested providers.
