model release

Supertone releases Supertonic 3: 99M-parameter on-device TTS model supporting 31 languages

TL;DR

Supertone has released Supertonic 3, a 99M-parameter text-to-speech model that runs entirely on-device using ONNX Runtime. Compared to Supertonic 2, the model expands language support from 5 to 31 languages. It requires no GPU, and Supertone claims accuracy competitive with models 7-20x larger.

May 10, 2026 · 11:05 AM · 2 min read


Technical Specifications

Parameters: 99 million across ONNX assets
Languages: 31 (expanded from 5 in Supertonic 2)
Inference: CPU-only via ONNX Runtime, no cloud calls required
Model type: Text-to-speech
License: OpenRAIL-M for model weights, MIT for sample code

Performance Claims

According to Supertone, Supertonic 3 achieves word error rates (WER) and character error rates (CER) competitive with larger open-source TTS models such as VoxCPM2; the baselines range from 0.7B to 2B parameters. The company's benchmark comparisons also show Supertonic 3 running faster on CPU than the larger baselines do on an A100 GPU.
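
For readers unfamiliar with the metrics: TTS accuracy is typically measured by transcribing the synthesized audio with a speech recognizer and comparing the transcript to the input text. The sketch below shows the standard edit-distance definition of WER and CER. It is a generic illustration, not Supertone's evaluation code, and the function names are chosen here for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A skipped word, one of the failure modes Supertone says it reduced, shows up as a deletion: `wer("the cat sat on the mat", "the cat sat on mat")` counts one edit against six reference words.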

Supertone claims three improvements over Supertonic 2: fewer repeat and skip failures during reading, higher speaker similarity across the languages both versions share, and a roughly 6x expansion in language coverage (from 5 to 31).
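
The headline ratios follow directly from the figures quoted in this article. A quick check, using only the numbers reported above:

```python
SUPERTONIC_3_PARAMS = 99e6            # 99M parameters
BASELINE_PARAMS = (0.7e9, 2e9)        # reported range of the larger baselines


def ratio(larger, smaller):
    """How many times larger the first quantity is than the second."""
    return larger / smaller


low = ratio(BASELINE_PARAMS[0], SUPERTONIC_3_PARAMS)
high = ratio(BASELINE_PARAMS[1], SUPERTONIC_3_PARAMS)
language_growth = ratio(31, 5)        # Supertonic 3 vs. Supertonic 2 languages

print(f"baselines are {low:.1f}x to {high:.1f}x larger")   # 7.1x to 20.2x
print(f"language coverage grew {language_growth:.1f}x")     # 6.2x
```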

New Features

Expression tags: Supports <laugh>, <breath>, and <sigh> tags for expressive synthesis
Improved stability: Fewer reading errors on both short and long text inputs
31 languages: English, Korean, Japanese, Arabic, Bulgarian, Czech, Danish, German, Greek, Spanish, Estonian, Finnish, French, Hindi, Croatian, Hungarian, Indonesian, Italian, Lithuanian, Latvian, Dutch, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Swedish, Turkish, Ukrainian, Vietnamese
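
Because only three expression tags are listed, it may be worth checking input text before synthesis. The helper below is a hypothetical pre-processing sketch, not part of the Supertonic SDK. It assumes tags are written inline exactly as shown above (for example `<laugh>`); the source does not say how the model treats unrecognized tags.

```python
import re

# Tags listed in the Supertonic 3 announcement.
SUPPORTED_TAGS = {"laugh", "breath", "sigh"}

# Matches simple angle-bracket tags such as <laugh>; this pattern is an assumption.
TAG_PATTERN = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>")


def find_unsupported_tags(text):
    """Return tag names found in text that are not in SUPPORTED_TAGS, in order."""
    return [name for name in TAG_PATTERN.findall(text) if name not in SUPPORTED_TAGS]


sample = "Well <sigh> that went badly <laugh> but fine <cough>."
print(find_unsupported_tags(sample))  # ['cough']
```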

Deployment

The model ships as ONNX assets and runs through a Python SDK. Users can install via pip install supertonic and generate speech locally. The SDK auto-downloads model assets from Hugging Face on first run.

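from supertonic import TTS

text = "Hello from Supertonic 3."  # example input text
tts = TTS(auto_download=True)  # downloads model assets from Hugging Face on first run
style = tts.get_voice_style(voice_name="M1")  # select the "M1" voice
wav, duration = tts.synthesize(text, voice_style=style, lang="en")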
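
The snippet above stops at the returned waveform. One way to write it to disk with only the Python standard library is sketched below. This is an assumption-heavy example: it assumes `wav` can be flattened to a one-dimensional sequence of mono float samples in [-1.0, 1.0], and the caller must supply the model's actual sample rate, which this article does not state. `write_wav` is a helper defined here, not an SDK function.

```python
import struct
import wave


def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against out-of-range values
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono (assumed)
        f.setsampwidth(2)            # 16-bit samples
        f.setframerate(sample_rate)  # must match the model's output rate
        f.writeframes(bytes(frames))
```

If the SDK returns a multi-dimensional array, flatten it before passing it in, and check the SDK documentation for its own save utility before relying on this helper.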

What This Means

Supertonic 3 targets the growing demand for privacy-preserving, on-device AI inference. At 99M parameters, the model is 7-20x smaller than the open TTS systems it is compared against, making it practical for browser and edge deployment where GPU access is limited or unavailable. The CPU-only requirement and sub-100M-parameter size address real constraints in mobile and embedded applications.
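
Parameter count is not the same as download size: the on-disk footprint depends on the numeric precision of the stored weights, which the announcement summarized here does not specify. A rough estimate under three common precisions, ignoring file-format overhead and any non-weight assets:

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_size_mb(num_params, dtype):
    """Approximate weight storage in megabytes (10^6 bytes) for a given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


NUM_PARAMS = 99_000_000  # figure reported for Supertonic 3

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_size_mb(NUM_PARAMS, dtype):.0f} MB")
# float32: 396 MB
# float16: 198 MB
# int8: 99 MB
```

Only 8-bit weights would bring a 99M-parameter model under 100 MB, so the actual asset sizes on Hugging Face are the figure to check for a given deployment target.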

The 31-language support positions Supertonic 3 as a lightweight alternative to larger multilingual TTS systems. However, without independent benchmarks, it remains unclear how the model's accuracy-size tradeoff compares to cloud-based alternatives or other on-device TTS solutions across different hardware profiles and use cases.

Source: huggingface.co


