Supertone releases Supertonic 3: 99M-parameter on-device TTS model supporting 31 languages
Supertone has released Supertonic 3, a 99M-parameter text-to-speech model that runs entirely on-device using ONNX Runtime. The model expands language support from 5 to 31 languages compared to Supertonic 2, requires no GPU, and claims competitive accuracy against models 7-20x larger.
Technical Specifications
- Parameters: 99 million across ONNX assets
- Languages: 31 (expanded from 5 in Supertonic 2)
- Inference: CPU-only via ONNX Runtime, no cloud calls required
- Model type: Text-to-speech
- License: OpenRAIL-M for model weights, MIT for sample code
Performance Claims
According to Supertone, Supertonic 3 achieves word error rates (WER) and character error rates (CER) competitive with larger open-source TTS models such as VoxCPM2, which range from 0.7B to 2B parameters. The company's benchmark comparisons show the model running faster on CPU than those larger baselines do on an A100 GPU.
Supertone claims three improvements over version 2: fewer repeat-and-skip failures during reading, higher speaker similarity across shared languages, and the 6x expansion in language coverage.
New Features
- Expression tags: Supports <laugh>, <breath>, and <sigh> tags for expressive synthesis
- Improved stability: Fewer reading errors on both short and long text inputs
- 31 languages: English, Korean, Japanese, Arabic, Bulgarian, Czech, Danish, German, Greek, Spanish, Estonian, Finnish, French, Hindi, Croatian, Hungarian, Indonesian, Italian, Lithuanian, Latvian, Dutch, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Swedish, Turkish, Ukrainian, Vietnamese
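The expression tags are embedded inline in the input text. As a minimal sketch, a caller might validate that a string uses only the supported tags before synthesis; the helper below is hypothetical and not part of the Supertonic SDK:

```python
import re

# The three tags documented for Supertonic 3; any other <...> token is rejected.
SUPPORTED_TAGS = {"laugh", "breath", "sigh"}

def check_expression_tags(text: str) -> list[str]:
    """Return the expression tags found in text, raising on unknown ones."""
    tags = re.findall(r"<(\w+)>", text)
    unknown = [t for t in tags if t not in SUPPORTED_TAGS]
    if unknown:
        raise ValueError(f"Unsupported expression tags: {unknown}")
    return tags

print(check_expression_tags("Well <sigh> that took a while <laugh>"))
# → ['sigh', 'laugh']
```

Whether the SDK itself rejects or silently ignores unknown tags is not documented in the release notes.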
Deployment
The model ships as ONNX assets and runs through a Python SDK. Users can install via pip install supertonic and generate speech locally. The SDK auto-downloads model assets from Hugging Face on first run.
from supertonic import TTS

# Downloads the ONNX model assets from Hugging Face on first run.
tts = TTS(auto_download=True)

# "M1" is one of the bundled voice presets.
style = tts.get_voice_style(voice_name="M1")

text = "Hello from Supertonic 3."
wav, duration = tts.synthesize(text, voice_style=style, lang="en")
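The snippet returns the waveform in memory; to persist it, you can write a standard WAV file. Below is a hedged sketch using only the Python standard library, assuming wav is a sequence of float samples in [-1, 1] and a 44.1 kHz sample rate; the SDK's actual return type and sample rate may differ:

```python
import struct
import wave

def save_wav(samples, path, sample_rate=44100):
    """Write float samples in [-1, 1] as a 16-bit mono PCM WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)           # mono
        f.setsampwidth(2)           # 16-bit PCM
        f.setframerate(sample_rate)
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        f.writeframes(pcm)

# e.g. save_wav(wav, "output.wav")
```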
What This Means
Supertonic 3 targets the growing demand for privacy-preserving, on-device AI inference. At 99M parameters, the model is 7-20x smaller than comparable open TTS systems, making it practical for browser and edge deployment where GPU access is limited or unavailable. The CPU-only requirement and sub-100MB footprint address real constraints in mobile and embedded applications.
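The 7-20x figure follows directly from the parameter counts cited earlier (99M versus the 0.7B-2B range of VoxCPM2-class baselines), as a quick check confirms:

```python
# Parameter counts from the article.
supertonic = 99e6
baseline_small, baseline_large = 0.7e9, 2e9

print(round(baseline_small / supertonic, 1))  # → 7.1
print(round(baseline_large / supertonic, 1))  # → 20.2
```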
The 31-language support positions Supertonic 3 as a lightweight alternative to larger multilingual TTS systems. However, without independent benchmarks, it remains unclear how the model's accuracy-size tradeoff compares to cloud-based alternatives or other on-device TTS solutions across different hardware profiles and use cases.