Google DeepMind's Gemini 3.1 Flash-Lite generates websites in real time, 2.5x faster than predecessor
Google DeepMind released Gemini 3.1 Flash-Lite, a model that generates functional websites in real time through a new pseudo-browser demo. The model delivers its first response token 2.5 times faster than Gemini 2.5 Flash and outputs over 360 tokens per second, though output pricing has more than tripled, from $0.40 to $1.50 per million tokens.
Gemini 3.1 Flash-Lite: Quick Specs
Google DeepMind released Gemini 3.1 Flash-Lite with substantially improved inference speed. The model generates its first response token 2.5 times faster than Gemini 2.5 Flash and sustains output of more than 360 tokens per second, according to Google.
The company demonstrated the capability through a new pseudo-browser interface: users enter a text prompt describing a desired webpage, and the interface renders the generated HTML and CSS in real time as tokens stream in. A live demo is available for free in Google AI Studio.
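The streaming approach can be sketched in a few lines: the renderer accumulates HTML chunks as they arrive and repaints the page after each one, rather than waiting for the full document. This is a minimal illustrative sketch, not Google's actual demo code; the chunk source is a stub standing in for a streaming model API, and all names are hypothetical.

```python
# Minimal sketch of incremental rendering: as each streamed chunk of
# HTML arrives, the cumulative document so far is (re)rendered.
# In the real demo, the chunk source would be a streaming model API
# and `yield` would trigger a browser repaint.

def render_stream(chunks):
    """Yield the cumulative HTML document after each streamed chunk."""
    document = ""
    for chunk in chunks:
        document += chunk
        yield document  # a pseudo-browser would repaint here

# Simulated token stream standing in for model output.
stream = ["<html><body>", "<h1>Hello</h1>", "</body></html>"]
snapshots = list(render_stream(stream))
print(snapshots[-1])  # the complete document once the stream ends
```

The trade-off this exposes is visible in the demo itself: intermediate snapshots are incomplete HTML, so the page flickers through partially valid states before the closing tags arrive.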
Performance trade-offs
The speed gains come with significant cost increases. Output pricing has more than tripled to $1.50 per million tokens, up from $0.40 per million tokens on the prior Flash version. Input pricing was not disclosed in available sources.
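The pricing change is easy to quantify from the two stated output rates. The following back-of-the-envelope calculation uses only the figures in the article (the 10M-token workload is an arbitrary example, and input costs are omitted because input pricing was not disclosed):

```python
# Output cost comparison at the stated rates (USD per 1M output tokens).
OLD_RATE = 0.40   # prior Flash version
NEW_RATE = 1.50   # Gemini 3.1 Flash-Lite

def output_cost(tokens: int, rate_per_million: float) -> float:
    """Cost in USD for a given number of output tokens."""
    return tokens * rate_per_million / 1_000_000

tokens = 10_000_000  # example workload: 10M output tokens
print(output_cost(tokens, OLD_RATE))  # 4.0
print(output_cost(tokens, NEW_RATE))  # 15.0
print(NEW_RATE / OLD_RATE)            # 3.75, i.e. a 3.75x increase
```

At 3.75x, "more than tripled" understates it slightly; for output-heavy workloads the new rate is closer to quadruple the old one.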
The model's website generation output shows consistency issues. Generated pages begin rendering correctly but content "quickly drifts into nonsense," according to assessments of the demo. Google suggests tight guardrails could enable practical use cases such as rapid UI mockup creation for design visualization.
Competitive positioning
According to Artificial Analysis benchmarking, Gemini 3.1 Flash-Lite outperforms larger models including Claude Opus 4.6 on certain multimodal tasks, though comprehensive benchmark scores were not disclosed.
The model became available in Google AI Studio and Vertex AI starting in early March 2026.
What this means
Gemini 3.1 Flash-Lite prioritizes inference speed over output cost, a deliberate trade-off that positions the model for latency-sensitive applications where user-facing response time matters more than token expenses. The website generation capability remains a novelty demonstration rather than production-ready tooling, but the speed metrics signal Google's focus on competing in the low-latency inference market, where models like Claude Sonnet and smaller specialized models have gained traction.
Related Articles
Google DeepMind releases Gemini 3.1 Flash TTS with audio tags for precise speech control across 70+ languages
Google DeepMind launched Gemini 3.1 Flash TTS, a text-to-speech model that achieved an Elo score of 1,211 on the Artificial Analysis TTS leaderboard. The model introduces audio tags that allow developers to control vocal style, pace, and delivery through natural language commands embedded in text input, with support for 70+ languages.
Google DeepMind Releases Gemma 4 26B A4B Assistant Model for 2x Faster Inference via Multi-Token Prediction
Google DeepMind has released a Multi-Token Prediction assistant model for Gemma 4 26B A4B that achieves up to 2x decoding speedup through speculative decoding. The model uses 3.8B active parameters from a 25.2B total parameter MoE architecture with 128 experts and a 256K token context window.
Google DeepMind releases Gemma 4 with 31B dense model, 256K context window, and speculative decoding drafters
Google DeepMind has released Gemma 4, a family of open-weight multimodal models including a 31B dense model with 256K context window and four size variants ranging from 2.3B to 30.7B effective parameters. The release includes Multi-Token Prediction (MTP) draft models that achieve up to 2x decoding speedup through speculative decoding while maintaining identical output quality.
Google preparing 'AI Ultra Lite' tier between $20 Pro and $250 Ultra plans, adding usage dashboard
Google is developing an intermediate subscription tier called 'AI Ultra Lite' to slot between its $20 Pro and $250 Ultra plans, according to code discovered in the Gemini macOS app. The company is also preparing a usage dashboard showing token budgets across five-hour and weekly limits.