Google DeepMind's Gemini 3.1 Flash-Lite generates websites in real time, 2.5x faster than predecessor
Google DeepMind released Gemini 3.1 Flash-Lite, a model that generates functional websites in real time through a new pseudo-browser demo. The model delivers its first response token 2.5 times faster than Gemini 2.5 Flash and outputs over 360 tokens per second, though output pricing has more than tripled from $0.40 to $1.50 per million tokens.
Google DeepMind released Gemini 3.1 Flash-Lite with substantially improved inference speed. The model generates its first response token 2.5 times faster than Gemini 2.5 Flash and sustains output of more than 360 tokens per second, according to Google.
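The reported throughput makes it easy to estimate wall-clock generation time. The sketch below uses the 360 tokens-per-second figure from the article; the page size and time-to-first-token values are illustrative assumptions, not published numbers.

```python
# Back-of-envelope generation-time estimate from the reported throughput.
TOKENS_PER_SECOND = 360  # sustained output rate reported for 3.1 Flash-Lite


def generation_seconds(output_tokens: int, ttft_s: float = 0.3) -> float:
    """Estimate wall-clock time: time-to-first-token plus streaming time.

    ttft_s is an assumed placeholder; Google reports only the relative
    2.5x first-token speedup, not an absolute latency.
    """
    return ttft_s + output_tokens / TOKENS_PER_SECOND


# A ~2,000-token HTML page would stream in roughly 6 seconds at this rate.
print(f"{generation_seconds(2000):.1f} s")
```

At this rate, a modest single-page site streams fast enough to be watchable as it renders, which is what makes the pseudo-browser demo work.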
The company demonstrated the capability through a new pseudo-browser interface: users input a text prompt describing a desired webpage, and the model renders HTML and CSS in real time as it generates them. A live demo is available for free in Google AI Studio.
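The core of such an interface is a simple render loop: markup arrives as a token stream and the partial document is repainted after every chunk. This is a minimal sketch of that idea, not Google's implementation; the chunk list stands in for a real streaming response (e.g. from the google-genai SDK's `generate_content_stream`), and `render` is a hypothetical placeholder for pushing markup into a browser pane.

```python
# Simulated stream of generated markup; a real demo would iterate over
# chunks from a streaming model response instead of this fixed list.
chunks = [
    "<html><body>",
    "<h1>Landing page</h1>",
    "<p>Generated as tokens arrive.</p>",
    "</body></html>",
]


def render(partial_html: str) -> None:
    """Placeholder for repainting a browser pane with the partial page."""
    print(f"re-render with {len(partial_html)} chars")


buffer = ""
for chunk in chunks:   # in the real demo: for chunk in model stream
    buffer += chunk    # append newly generated markup
    render(buffer)     # repaint with whatever exists so far
```

Because browsers tolerate incomplete HTML, each repaint shows a usable partial page, which is why the demo can display the site "as it generates."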
Performance trade-offs
The speed gains come with significant cost increases. Output pricing has more than tripled to $1.50 per million tokens, up from $0.40 per million tokens on the prior Flash version. Input pricing was not disclosed in available sources.
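The per-million-token prices in the article translate directly into monthly cost deltas. A quick calculation with an assumed illustrative volume of ten million output tokens:

```python
# Output prices from the article, in USD per million tokens.
OLD_PRICE_PER_M = 0.40  # prior Flash version
NEW_PRICE_PER_M = 1.50  # Gemini 3.1 Flash-Lite


def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD for a given number of output tokens."""
    return tokens / 1_000_000 * price_per_million


tokens = 10_000_000  # assumed monthly output volume for illustration
print(output_cost(tokens, OLD_PRICE_PER_M))      # 4.0
print(output_cost(tokens, NEW_PRICE_PER_M))      # 15.0
print(round(NEW_PRICE_PER_M / OLD_PRICE_PER_M, 2))  # 3.75
```

The ratio works out to 3.75x, which is why the article says pricing "more than tripled."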
The model's website generation output shows consistency issues. Generated pages begin rendering correctly but content "quickly drifts into nonsense," according to assessments of the demo. Google suggests tight guardrails could enable practical use cases such as rapid UI mockup creation for design visualization.
Competitive positioning
According to Artificial Analysis benchmarking, Gemini 3.1 Flash-Lite outperforms larger models including Claude Opus 4.6 on certain multimodal tasks, though comprehensive benchmark scores were not disclosed.
The model has been available in Google AI Studio and Vertex AI since early March 2026.
What this means
Gemini 3.1 Flash-Lite prioritizes inference speed over output cost, a deliberate trade-off that positions the model for latency-sensitive applications where user-facing response time matters more than token cost. The website generation capability remains a novelty demonstration rather than production-ready tooling, but the speed metrics signal Google's focus on competing in the low-latency inference market where models like Claude Sonnet and smaller specialized models have gained traction.
Related Articles
Google DeepMind releases Nano Banana 2 image model with Pro-level capabilities at faster speeds
Google DeepMind has released Nano Banana 2, an image generation model that combines advanced world knowledge and subject consistency with faster inference speeds comparable to its Flash offering. The model is positioned as production-ready with capabilities previously associated with Pro-tier performance.
Google DeepMind adds multi-tool chaining and context circulation to Gemini API
Google DeepMind has expanded the Gemini API to enable multi-tool chaining, allowing developers to combine built-in tools like Google Search and Google Maps with custom functions in a single request. Results from one tool now automatically pass to the next through context circulation, eliminating the need for separate sequential handling.
Google DeepMind argues chatbot ethics require same rigor as coding benchmarks
Google DeepMind is pushing for moral behavior in large language models to be evaluated with the same technical rigor applied to coding and math benchmarks. As LLMs take on roles like companions, therapists, and medical advisors, the research group argues current evaluation standards are insufficient.
Stable Diffusion 3.5 TensorRT optimization delivers 2x faster generation, 40% less VRAM on RTX GPUs
Stability AI has released TensorRT-optimized versions of the Stable Diffusion 3.5 model family in collaboration with NVIDIA. The optimization uses FP8 quantization to achieve 2x faster generation speed and 40% lower VRAM requirements on supported RTX GPUs.