Google DeepMind's Gemini 3.1 Flash-Lite generates websites in real time, 2.5x faster than predecessor
Google DeepMind released Gemini 3.1 Flash-Lite, a model that generates functional websites in real time through a new pseudo-browser demo. The model delivers its first response token 2.5 times faster than Gemini 2.5 Flash and outputs over 360 tokens per second, though output pricing has more than tripled from $0.40 to $1.50 per million tokens.
Google DeepMind released Gemini 3.1 Flash-Lite with a focus on inference speed. According to Google, the model produces its first response token 2.5 times faster than Gemini 2.5 Flash and sustains output of more than 360 tokens per second.
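For a sense of scale, the throughput claim can be turned into a rough streaming-time estimate. The sketch below is a back-of-the-envelope calculation, not a benchmark: the 360 tokens-per-second rate is Google's stated figure, while the 2,000-token page size and the 0.5-second first-token latency are hypothetical values chosen purely for illustration.

```python
def estimated_stream_seconds(output_tokens, tokens_per_second, first_token_latency_s=0.0):
    """Rough wall-clock time to stream a response of the given length.

    Assumes a constant generation rate after the first token arrives,
    which real deployments will only approximate.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return first_token_latency_s + output_tokens / tokens_per_second


# Figure reported by Google in the article.
CLAIMED_TOKENS_PER_SECOND = 360

# Hypothetical values for illustration only; not from the source.
HYPOTHETICAL_PAGE_TOKENS = 2_000
HYPOTHETICAL_FIRST_TOKEN_S = 0.5

seconds = estimated_stream_seconds(
    HYPOTHETICAL_PAGE_TOKENS,
    CLAIMED_TOKENS_PER_SECOND,
    HYPOTHETICAL_FIRST_TOKEN_S,
)
print(f"Estimated time to stream the page: {seconds:.1f} s")
```

Under these assumptions a page of that size would finish streaming in roughly six seconds. Actual times depend on network conditions, load, and real first-token latency, none of which the source quantifies.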
The company demonstrated the capability through a new pseudo-browser interface: users enter a text prompt describing a desired webpage, and the model generates HTML and CSS that the interface renders in real time as the output streams in. A live demo is available for free in Google AI Studio.
Performance trade-offs
The speed gains come with significant cost increases. Output pricing has more than tripled to $1.50 per million tokens, up from $0.40 per million tokens on the prior Flash version. Input pricing was not disclosed in available sources.
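The size of the increase is easy to work out from the two reported prices. The following sketch computes the price ratio and compares bills for a workload; the 500 million output tokens per month is a hypothetical volume for illustration, and input-token costs are left out because input pricing was not disclosed.

```python
# Output prices reported in the article, in USD per million output tokens.
OLD_OUTPUT_PRICE_USD_PER_M = 0.40
NEW_OUTPUT_PRICE_USD_PER_M = 1.50

# Hypothetical monthly volume for illustration only; not from the source.
HYPOTHETICAL_MONTHLY_OUTPUT_TOKENS = 500_000_000


def output_cost_usd(output_tokens, price_per_million_usd):
    """Cost of generating the given number of output tokens."""
    return output_tokens / 1_000_000 * price_per_million_usd


price_ratio = NEW_OUTPUT_PRICE_USD_PER_M / OLD_OUTPUT_PRICE_USD_PER_M
old_cost = output_cost_usd(HYPOTHETICAL_MONTHLY_OUTPUT_TOKENS, OLD_OUTPUT_PRICE_USD_PER_M)
new_cost = output_cost_usd(HYPOTHETICAL_MONTHLY_OUTPUT_TOKENS, NEW_OUTPUT_PRICE_USD_PER_M)

print(f"Price ratio: {price_ratio:.2f}x")
print(f"Monthly output cost at old price: ${old_cost:,.2f}")
print(f"Monthly output cost at new price: ${new_cost:,.2f}")
```

The ratio works out to 3.75, which is why "more than tripled" is the accurate description. Because the ratio is constant, any output-heavy workload sees its output bill scale by the same factor regardless of volume.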
The model's website generation output shows consistency issues. Generated pages begin rendering correctly but content "quickly drifts into nonsense," according to assessments of the demo. Google suggests tight guardrails could enable practical use cases such as rapid UI mockup creation for design visualization.
Competitive positioning
According to Artificial Analysis benchmarking, Gemini 3.1 Flash-Lite outperforms larger models including Claude Opus 4.6 on certain multimodal tasks, though comprehensive benchmark scores were not disclosed.
The model has been available in Google AI Studio and Vertex AI since early March 2026.
What this means
Gemini 3.1 Flash-Lite prioritizes inference speed over output cost, a deliberate trade-off that positions the model for latency-sensitive applications where user-facing response time matters more than token expenses. The website generation capability remains a novelty demonstration rather than production-ready tooling, but the speed metrics signal Google's focus on competing in the low-latency inference market, where models like Claude Sonnet and smaller specialized models have gained traction.