Google DeepMind's Gemini 3.1 Flash-Lite generates websites in real time, 2.5x faster than predecessor
Google DeepMind released Gemini 3.1 Flash-Lite, a model that generates functional websites in real time through a new pseudo-browser demo. The model delivers its first response token 2.5 times faster than Gemini 2.5 Flash and outputs over 360 tokens per second, though output pricing has more than tripled from $0.40 to $1.50 per million tokens.
Google DeepMind released Gemini 3.1 Flash-Lite with substantially improved inference speed. The model generates its first response token 2.5 times faster than Gemini 2.5 Flash and sustains output of more than 360 tokens per second, according to Google.
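The reported throughput makes it easy to estimate wall-clock generation time. The sketch below uses the 360 tokens-per-second figure from the article; the page size and time-to-first-token values are illustrative assumptions, not published numbers.

```python
# Back-of-envelope generation-time estimate from the reported throughput.
TOKENS_PER_SECOND = 360  # sustained output rate reported for 3.1 Flash-Lite


def generation_seconds(output_tokens: int, ttft_s: float = 0.3) -> float:
    """Estimate wall-clock time: time-to-first-token plus streaming time.

    ttft_s is an assumed placeholder; Google reports only the relative
    2.5x first-token speedup, not an absolute latency.
    """
    return ttft_s + output_tokens / TOKENS_PER_SECOND


# A ~2,000-token HTML page would stream in roughly 6 seconds at this rate.
print(f"{generation_seconds(2000):.1f} s")
```

At this rate, a modest single-page site streams fast enough to be watchable as it renders, which is what makes the pseudo-browser demo work.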
The company demonstrated the capability through a new pseudo-browser interface: users input a text prompt describing a desired webpage, and the model renders HTML and CSS in real time as it generates them. A live demo is available for free in Google AI Studio.
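The core of such an interface is a simple render loop: markup arrives as a token stream and the partial document is repainted after every chunk. This is a minimal sketch of that idea, not Google's implementation; the chunk list stands in for a real streaming response (e.g. from the google-genai SDK's `generate_content_stream`), and `render` is a hypothetical placeholder for pushing markup into a browser pane.

```python
# Simulated stream of generated markup; a real demo would iterate over
# chunks from a streaming model response instead of this fixed list.
chunks = [
    "<html><body>",
    "<h1>Landing page</h1>",
    "<p>Generated as tokens arrive.</p>",
    "</body></html>",
]


def render(partial_html: str) -> None:
    """Placeholder for repainting a browser pane with the partial page."""
    print(f"re-render with {len(partial_html)} chars")


buffer = ""
for chunk in chunks:   # in the real demo: for chunk in model stream
    buffer += chunk    # append newly generated markup
    render(buffer)     # repaint with whatever exists so far
```

Because browsers tolerate incomplete HTML, each repaint shows a usable partial page, which is why the demo can display the site "as it generates."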
Performance trade-offs
The speed gains come with significant cost increases. Output pricing has more than tripled to $1.50 per million tokens, up from $0.40 per million tokens on the prior Flash version. Input pricing was not disclosed in available sources.
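The per-million-token prices in the article translate directly into monthly cost deltas. A quick calculation with an assumed illustrative volume of ten million output tokens:

```python
# Output prices from the article, in USD per million tokens.
OLD_PRICE_PER_M = 0.40  # prior Flash version
NEW_PRICE_PER_M = 1.50  # Gemini 3.1 Flash-Lite


def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD for a given number of output tokens."""
    return tokens / 1_000_000 * price_per_million


tokens = 10_000_000  # assumed monthly output volume for illustration
print(output_cost(tokens, OLD_PRICE_PER_M))      # 4.0
print(output_cost(tokens, NEW_PRICE_PER_M))      # 15.0
print(round(NEW_PRICE_PER_M / OLD_PRICE_PER_M, 2))  # 3.75
```

The ratio works out to 3.75x, which is why the article says pricing "more than tripled."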
The model's website generation output shows consistency issues. Generated pages begin rendering correctly but content "quickly drifts into nonsense," according to assessments of the demo. Google suggests tight guardrails could enable practical use cases such as rapid UI mockup creation for design visualization.
Competitive positioning
According to Artificial Analysis benchmarking, Gemini 3.1 Flash-Lite outperforms larger models including Claude Opus 4.6 on certain multimodal tasks, though comprehensive benchmark scores were not disclosed.
The model has been available in Google AI Studio and Vertex AI since early March 2026.
What this means
Gemini 3.1 Flash-Lite prioritizes inference speed over output cost, a deliberate trade-off that positions the model for latency-sensitive applications where user-facing response time matters more than token cost. The website generation capability remains a novelty demonstration rather than production-ready tooling, but the speed metrics signal Google's focus on competing in the low-latency inference market where models like Claude Sonnet and smaller specialized models have gained traction.
Related Articles
Google DeepMind releases Nano Banana 2 image model with Pro-level capabilities at faster speeds
Google DeepMind has released Nano Banana 2, an image generation model that combines advanced world knowledge and subject consistency with faster inference speeds comparable to its Flash offering. The model is positioned as production-ready with capabilities previously associated with Pro-tier performance.
Google DeepMind adds multi-tool chaining and context circulation to Gemini API
Google DeepMind has expanded the Gemini API to enable multi-tool chaining, allowing developers to combine built-in tools like Google Search and Google Maps with custom functions in a single request. Results from one tool now automatically pass to the next through context circulation, eliminating the need for separate sequential handling.
Google DeepMind argues chatbot ethics require same rigor as coding benchmarks
Google DeepMind is pushing for moral behavior in large language models to be evaluated with the same technical rigor applied to coding and math benchmarks. As LLMs take on roles like companions, therapists, and medical advisors, the research group argues current evaluation standards are insufficient.
Stable Diffusion 3.5 TensorRT optimization delivers 2x faster generation, 40% less VRAM on RTX GPUs
Stability AI has released TensorRT-optimized versions of the Stable Diffusion 3.5 model family in collaboration with NVIDIA. The optimization uses FP8 quantization to achieve 2x faster generation speed and 40% lower VRAM requirements on supported RTX GPUs.