Zhipu AI releases GLM-5V-Turbo: multimodal model generates front-end code from design mockups

TL;DR

Zhipu AI released GLM-5V-Turbo, a multimodal coding model that converts design mockups directly into executable front-end code. The model processes images, video, and text with a 200,000-token context window and 128,000-token max output, priced at $1.20 per million input tokens and $4 per million output tokens.

GLM-5V-Turbo: Quick Specs

Context window: 200K tokens
Input: $1.20/1M tokens
Output: $4/1M tokens

Zhipu AI has released GLM-5V-Turbo, a multimodal coding base model that generates executable front-end code directly from design mockups, images, and video inputs. The model is purpose-built for agent workflows and available via API at $1.20 per million input tokens and $4 per million output tokens—pricing identical to the text-only GLM-5-Turbo.
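
To make the pricing concrete, the sketch below estimates the cost of a single request from its token counts at the listed rates. The token counts in the example are illustrative assumptions (the article does not say how many tokens an image or video frame consumes), and the function ignores any context-caching discount.

```python
# Per-million-token list prices reported for GLM-5V-Turbo, in USD.
INPUT_PRICE_PER_M = 1.20
OUTPUT_PRICE_PER_M = 4.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at list prices (no caching discount)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Illustrative request: a 50,000-token prompt (mockup plus instructions)
# that produces 20,000 tokens of generated front-end code.
print(f"${estimate_cost(50_000, 20_000):.4f}")
```

At these rates the example request costs about 14 cents, with output tokens accounting for more than half despite being the smaller count.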

Core Specifications

GLM-5V-Turbo processes multimodal inputs through a proprietary vision encoder called CogViT, integrated directly into the architecture rather than bolted on after training. The model features:

  • Context window: 200,000 tokens
  • Maximum output: 128,000 tokens
  • Key features: Thinking mode, streaming output, function calling, and context caching
  • Availability: API-only through Z.AI platform; no open weights announced
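
The two limits interact when a request carries large visual inputs. The helper below is a minimal sketch that assumes the prompt and the generated output share the 200,000-token window, with output separately capped at 128,000 tokens. The article does not spell out how Z.AI accounts for the two limits, and the function names are hypothetical.

```python
CONTEXT_WINDOW = 200_000  # total tokens per request, as reported
MAX_OUTPUT = 128_000      # cap on generated tokens, as reported


def max_output_for(prompt_tokens: int) -> int:
    """Largest output budget that fits, assuming prompt and output share the window."""
    return max(0, min(MAX_OUTPUT, CONTEXT_WINDOW - prompt_tokens))


def fits_limits(prompt_tokens: int, requested_output: int) -> bool:
    """True if a request with this prompt size and output budget stays within both limits."""
    return 0 <= requested_output <= max_output_for(prompt_tokens)


# A long design spec plus screenshots totalling 100K tokens leaves at most 100K for output.
print(max_output_for(100_000))
```

Under this assumption the full 128,000-token output is only available when the prompt stays at or below 72,000 tokens.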

Architecture and Training

Zhipu AI claims performance gains stem from four improvements: integrated architecture that processes images and text together from training start; a new vision encoder (CogViT); multi-token prediction during inference for faster output; and reinforcement learning across 30+ task types including STEM, grounding, video, GUI agents, and coding agents.
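
Zhipu does not describe how its multi-token prediction is implemented. The toy below illustrates the general draft-and-verify idea behind MTP-accelerated decoding: extra heads propose several tokens at once, the main model checks them in a single forward pass, and the longest agreeing prefix is kept. Because every kept token matches what the main model would have produced anyway, the output is identical to ordinary greedy decoding, only with fewer passes. The `toy_next` and `toy_draft` "models" are hypothetical stand-ins, not GLM components.

```python
from typing import Callable, List, Tuple

Seq = List[int]


def greedy_decode(next_token: Callable[[Seq], int], prompt: Seq, n: int) -> Tuple[Seq, int]:
    """Baseline: one forward pass per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))
    return seq, n


def mtp_decode(next_token: Callable[[Seq], int],
               draft: Callable[[Seq, int], Seq],
               prompt: Seq, n: int, k: int) -> Tuple[Seq, int]:
    """Draft k tokens, verify them in one (simulated) pass, keep the agreeing prefix."""
    seq = list(prompt)
    target = len(prompt) + n
    passes = 0
    while len(seq) < target:
        proposal = draft(seq, k)
        passes += 1  # a real model scores all k drafted positions in a single pass
        accepted: Seq = []
        for tok in proposal:
            expected = next_token(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # main model's own token at the first mismatch
                break
            accepted.append(tok)
        else:
            accepted.append(next_token(seq + accepted))  # bonus token when every draft matched
        seq.extend(accepted)
    return seq[:target], passes


# Hypothetical toy "models" over integer tokens.
def toy_next(seq: Seq) -> int:
    return (seq[-1] + 1) % 10


def toy_draft(seq: Seq, k: int) -> Seq:
    out, last = [], seq[-1]
    for _ in range(k):
        guess = (last + 1) % 10
        if guess == 7:
            guess = 0  # deliberately imperfect drafter
        out.append(guess)
        last = guess
    return out


base, base_passes = greedy_decode(toy_next, [0], 12)
fast, fast_passes = mtp_decode(toy_next, toy_draft, [0], 12, k=3)
print(fast == base, base_passes, fast_passes)  # True 12 4
```

The speedup depends entirely on how often the drafted tokens agree with the main model; a drafter that is always wrong still yields one correct token per pass, so it never does worse than the baseline in pass count.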

The company constructed a multi-level, controllable data system to address agent training data shortages, with agentic meta-skills embedded in pre-training. A multimodal toolchain extends capabilities from text to visual interaction, including box drawing, screenshots, website reading, and image understanding.
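
Function calling is how an agent framework exposes tools like these to the model: the model emits a structured call, the host runs the matching local function, and the result goes back into the conversation. The sketch below shows that host-side dispatch step. The call format (a `name` plus a JSON-encoded `arguments` string) is an assumption modeled on the widely used OpenAI-style convention, not a documented Z.AI schema, and `draw_box` and `read_website` are hypothetical stubs.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical local tools; names and signatures are illustrative only.
def draw_box(image_id: str, x: int, y: int, w: int, h: int) -> Dict[str, Any]:
    """Return a box as [left, top, right, bottom] on the given image."""
    return {"image_id": image_id, "box": [x, y, x + w, y + h]}


def read_website(url: str) -> Dict[str, Any]:
    # Stub: a real tool would fetch the page and extract its text.
    return {"url": url, "text": "<page text would go here>"}


TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "draw_box": draw_box,
    "read_website": read_website,
}


def dispatch(tool_call: Dict[str, Any]) -> str:
    """Run one model-issued tool call and return a JSON string to send back."""
    name = tool_call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(tool_call.get("arguments", "{}"))
    try:
        result = TOOLS[name](**args)
    except TypeError as exc:  # wrong, missing, or non-mapping arguments
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


call = {"name": "draw_box",
        "arguments": json.dumps({"image_id": "mockup-1", "x": 10, "y": 20, "w": 100, "h": 50})}
print(dispatch(call))
```

Returning errors as JSON rather than raising keeps the agent loop alive and lets the model see what went wrong and retry.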

Claimed Benchmark Performance

According to Zhipu AI, GLM-5V-Turbo leads in most multimodal coding and tool usage benchmarks. The model reportedly scores well on:

  • Design-to-code generation and visual code generation
  • Multimodal search and visual exploration
  • AndroidWorld and WebVoyager (real GUI navigation benchmarks)
  • PinchBench, ClawEval, and ZClawBench (task execution quality)

Claude Opus 4.6 reportedly outperforms GLM-5V-Turbo on some benchmarks, including Flame-VLM-Code and OSWorld. On text-only coding tasks, the company claims the added visual capabilities caused no performance drop: the model holds up across CC-Bench-V2 (backend, frontend, and repo exploration) and, in several categories, outperforms its text-only predecessor GLM-5-Turbo and competitor Kimi K2.5.

Important note: Independent evaluations are still pending. All performance claims come directly from Zhipu AI.

Use Cases

GLM-5V-Turbo targets specific workflows:

  1. Design-to-code: Converts design mockups into complete, runnable front-end projects with what Zhipu describes as pixel-perfect visual consistency
  2. Autonomous GUI exploration: Paired with Claude Code or OpenClaw, the model can search websites independently, map page transitions, collect visual assets, and write code
  3. Debugging: Takes screenshots of broken pages, identifies rendering issues (layout shifts, overlaps, color mismatches), and generates fixes
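
Zhipu does not say how the model spots rendering issues; per the article, it works from screenshots. As an illustration of what an overlap bug means in checkable terms, the sketch below flags pairs of element bounding boxes that intersect. The `Box` type and the example layout are hypothetical, and in practice the coordinates would have to come from the rendered page (for example, the browser's layout data).

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    name: str
    x: float  # left edge, in CSS pixels
    y: float  # top edge
    w: float  # width
    h: float  # height


def overlaps(a: Box, b: Box) -> bool:
    # Axis-aligned rectangles intersect iff they overlap on both axes;
    # boxes that merely touch along an edge are not counted.
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h


def find_overlaps(boxes: List[Box]) -> List[Tuple[str, str]]:
    """Return the names of every intersecting pair, checked pairwise."""
    return [(a.name, b.name) for a, b in combinations(boxes, 2) if overlaps(a, b)]


layout = [
    Box("header", 0, 0, 1280, 80),
    Box("hero", 0, 70, 1280, 400),  # starts 10px inside the header
    Box("footer", 0, 900, 1280, 120),
]
print(find_overlaps(layout))  # [('header', 'hero')]
```

Not every overlap is a bug (dropdowns and modals overlap other content by design), so a check like this only surfaces candidates for review.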

The model integrates with the OpenClaw agent framework, and official skills for image captioning, visual grounding, document writing, resume screening, and prompt generation are available through ClawHub.

Context: GLM-5 Lineage

GLM-5V-Turbo builds on Zhipu AI's recent releases. GLM-5-Turbo (text-only) launched for the OpenClaw ecosystem, improving tool calls and long task chain execution. Before that, GLM-5—an open-source 744-billion-parameter model under MIT license—launched in February. According to Zhipu, GLM-5 achieved 77.8% on SWE-bench Verified (compared to Claude Opus 4.5's 80.9%) and runs on Huawei chips alongside Nvidia GPUs, an advantage given US export restrictions on semiconductors to China.

What This Means

GLM-5V-Turbo represents a direct technical pivot toward vision-integrated code generation, eliminating the intermediate step of converting design visuals into text descriptions before coding. The model's integration into agent frameworks (Claude Code, OpenClaw) and its pricing, identical to that of the text-only GLM-5-Turbo, signal Zhipu AI's confidence that the added visual capabilities do not degrade pure text performance. However, those performance claims have not yet been validated by independent benchmarks. The design-to-code capability targets a concrete workflow gap in front-end development, but its real-world execution quality (pixel accuracy, responsive-design handling) still needs independent verification beyond the company's claims.
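
One simple way an independent reviewer might quantify pixel accuracy is to render the generated page at the mockup's exact dimensions and count matching pixels. The sketch below does that on plain RGB arrays with a per-channel tolerance. It is an illustrative baseline, not a methodology Zhipu has described; perceptual metrics such as SSIM would be more forgiving of anti-aliasing differences.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # RGB, each channel 0-255
Image = List[List[Pixel]]     # rows of pixels


def pixel_match_ratio(mockup: Image, render: Image, tol: int = 0) -> float:
    """Fraction of pixels whose every RGB channel differs by at most `tol`."""
    if len(mockup) != len(render) or any(len(ra) != len(rb) for ra, rb in zip(mockup, render)):
        raise ValueError("images must have identical dimensions")
    total = matched = 0
    for row_a, row_b in zip(mockup, render):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if all(abs(ca - cb) <= tol for ca, cb in zip(pa, pb)):
                matched += 1
    return matched / total if total else 1.0


white: Pixel = (255, 255, 255)
mockup = [[white, white], [white, white]]
render = [[white, (250, 250, 250)], [white, (0, 0, 0)]]
print(pixel_match_ratio(mockup, render), pixel_match_ratio(mockup, render, tol=8))  # 0.5 0.75
```

A small tolerance absorbs near-invisible color drift, while a genuinely wrong pixel (the black one here) still counts as a mismatch.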
