Baidu releases ERNIE-Image-Turbo, a distilled text-to-image model generating in 8 inference steps

TL;DR

Baidu has released ERNIE-Image-Turbo, a distilled text-to-image diffusion transformer that generates images in 8 inference steps. The model runs on consumer GPUs with 24GB VRAM and supports seven resolutions, including 1024×1024 and 1376×768, with claimed strengths in text rendering and structured generation tasks.

Baidu has released ERNIE-Image-Turbo, a distilled version of its ERNIE-Image text-to-image model that generates images in 8 inference steps. The model is built on a single-stream Diffusion Transformer (DiT) architecture and runs on consumer GPUs with 24GB VRAM.

Technical specifications

ERNIE-Image-Turbo supports multiple resolutions: 1024×1024, 848×1264, 1264×848, 768×1376, 896×1200, 1376×768, and 1200×896. The model uses a guidance scale of 1.0 and operates with bfloat16 precision. Pricing has not been disclosed.
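
Because the model supports only a fixed set of resolutions, a caller with an arbitrary target size has to map it onto one of them. The following is a minimal sketch of one way to do that, by picking the listed resolution with the nearest aspect ratio. The helper name `closest_resolution` is hypothetical, and reading each pair as width×height is an assumption; the article does not state the order.

```python
import math

# Resolutions listed in the article, read here as (width, height).
# The width-first ordering is an assumption, not something the article states.
SUPPORTED_RESOLUTIONS = [
    (1024, 1024),
    (848, 1264),
    (1264, 848),
    (768, 1376),
    (896, 1200),
    (1376, 768),
    (1200, 896),
]


def closest_resolution(width: int, height: int) -> tuple[int, int]:
    """Return the supported resolution whose aspect ratio is nearest the request.

    Ratios are compared in log space so that 2:1 and 1:2 are treated as
    equally far from 1:1.
    """
    target = width / height
    return min(
        SUPPORTED_RESOLUTIONS,
        key=lambda wh: abs(math.log((wh[0] / wh[1]) / target)),
    )


if __name__ == "__main__":
    # A 16:9 request maps to the widest landscape option.
    print(closest_resolution(1920, 1080))
```

Under these assumptions, a 1920×1080 request maps to 1376×768 and a 3:4 portrait request maps to 896×1200.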

The distillation process used Distribution Matching Distillation (DMD) and reinforcement learning to reduce the 50-step inference requirement of the base ERNIE-Image model to 8 steps while maintaining generation quality, according to Baidu.

Benchmark performance

On GENEval, ERNIE-Image-Turbo with prompt enhancement scored 0.8510 overall, compared to 0.8728 for the base ERNIE-Image model and 0.8481 for FLUX.2-klein-9B. The model achieved 0.9938 on the single-object category and 0.8375 on counting.

For text rendering, measured on LongTextBench, ERNIE-Image-Turbo scored an average of 0.9655 across English and Chinese benchmarks, trailing Seedream 4.5 (0.9882) and the base ERNIE-Image model (0.9733) but outperforming FLUX.2-klein-9B (0.5413).

On the OneIG-EN benchmark measuring alignment, text, reasoning, style, and diversity, ERNIE-Image-Turbo scored 0.5656 overall. Nano Banana 2.0 led with 0.5780, while the base ERNIE-Image achieved 0.5750.
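
To put the reported numbers side by side, the sketch below computes how much of the base model's score the Turbo model retains on each benchmark, alongside the reduction in inference steps. It uses only figures quoted in this article. The `retention` helper is illustrative, and a simple ratio of scores is a rough summary rather than a metric Baidu reports. The GENEval Turbo figure is the prompt-enhanced one, and the article does not say whether the base score is as well.

```python
# (Turbo score, base ERNIE-Image score) as quoted in the article.
SCORES = {
    "GENEval": (0.8510, 0.8728),
    "LongTextBench": (0.9655, 0.9733),
    "OneIG-EN": (0.5656, 0.5750),
}

# Inference steps reported for the base and distilled models.
BASE_STEPS = 50
TURBO_STEPS = 8


def retention(turbo: float, base: float) -> float:
    """Fraction of the base model's score that the Turbo model keeps."""
    return turbo / base


for name, (turbo, base) in SCORES.items():
    print(f"{name}: {retention(turbo, base):.1%} of base score")

print(f"Inference steps: {BASE_STEPS / TURBO_STEPS:.2f}x fewer")
```

On these figures the Turbo model keeps roughly 97.5% (GENEval), 99.2% (LongTextBench), and 98.4% (OneIG-EN) of the base score while using 6.25 times fewer steps.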

Implementation details

The model is available through Hugging Face's diffusers library and SGLang for deployment. Baidu states the model is designed for "posters, comics, multi-panel layouts, and other content creation tasks" requiring text rendering and structured generation.
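
As a rough illustration of what a diffusers-based call might look like with the settings reported above (8 steps, guidance scale 1.0, bfloat16), here is a hedged sketch. Several things are assumptions: the article does not give the Hugging Face repository id, so `model_id` is left as a parameter; the generic `DiffusionPipeline` loader is assumed to resolve the right pipeline class; and the keyword names (`num_inference_steps`, `guidance_scale`, `width`, `height`) follow the convention used by most diffusers pipelines rather than anything confirmed for this model. The helper names are hypothetical.

```python
def build_call_kwargs(prompt: str, width: int = 1024, height: int = 1024) -> dict:
    """Collect the generation settings the article reports for the Turbo model.

    Keyword names follow common diffusers conventions (an assumption here).
    """
    return {
        "prompt": prompt,
        "width": width,
        "height": height,
        "num_inference_steps": 8,  # distilled step count from the article
        "guidance_scale": 1.0,     # guidance scale from the article
    }


def generate(model_id: str, prompt: str, width: int = 1024, height: int = 1024):
    """Load the pipeline and generate one image (assumes a CUDA GPU)."""
    # Imported lazily so build_call_kwargs works without torch/diffusers installed.
    import torch
    from diffusers import DiffusionPipeline

    # model_id must be supplied by the caller; the article does not name it.
    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    pipe = pipe.to("cuda")

    # Most diffusers pipelines return an object with an `images` list;
    # this is assumed to hold for ERNIE-Image-Turbo as well.
    result = pipe(**build_call_kwargs(prompt, width, height))
    return result.images[0]
```

Keeping the settings in a separate function makes them easy to inspect or override without loading the model.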

Two versions are available: ERNIE-Image-Turbo with and without prompt enhancement (PE). The PE version scores higher on most of the reported benchmark metrics.

What this means

ERNIE-Image-Turbo is Baidu's entry into fast text-to-image generation, trading some quality for deployment efficiency. The 8-step generation and 24GB VRAM requirement put it within reach of consumer hardware, and the reported cost is small but consistent: the Turbo model trails the base ERNIE-Image on GENEval (0.8510 vs. 0.8728), LongTextBench (0.9655 vs. 0.9733), and OneIG-EN (0.5656 vs. 0.5750). The focus on text rendering and structured layouts positions it for specific use cases like poster and comic generation rather than general-purpose image synthesis. Whether the speed gains justify the quality reduction will depend on application requirements.
