Ideogram 4: 9.3B parameter open-weight text-to-image model with native 2K resolution and structured JSON prompting

TL;DR

Ideogram has released Ideogram 4, its first open-weight text-to-image model, with 9.3 billion parameters. The model supports native 2K resolution and structured JSON prompting with bounding-box layout controls, and is available in nf4 and fp8 quantizations under a non-commercial license.

June 4, 2026 · 5:36 AM · 3 min read

Ideogram has released Ideogram 4, its first open-weight text-to-image model featuring 9.3 billion parameters, native 2K resolution support, and a structured JSON prompting interface. The model is available in two quantizations: nf4 (CUDA-only) and fp8 (all hardware), both released under the Ideogram 4 Non-Commercial license.
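As a rough guide to what the two quantizations mean for storage, the sketch below multiplies the stated parameter count by the nominal bit width of each format. It assumes all 9.3 billion parameters are stored at that width and ignores quantization metadata (such as per-block scales) and the separate text encoder, so actual file sizes and memory use will differ.

```python
# Back-of-envelope weight sizes. Assumption: every parameter is stored at the
# nominal bit width; quantization metadata and the text encoder are ignored.
PARAMS = 9.3e9  # parameter count stated in the article

BITS_PER_PARAM = {"nf4": 4, "fp8": 8}  # nominal bit widths of the two formats


def weight_gigabytes(params: float, bits: int) -> float:
    """Return nominal weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


for name, bits in BITS_PER_PARAM.items():
    print(f"{name}: ~{weight_gigabytes(PARAMS, bits):.2f} GB")
```

Under these assumptions the nf4 weights come to about half the size of the fp8 weights.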

Technical specifications

Ideogram 4 is built on a fully single-stream Diffusion Transformer (DiT) architecture with 34 layers. It was trained from scratch rather than fine-tuned from an existing model. The model uses Qwen3-VL-8B-Instruct as its text encoder, a vision-language model rather than a traditional text-only encoder such as CLIP or T5. Text and image tokens are concatenated into a single sequence and processed by the same transformer, so the two modalities interact at every layer.

The model supports flexible resolutions from 256 to 2048 pixels (in multiples of 16) with aspect ratios up to 6:1, and automatically adjusts the noise schedule for each resolution. Both quantizations are of the same 9.3B-parameter model; the nf4 version requires CUDA and supports Diffusers integration.
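The resolution constraints above can be expressed as a small check. This is a sketch, not Ideogram's code: the function name `is_supported_size` is hypothetical, and it assumes the 256 to 2048 range and the multiple-of-16 rule apply to each side, with the aspect ratio measured as long side over short side.

```python
def is_supported_size(width: int, height: int) -> bool:
    """Check a requested size against the limits stated in the article.

    Assumptions: the 256-2048 range and the multiple-of-16 rule apply to each
    side independently, and aspect ratio means long side / short side.
    """
    for side in (width, height):
        if side < 256 or side > 2048 or side % 16 != 0:
            return False
    long_side, short_side = max(width, height), min(width, height)
    # Integer comparison avoids floating-point error at the 6:1 boundary.
    return long_side <= 6 * short_side


for size in [(2048, 2048), (2048, 352), (2048, 336), (1000, 1000)]:
    print(size, is_supported_size(*size))
```

Here 2048×352 passes (about 5.8:1), 2048×336 fails the 6:1 limit, and 1000×1000 fails because 1000 is not a multiple of 16.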

Benchmark performance

According to Ideogram, the model ranks as the top open-weight model on Design Arena, a third-party image Elo leaderboard focused on design-oriented generation. On the overall Design Arena leaderboard, Ideogram 4 trails only proprietary GPT and Gemini models.

In a blind typography evaluation by ContraLabs, where ten professional designers from Contra judged outputs, Ideogram 4 achieved a 47.9% first-place win rate—ahead of Gemini 3.1 Flash Image Preview (30.0%), FLUX.2 [max] (15.5%), and Grok Imagine 1.0 (15.0%). The same designers rated it 3.55/5 for practical usability in client work.

On LMArena's general-purpose text-to-image leaderboard, Ideogram ranks as a top-5 lab overall and the highest-ranked open-weight lab. In Ideogram's internal human-preference benchmark focused on graphic design and photography, the model scored second overall behind GPT Image 2 medium.

On standard open-source benchmarks, Ideogram claims best-in-class layout control (7Bench) for Ideogram 4, outperforming all closed-source models tested. For text rendering (X-Omni OCR), it reportedly exceeds larger models including Qwen-Image (20B), FLUX.2 [dev] (32B), and HunyuanImage 3.0 (80B MoE).

Key capabilities

The model introduces structured JSON prompting, allowing explicit control over composition, style, lighting, color palettes, and spatial layout through bounding-box coordinates. It supports multilingual text rendering with what Ideogram claims is state-of-the-art in-image text generation for signage, logos, and multi-line text.
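To give a sense of what a structured prompt with bounding boxes might look like, here is an illustrative sketch. The article does not reproduce Ideogram's schema, so every field name below (`description`, `style`, `lighting`, `color_palette`, `elements`, `bbox`) and the normalized `[x0, y0, x1, y1]` coordinate convention are hypothetical; consult the model card for the real format.

```python
import json

# Hypothetical structured prompt. Field names and the normalized
# [x0, y0, x1, y1] bounding-box convention are illustrative assumptions,
# not Ideogram's documented schema.
prompt = {
    "description": "Poster for a jazz night",
    "style": "flat vector illustration",
    "lighting": "warm, low-key",
    "color_palette": ["#1B1F3B", "#F2A541", "#F7F3E3"],
    "elements": [
        {"type": "text", "content": "JAZZ NIGHT", "bbox": [0.10, 0.05, 0.90, 0.25]},
        {"type": "object", "content": "saxophone", "bbox": [0.25, 0.30, 0.75, 0.90]},
    ],
}


def bbox_is_valid(bbox) -> bool:
    """Return True if the box lies inside the unit square with positive area."""
    x0, y0, x1, y1 = bbox
    return 0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0


# Reject malformed layouts before serializing the prompt.
assert all(bbox_is_valid(e["bbox"]) for e in prompt["elements"])
payload = json.dumps(prompt, indent=2)
print(payload)
```

Validating boxes before sending the prompt catches swapped or out-of-range coordinates early, whatever the actual schema turns out to be.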

Inference requires accepting a license gate on Hugging Face and authentication via an access token. The model uses dual-branch classifier-free guidance, enabling independent refinement of conditional (positive) and unconditional (negative) branches. Safety screening is performed via Hive's text and visual moderation APIs.
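For readers unfamiliar with classifier-free guidance, the sketch below shows the textbook combination of a conditional and an unconditional prediction. It is the standard formulation, not Ideogram's implementation: the article does not describe how the dual-branch variant differs, and the function name `cfg_combine` and the list-based representation are assumptions for illustration.

```python
def cfg_combine(cond, uncond, scale):
    """Standard classifier-free guidance on two model predictions.

    guided = uncond + scale * (cond - uncond)
    scale = 1 returns the conditional prediction unchanged; larger values push
    the result further from the unconditional (negative) branch.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, 2.0]    # toy prediction with the positive prompt
uncond = [0.0, 1.0]  # toy prediction with the negative or empty prompt
print(cfg_combine(cond, uncond, 3.0))
```

In a real sampler the two predictions are tensors produced at every denoising step, and the guided result replaces the raw model output.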

What this means

Ideogram 4 is a notable open-weight release in the text-to-image space, particularly for design-focused applications. The structured JSON prompting and bounding-box controls address a key limitation of many image models: precise compositional control. At 9.3B parameters, it is considerably smaller than competitors like FLUX.2 [dev] (32B), yet Ideogram claims superior performance on design-specific benchmarks. However, the non-commercial license limits its use cases compared to fully open models. The choice of a vision-language model (Qwen3-VL) as the text encoder, rather than the more common CLIP or T5, is architecturally notable and may explain its strong performance on visual concept understanding and text rendering.

Source: huggingface.co
