Ideogram Releases First Open-Weight Image Model With 9.3B Parameters and 2K Native Resolution

TL;DR

Ideogram has released Ideogram 4, a 9.3B parameter open-weight text-to-image model trained from scratch. The model features structured JSON prompting and native 2K resolution output, and Ideogram reports that it ranks as the top open-weight model on Design Arena. It is available in fp8 and nf4 quantizations under a non-commercial license.

Ideogram has released Ideogram 4, a 9.3B parameter open-weight text-to-image model trained entirely from scratch. The model is available in two quantizations: nf4 (CUDA-only, Diffusers-compatible) and fp8 (cross-platform), both under the Ideogram 4 Non-Commercial license.
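As a rough guide to what the two quantizations mean for storage, the sketch below estimates raw weight size from the 9.3B parameter count. It is back-of-the-envelope arithmetic, not a measured figure: it ignores the block-wise scaling constants that nf4 stores, the text encoder, and activation memory. The 16-bit row is a hypothetical baseline added for comparison only, not a released format.

```python
PARAMS = 9.3e9  # parameter count reported for Ideogram 4

# Bits per weight. "16-bit" is a hypothetical baseline for comparison only.
BITS_PER_WEIGHT = {"16-bit": 16, "fp8": 8, "nf4": 4}


def weight_gigabytes(params: float, bits: int) -> float:
    """Return raw weight storage in GB (10^9 bytes), ignoring all overhead."""
    return params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: about {weight_gigabytes(PARAMS, bits):.2f} GB")
```

Under these assumptions fp8 needs about 9.3 GB for the diffusion weights and nf4 about half that, which is the main practical difference between the two downloads.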

Architecture and Technical Specifications

Ideogram 4 uses a fully single-stream Diffusion Transformer (DiT) architecture with 34 layers. Unlike designs that keep text and image in separate streams or inject text through cross-attention, it concatenates text and image tokens into one sequence and processes that sequence with the same transformer weights, so the two modalities can attend to each other at every layer.
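The single-stream idea can be illustrated with a toy example. The sketch below is not Ideogram's implementation: it is a single attention head with no learned projections, written in plain Python, and the function name `joint_self_attention` is made up for illustration. It only shows that once text and image tokens share one sequence, every token attends to every other token regardless of modality.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(text_tokens, image_tokens):
    """Toy single-head attention over one concatenated text+image sequence.

    Queries, keys, and values are the raw token vectors (no learned weights),
    which keeps the example short; a real DiT layer would project them first.
    """
    seq = text_tokens + image_tokens  # unified sequence: text first, then image
    dim = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in seq]
        weights = softmax(scores)  # one weight per token, text and image alike
        out.append([sum(w * v[d] for w, v in zip(weights, seq)) for d in range(dim)])
    return out


text = [[1.0, 0.0], [0.0, 1.0]]
image = [[1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
mixed = joint_self_attention(text, image)
print(len(mixed))  # one output vector per input token
```

Because the attention weights span the whole sequence, each image token's output already contains a contribution from the text tokens, and vice versa.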

The model uses Qwen3-VL-8B-Instruct as its text encoder instead of CLIP or T5. Hidden states are extracted from 13 intermediate layers and concatenated, providing multi-scale semantic features. The model supports resolutions from 256px to 2048px (in multiples of 16) with aspect ratios up to 6:1.
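The resolution constraints can be written as a small check. This is a sketch under one assumption the article does not spell out: that the 256px to 2048px range applies to each side independently. The function name `is_supported_resolution` is hypothetical and not part of Ideogram's code.

```python
from fractions import Fraction

MIN_SIDE, MAX_SIDE, STEP = 256, 2048, 16  # limits quoted above, assumed per side
MAX_ASPECT = Fraction(6, 1)               # longest side : shortest side


def is_supported_resolution(width: int, height: int) -> bool:
    """Return True if (width, height) satisfies the stated constraints."""
    for side in (width, height):
        if not (MIN_SIDE <= side <= MAX_SIDE) or side % STEP != 0:
            return False
    # Exact rational comparison avoids floating-point edge cases at 6:1.
    return Fraction(max(width, height), min(width, height)) <= MAX_ASPECT


for size in [(2048, 2048), (1536, 256), (2048, 256), (1000, 1000)]:
    print(size, is_supported_resolution(*size))
```

Under this reading the aspect-ratio cap is a real restriction: 2048x256 is inside the size range but works out to 8:1, so it would be rejected, while 1536x256 sits exactly at 6:1.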

Benchmark Performance

According to Ideogram, the model ranks first among open-weight models on Design Arena, a third-party image generation leaderboard focused on design tasks. On the overall Design Arena board, Ideogram 4 trails only proprietary models from OpenAI (GPT Image) and Google (Gemini).

In ContraLabs' blind typography evaluation with ten professional designers, Ideogram 4 was selected as best 47.9% of the time, ahead of Gemini 3.1 Flash Image Preview (30.0%), FLUX.2 [max] (15.5%), and Grok Imagine 1.0 (15.0%). The same designers rated it 3.55/5 for real client work usability, higher than competing models.

On standard open-source benchmarks, Ideogram claims the model leads all tested models on layout control (7Bench) and delivers better text rendering than larger open-weight alternatives including Qwen-Image (20B), FLUX.2 [dev] (32B), and HunyuanImage 3.0 (80B MoE).

Key Features

The model introduces structured JSON prompting, allowing explicit control over composition, style, lighting, color palette, typography, and spatial layout through bounding-box coordinates. It supports multilingual text rendering and can generate images at native 2K resolution without upscaling.
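To give a sense of what a structured prompt might look like, the sketch below assembles one in Python. The field names, the normalized 0 to 1 bounding-box convention, and the helper `make_text_element` are all assumptions made for illustration; they are not Ideogram's documented schema, which should be taken from the official inference code.

```python
import json


def make_text_element(text, bbox):
    """Build a text element with a bounding box (hypothetical schema).

    bbox is (x0, y0, x1, y1) in normalized coordinates, an assumed convention.
    """
    x0, y0, x1, y1 = bbox
    if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
        raise ValueError(f"invalid bounding box: {bbox}")
    return {"text": text, "bbox": [x0, y0, x1, y1]}


# Keys mirror the controls listed in the article; the names are illustrative.
prompt = {
    "composition": "centered product shot on a plain backdrop",
    "style": "flat vector illustration",
    "lighting": "soft studio light",
    "color_palette": ["#0B3D91", "#F2F2F2", "#FF6B35"],
    "typography": [make_text_element("SUMMER SALE", (0.10, 0.05, 0.90, 0.25))],
}

payload = json.dumps(prompt, indent=2)
print(payload)
```

The appeal of this style over a free-text prompt is that each attribute can be validated and changed independently, for example moving a headline by editing one bounding box.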

Inference code is available on GitHub, with model weights hosted on Hugging Face behind a license gate. Downloading the weights requires authenticating with a Hugging Face token. The inference code can optionally integrate with Ideogram's hosted "magic prompt" API for prompt expansion and with Hive for safety screening.
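A gated download typically looks like the sketch below, which uses the `huggingface_hub` library's `snapshot_download` function. The article does not give the repository ID, so it is left as a caller-supplied parameter, and both helper names are hypothetical. The license must already have been accepted on the model page for the token to work.

```python
import os
from typing import Mapping


def resolve_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Read the access token from HF_TOKEN, the variable huggingface_hub also reads."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("Set HF_TOKEN to a Hugging Face access token first.")
    return token


def download_weights(repo_id: str, token: str) -> str:
    """Download a gated repository snapshot and return its local path.

    repo_id must be supplied by the caller; it is not stated in the article.
    """
    from huggingface_hub import snapshot_download  # third-party dependency

    return snapshot_download(repo_id=repo_id, token=token)
```

Keeping the token in an environment variable rather than in code avoids committing credentials alongside the inference scripts.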

What This Means

Ideogram 4 is a notable open-weight text-to-image release, particularly for design-focused applications. At 9.3B parameters it is less than a third the size of competing open models such as FLUX.2 [dev] (32B), yet Ideogram claims better performance on design and typography benchmarks. The non-commercial license, however, limits production use. The structured JSON prompting interface and native high-resolution support address key limitations of previous open-weight image models, though the company-provided benchmark results still need independent validation by the community.
