Microsoft Releases Lens-Turbo: 3.8B-Parameter Text-to-Image Model Trained on 800M GPT-4.1-Captioned Images

TL;DR

Microsoft has released Lens-Turbo, a 3.8B-parameter foundational text-to-image model designed for efficient training and fast generation. The model was trained on Lens-800M, a corpus of 800 million image-text pairs with GPT-4.1 captions, and supports resolutions up to 1440×1440 with 4-step distilled inference.

May 25, 2026 · 5:51 AM · 2 min read

Microsoft has released Lens-Turbo, a 3.8-billion-parameter text-to-image model trained on 800 million GPT-4.1-captioned images. The model uses a 48-block MMDiT (multi-modal diffusion transformer) architecture and supports generation at resolutions up to 1440×1440 pixels.

Technical Architecture

Lens combines several technical approaches:

Training corpus: Lens-800M dataset containing 800 million image-text pairs with long-form GPT-4.1 captions
Architecture: 48-block MMDiT denoiser with 3.8B parameters
Latent encoding: FLUX.2 semantic VAE
Text encoding: Concatenated multi-layer GPT-OSS features for prompt following and multilingual support
Resolution handling: Mixed-resolution training enables aspect ratios from 1:2 to 2:1
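
To make the "concatenated multi-layer features" item concrete, here is a minimal sketch of the general idea: per-token hidden states from several encoder layers are joined along the feature axis, so each token carries information from more than one depth. The function name, the layer indices, and the toy dimensions are all hypothetical; the source does not say which GPT-OSS layers are used or how the result is projected.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # indexed as [token][hidden_dim]


def concat_layer_features(hidden_states: Sequence[Matrix],
                          layer_ids: Sequence[int]) -> Matrix:
    """Join per-token features from the chosen layers along the feature axis."""
    if not layer_ids:
        raise ValueError("at least one layer must be selected")
    selected = [hidden_states[i] for i in layer_ids]
    num_tokens = len(selected[0])
    if any(len(layer) != num_tokens for layer in selected):
        raise ValueError("all layers must cover the same number of tokens")
    # For each token, append its feature vector from every selected layer.
    return [
        [value for layer in selected for value in layer[t]]
        for t in range(num_tokens)
    ]


# Toy input: 3 layers, 2 tokens, hidden size 2 (values are arbitrary).
toy_hidden_states = [
    [[0.1, 0.2], [0.3, 0.4]],  # layer 0
    [[1.1, 1.2], [1.3, 1.4]],  # layer 1
    [[2.1, 2.2], [2.3, 2.4]],  # layer 2
]

# Hypothetical choice of layers 0 and 2.
features = concat_layer_features(toy_hidden_states, layer_ids=[0, 2])
print(len(features), len(features[0]))  # token count, concatenated width
```

With two layers of width 2 selected, each token ends up with a width-4 feature vector. In a real pipeline the concatenated features would typically pass through a learned projection before reaching the denoiser, but that detail is not described in the source.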

Inference Speed

The distilled Lens-Turbo variant supports 4-step generation, according to Microsoft. The base model went through reinforcement learning post-training for improved visual quality and artifact suppression before distillation.
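
For readers unfamiliar with what "4-step generation" means in practice, the sketch below shows a generic few-step Euler sampler. It assumes a rectified-flow-style formulation in which the network predicts a velocity and sampling integrates from noise at t=1 to an image at t=0; the source does not state that Lens uses this formulation. `toy_velocity` is a hypothetical stand-in for the denoiser, not Microsoft's model.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 noise: Vector,
                 num_steps: int = 4) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=1 (noise) to t=0 in equal steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt    # 1.0, 0.75, 0.5, 0.25 when num_steps is 4
        v = velocity(x, t)  # one network evaluation per step
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical stand-in for the denoiser: the exact velocity that carries a
# point on the straight noise-to-target path toward one fixed target.
TARGET = [0.5, -1.0, 2.0]


def toy_velocity(x: Vector, t: float) -> Vector:
    return [(xi - ti) / t for xi, ti in zip(x, TARGET)]


sample = euler_sample(toy_velocity, noise=[1.3, 0.2, -0.7], num_steps=4)
print(sample)
```

Because the toy velocity field is exactly straight, four Euler steps land on the target up to floating-point error. A real denoiser's trajectories are curved, which is why distillation is generally needed to keep quality at such low step counts.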

Resolution and Aspect Ratio Support

The model supports flexible output resolutions:

Maximum resolution: 1440×1440 pixels
Aspect ratio range: 1:2 to 2:1
Multiple resolution presets: 1248×1664, 1664×1248, and square formats

Microsoft states the mixed-resolution training approach enables inference across different aspect ratios without quality degradation.
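
The listed figures can be checked with a little arithmetic. The sketch below reduces each resolution to its aspect ratio, counts pixels, and tests the ratio against the stated 1:2 to 2:1 range. The helper name `describe_resolution` is illustrative only.

```python
from math import gcd


def describe_resolution(width: int, height: int) -> dict:
    """Summarize a resolution: reduced aspect ratio, pixel count, range check."""
    g = gcd(width, height)
    ratio = width / height
    return {
        "aspect": f"{width // g}:{height // g}",
        "pixels": width * height,
        # Stated supported range is 1:2 to 2:1, i.e. width/height in [0.5, 2.0].
        "in_stated_range": 0.5 <= ratio <= 2.0,
    }


for w, h in [(1440, 1440), (1248, 1664), (1664, 1248)]:
    print((w, h), describe_resolution(w, h))
```

The non-square presets work out to 3:4 and 4:3 and contain roughly the same number of pixels as 1440×1440 (about 2.07 million), even though their longer side of 1664 exceeds 1440. One reading is that the stated maximum describes the square format or an approximate pixel budget rather than a per-side limit; that is an inference from the numbers, not something the source confirms.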

Training Efficiency Claims

Microsoft claims Lens reaches "competitive quality with substantially less training compute than larger T2I models" through dense-caption pre-training that maximizes information density per training batch. The company has not disclosed specific benchmark scores, training compute requirements, or comparisons to specific competing models.

Model Availability

The model is available on Hugging Face under the repository microsoft/Lens-Turbo. Microsoft has released minimal inference code for generating images from Lens DiT checkpoints. Pricing information for API access has not been disclosed.
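
For those who want to fetch the files, a minimal sketch using the `huggingface_hub` package's `snapshot_download` function follows. It assumes the package is installed and that the repository is publicly accessible under the name given above; the local directory name is arbitrary, and the layout of the downloaded files is not described in the source.

```python
REPO_ID = "microsoft/Lens-Turbo"  # repository name as given in the article


def download_checkpoint(local_dir: str = "lens-turbo") -> str:
    """Download all files in the repository and return the local folder path."""
    # Imported here so the module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    # Requires network access; gated repositories also need a login token.
    print(download_checkpoint())
```

Running the released inference code against the downloaded checkpoint is a separate step that depends on Microsoft's own scripts, which are not reproduced here.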

What This Means

Lens-Turbo is Microsoft's entry into the sub-4B-parameter text-to-image category, emphasizing training efficiency through high-quality captions rather than dataset scale. The 4-step distilled inference and flexible resolution support position it for applications that need fast generation across varied aspect ratios. The reliance on GPT-4.1 for caption generation suggests Microsoft is leveraging its existing LLM infrastructure to improve training data quality. Without published benchmarks, however, its performance relative to models such as Stable Diffusion 3 or FLUX.1 remains unverified.

Source: huggingface.co
