text-to-image
5 articles tagged with text-to-image
ByteDance releases Lance, 3B-parameter unified multimodal model handling image and video generation, editing, and unders
ByteDance has released Lance, a 3-billion parameter multimodal model that performs image and video generation, editing, and understanding within a single framework. The model was trained entirely from scratch using 128 A100 GPUs and achieves 84.67% on DPG-Bench and 74% on GenEval, competing with larger models despite its compact size.
Baidu releases ERNIE-Image-Turbo, a distilled text-to-image model generating in 8 inference steps
Baidu has released ERNIE-Image-Turbo, a distilled text-to-image diffusion transformer that generates images in 8 inference steps. The model runs on consumer GPUs with 24GB VRAM and supports resolutions up to 1376×768, with claimed strengths in text rendering and structured generation tasks.
Baidu releases ERNIE-Image, an 8B parameter text-to-image model with strong text rendering capabilities
Baidu has released ERNIE-Image, an 8B parameter text-to-image generation model built on a single-stream Diffusion Transformer architecture. The model is designed for complex instruction following, text rendering, and structured image generation, and can run on consumer GPUs with 24GB VRAM.
Microsoft releases three in-house AI models for speech and images, signaling independence from OpenAI
Microsoft released public preview versions of three proprietary AI models: MAI-Transcribe-1 for speech recognition across 25 languages at 50% lower GPU cost than alternatives, MAI-Voice-1 for speech synthesis generating 60 seconds of audio in under a second, and MAI-Image-2 for text-to-image generation. The models are available exclusively through Microsoft Azure AI Foundry and already power Copilot, Bing, and PowerPoint.
Microsoft's superintelligence team releases MAI-Image-2, ranks third in text-to-image generation
Microsoft's superintelligence team, led by Mustafa Suleyman, has released MAI-Image-2, a text-to-image generator that currently ranks third on the Arena.ai leaderboard for text-to-image models, behind OpenAI's GPT-Image-1.5 and Google's Nano Banana 2. The model is now available for testing in the MAI Playground and will roll out to Copilot and Bing Image Creator, with API access opening to all developers through Microsoft Foundry.