computer-vision

8 articles tagged with computer-vision

June 24, 2026
model release

Krea Releases 12-Billion Parameter Text-to-Image Model with 8-Step Generation

Krea.ai released Krea 2 Turbo, a 12-billion parameter diffusion transformer model for text-to-image generation. The open-weight model generates images in 8 inference steps and supports resolutions up to 2048x2048 pixels.

June 22, 2026
model release

Baidu Releases Unlimited-OCR, a 3B Parameter Document Parsing Model Based on Deepseek-OCR

Baidu has released Unlimited-OCR, a 3 billion parameter model for optical character recognition and document parsing. The model supports single-page and multi-page document processing with a 32,768 token context window and runs on NVIDIA GPUs using bfloat16 precision.

June 18, 2026
research · Mistral AI

Mistral AI fine-tunes Pixtral-12B on satellite imagery, boosting classification accuracy from 56% to 91%

Mistral AI has published research showing that fine-tuning its Pixtral-12B vision language model on satellite imagery increases classification accuracy from 56% to 91% on the Aerial Image Dataset. Using Low-Rank Adaptation (LoRA) with 8,000 training samples across 30 scene categories, the company reduced hallucinations from 5% to 0.1% for under $10 in compute costs.
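As a rough illustration of why LoRA fine-tuning can be this cheap, the sketch below shows the general low-rank update in plain Python. It is not Mistral AI's training code: the function names (`lora_forward`, `trainable_params`), the row-vector convention, and the toy matrices are all assumptions made for the example. The frozen weight `w` is left untouched and only two small matrices, `down` and `up`, would be trained.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w, down, up, alpha):
    """Compute x @ W + (alpha / rank) * (x @ down) @ up.

    w is the frozen base weight (d_in x d_out); down (d_in x rank) and
    up (rank x d_out) are the small trainable adapter matrices.
    """
    rank = len(up)
    base = matmul(x, w)
    delta = matmul(matmul(x, down), up)
    scale = alpha / rank
    return [[b + scale * d for b, d in zip(b_row, d_row)]
            for b_row, d_row in zip(base, delta)]


def trainable_params(d_in, d_out, rank):
    """Number of adapter parameters for one adapted weight matrix."""
    return rank * (d_in + d_out)


# Toy example (hypothetical values): a 2x2 identity layer with a rank-1 adapter.
x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
down = [[1.0], [1.0]]
up_zero = [[0.0, 0.0]]   # a zero-initialised "up" matrix leaves the base model unchanged
up = [[0.5, -0.5]]

unchanged = lora_forward(x, w, down, up_zero, alpha=2.0)
adapted = lora_forward(x, w, down, up, alpha=2.0)
```

For a single 4096 x 4096 weight matrix with rank 8 (illustrative sizes, not taken from the article), `trainable_params(4096, 4096, 8)` gives 65,536 adapter parameters against roughly 16.8 million in the full matrix, which is the kind of reduction that keeps compute costs low.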

June 8, 2026
product update · Apple

Apple adds cloud-powered AI image editing to iOS 27 Photos app with Clean Up upgrade, Extend and Reframe tools

Apple announced three AI-powered photo editing tools for iOS 27: an upgraded Clean Up feature for object removal, Extend for generating content around image borders, and Reframe for changing photo angles. The features use a combination of on-device and cloud AI models, marking a shift from Apple's previous on-device-only approach.

June 3, 2026
model release

Ideogram Releases First Open-Weight Image Model With 9.3B Parameters and 2K Native Resolution

Ideogram has released Ideogram 4, a 9.3B parameter open-weight text-to-image model trained from scratch. The model features structured JSON prompting, native 2K resolution output, and ranks as the top open-weight model on Design Arena. It is available in fp8 and nf4 quantizations under a non-commercial license.

May 27, 2026
product update

Google adds visual automation triggers to Gemini for Home, lets cameras initiate smart home routines

Google is rolling out camera-based automations for Gemini for Home, allowing smart home cameras to trigger routines based on visual events like package deliveries or glass breaking. The update includes improved multi-request handling and reduced error rates for the AI assistant that replaced Google Assistant on smart home devices.

March 25, 2026
product update · Amazon Web Services

Amazon Bedrock adds three video analysis workflows for multimodal understanding at scale

Amazon Bedrock has introduced three video analysis workflows that use multimodal foundation models to extract insights from video content at scale. The three approaches (frame-based, shot-based, and multimodal embedding) target different use cases and cost-performance trade-offs, with open-source reference implementations available on GitHub.

February 20, 2026
model release

Segmind releases SegMoE, a mixture-of-experts diffusion model for faster image generation

Segmind has released SegMoE, a mixture-of-experts (MoE) diffusion model designed to accelerate image generation while reducing computational overhead. The model applies MoE techniques traditionally used in large language models to the diffusion model architecture, enabling selective expert activation during inference.
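Selective expert activation is commonly implemented as top-k gating, and a minimal sketch of that general idea follows. The summary does not describe SegMoE's actual router, so the gating rule, the function names (`top_k_route`, `moe_forward`), and the toy scalar "experts" here are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    chosen = order[:k]
    weights = softmax([gate_scores[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, gate_scores, k):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(gate_scores, k))


# Toy experts (hypothetical): each one is just a scalar function.
experts = [
    lambda v: v + 1.0,
    lambda v: 2.0 * v,
    lambda v: -v,
    lambda v: v * v,
]
gate_scores = [0.1, 2.0, -1.0, 2.0]

routes = top_k_route(gate_scores, k=2)
output = moe_forward(3.0, experts, gate_scores, k=2)
```

With `k=2`, only two of the four experts run for this input, so per-step compute scales with the number of active experts rather than the total number of experts in the model.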