vision-language

3 articles tagged with vision-language

March 25, 2026
model release

AI2 releases MolmoWeb, an open 8B-parameter web agent that nearly matches proprietary systems

The Allen Institute for AI has released MolmoWeb, a fully open web agent that operates websites using only screenshots, without access to source code. The 8B-parameter model achieves 78.2% success on WebVoyager, nearly matching OpenAI's o3 at 79.3%, and was trained on one of the largest public web task datasets released to date.

model release

Reka releases Reka Edge, a 7B multimodal model for efficient image and video understanding

Reka has released Reka Edge, a 7-billion-parameter multimodal model designed for efficient image and video understanding. The model features a 16,384-token context window and is priced at $0.20 per million tokens for both input and output.

March 2, 2026
model release

Alibaba releases Qwen3.5-4B, a 4B multimodal model for vision and text tasks

Alibaba's Qwen team has released Qwen3.5-4B, a 4-billion-parameter multimodal model capable of processing both images and text. The model is published on Hugging Face under an Apache 2.0 license, which permits both commercial and research use.