LLM News

Every LLM release, update, and milestone.

research

AlignVAR improves image super-resolution with visual autoregression, 10x faster than diffusion models

Researchers propose AlignVAR, a visual autoregressive framework for image super-resolution that addresses critical consistency problems in existing VAR models. The approach combines spatial consistency autoregression and hierarchical consistency constraints to achieve 10x faster inference with 50% fewer parameters than leading diffusion-based methods.

research

Pointer-CAD unifies B-Rep and command sequences for LLM-based CAD generation

Researchers present Pointer-CAD, an LLM-based framework that addresses fundamental limitations in command sequence-based CAD generation by enabling explicit geometric entity selection through pointer mechanisms. The approach reduces quantization errors and supports complex operations like chamfering and filleting that prior methods cannot handle.

2 min read · via arxiv.org
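The pointer mechanism described above can be sketched in a few lines: rather than regenerating (and thereby quantizing) coordinates, the decoder scores existing geometric entities and "points" at one via attention. This is a minimal illustration; the function and variable names are assumptions, not Pointer-CAD's actual API.

```python
import numpy as np

def pointer_select(query, entity_embs):
    """Score candidate geometric entities (e.g. B-Rep edges) against a
    decoder query and select one by attention, avoiding coordinate
    quantization. Illustrative sketch only."""
    scores = entity_embs @ query              # one score per candidate entity
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                      # softmax over candidates
    return int(np.argmax(probs)), probs

# Two candidate edge embeddings; the query is closest to the first one.
entities = np.array([[1.0, 0.0], [0.0, 1.0]])
query = np.array([1.0, 0.2])
idx, probs = pointer_select(query, entities)
print(idx)  # 0
```

Because selection is over entities that already exist in the model, operations like chamfering can reference an exact edge instead of approximating its position.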
research

Researchers extend Vision Mamba sequence length 4x with separator-based pretraining

Researchers have introduced STAR (Separators for AutoRegressive pretraining), a method that extends Vision Mamba's input sequence length by 4x through strategic separator insertion between images. The STAR-B model achieved 83.5% accuracy on ImageNet-1k, demonstrating improved long-range dependency modeling in vision tasks.
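The separator-insertion idea can be sketched as packing several images' token sequences into one long sequence with a boundary token between them, so the autoregressive model can tell where one image ends and the next begins. The token values and separator id below are illustrative assumptions, not STAR's actual vocabulary.

```python
def pack_with_separators(image_token_seqs, sep_token):
    """Concatenate per-image token sequences into one long training
    sequence, inserting a separator token between images.
    Illustrative sketch of separator-based packing."""
    packed = []
    for i, seq in enumerate(image_token_seqs):
        if i > 0:
            packed.append(sep_token)  # mark the image boundary
        packed.extend(seq)
    return packed

SEP = -1  # hypothetical separator token id
print(pack_with_separators([[1, 2], [3, 4], [5]], SEP))
# [1, 2, -1, 3, 4, -1, 5]
```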

research

VideoTemp-o3 combines temporal grounding with video QA in single agentic framework

Researchers have introduced VideoTemp-o3, a unified framework that addresses limitations in long-video understanding by combining temporal grounding and question-answering in a single agentic system. The approach uses a unified masking mechanism during training, plus reinforcement learning with dedicated reward signals, to improve video segment localization and reduce hallucinations.

model release

Segmind releases SegMoE, a mixture-of-experts diffusion model for faster image generation

Segmind has released SegMoE, a mixture-of-experts (MoE) diffusion model designed to accelerate image generation while reducing computational overhead. The model applies MoE techniques traditionally used in large language models to the diffusion model architecture, enabling selective expert activation during inference.
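Selective expert activation is the core of the speedup: a gating network scores all experts per input, but only the top-k actually run. A minimal sketch of that routing pattern follows; the shapes, names, and top-k value are illustrative assumptions, not SegMoE's implementation.

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Mixture-of-experts routing sketch: gate scores every expert,
    but only the top-k selected experts are evaluated."""
    logits = x @ gate_w                    # gate score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over selected experts
    # Only the selected experts run -- the source of the compute savings.
    return sum(w * experts[i](x) for i, w in zip(top, weights))

rng = np.random.default_rng(0)
d = 8
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(4)]
gate_w = rng.normal(size=(d, 4))
x = rng.normal(size=d)
y = moe_forward(x, experts, gate_w, top_k=2)
print(y.shape)  # (8,)
```

With top_k=2 of 4 experts, only half the expert compute runs per input, which is the same trade-off MoE LLMs exploit.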