LLM News

Every LLM release, update, and milestone.


RoboMME benchmark reveals memory architecture trade-offs in robotic vision-language-action models

Researchers introduce RoboMME, a large-scale standardized benchmark for evaluating memory in robotic vision-language-action (VLA) models across 16 manipulation tasks. The study tests 14 memory-augmented VLA variants and finds that no single memory architecture excels across all task types: each design offers distinct trade-offs depending on a task's temporal, spatial, object, and procedural memory demands.