vision-language-action
2 articles tagged with vision-language-action
NVIDIA Releases GR00T N1.7, 3B-Parameter Open-Source Humanoid Robot Model Trained on 20,854 Hours of Human Video
NVIDIA released GR00T N1.7, a 3-billion parameter open-source Vision-Language-Action model for humanoid robots with commercial licensing. The model was trained on 20,854 hours of human egocentric video data and demonstrates the first documented scaling law for robot dexterity, where increasing human video data from 1,000 to 20,000 hours more than doubles task completion rates.
Tencent releases HY-Embodied-0.5, a 2B-parameter vision-language model for robot control
Tencent has released HY-Embodied-0.5, a family of foundation models designed specifically for embodied AI and robotic control. The suite includes a 2B-parameter MoT (Mixture-of-Transformers) variant with only 2.2B activated parameters during inference, and a 32B model that claims frontier-level performance comparable to Gemini 3.0 Pro, trained on over 200 billion tokens of embodied-specific data.