Gemma 4 E2B VLA

Name: Gemma 4 E2B VLA
Author: Google DeepMind

Google DeepMind🇺🇸 United States

active

Compare with other models →

Context window2K tokens

Version History

E2BsnapshotApril 22, 2026

Gemma 4 E2B demonstrated running as a vision-language agent on NVIDIA Jetson Orin Nano Super (8GB), autonomously deciding when to access webcam based on conversational context with no hardcoded triggers.

Coverage

model release

Gemma 4 VLA runs locally on NVIDIA Jetson Orin Nano Super with 8GB RAM, autonomous webcam tool-calling

NVIDIA engineer Asier Arranz demonstrated Gemma 4 running as a vision-language agent (VLA) on a Jetson Orin Nano Super with 8GB RAM. The model autonomously decides when to access a webcam based on user queries, with no hardcoded triggers—performing speech-to-text, vision analysis, and text-to-speech entirely locally.

April 22, 2026 · 3:51 PM3 min read

Gemma NVIDIA Jetson