Google releases Gemma 4, open-source on-device AI with agentic tool use for phones
Google released Gemma 4, an open-source multimodal model that runs entirely on smartphones without sending data to the cloud. The E2B and E4B variants require just 6GB and 8GB of RAM respectively and can autonomously use tools like Wikipedia, maps, and QR code generators through built-in agent skills. The model is available free via the Google AI Edge Gallery app for Android and iOS.
Google released Gemma 4, an open-source multimodal AI model that processes text, images, and audio entirely on-device with autonomous tool-use capabilities. The model family includes four variants optimized for different hardware, with the smallest versions running on smartphones with as little as 6GB RAM.
Model specifications and variants
Gemma 4 ships in four sizes: E2B and E4B for smartphones, plus 26B and 31B models for servers. The "E" designates "effective parameters," referring to parameters active during inference rather than total parameter count.
The E2B variant takes up approximately 1.3GB of storage when quantized and runs on devices with 6GB of RAM, while E4B needs roughly 2.5GB of storage and 8GB of RAM. Google optimized both versions with Arm and Qualcomm for current mobile processors. According to Google, Gemma 4 on Android runs up to 4x faster than the previous generation while reducing battery consumption by up to 60 percent. Arm's benchmarks report even larger gains: an average 5.5x speedup on devices with newer Arm chips that support the SME2 instruction set.
The 26B variant uses a mixture-of-experts architecture with 128 experts, keeping only 3.8 billion parameters active at inference time. The dense 31B model offers a 256,000-token context window.
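To make the mixture-of-experts idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration rather than Gemma 4's actual router: the expert count of 128 comes from the article, but the value of `TOP_K`, the random gate scores, and the function names are assumptions made for this example.

```python
import math
import random

NUM_EXPERTS = 128  # expert count stated in the article
TOP_K = 8          # assumption for illustration; the article does not give k


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(gate_scores, k=TOP_K):
    """Pick the k highest-scoring experts and normalize their weights."""
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in top])
    return list(zip(top, weights))


# Hypothetical gate scores for one token; a real router computes these
# from the token's hidden state.
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route(scores)

# Only the chosen experts run for this token. Using the article's figures,
# 3.8B active out of a nominal 26B is roughly 15% of the model.
active_fraction = 3.8 / 26
print(f"{len(chosen)} of {NUM_EXPERTS} experts selected")
print(f"active parameter share: {active_fraction:.1%}")
```

The point of the design is that compute per token scales with the active parameters, while the full set of experts still has to be held in memory.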
All models process text, images, and audio across more than 140 languages. The entire Gemma family has achieved over 400 million downloads since initial release, according to Google.
On-device agentic capabilities
Gemma 4's defining feature is autonomous tool use driven by a model that runs on the device rather than in the cloud. The Google AI Edge Gallery app includes "agent skills," built-in tools the model can invoke on its own: Wikipedia search, interactive maps, auto-generated summaries, flashcard generation, and QR code generation. The model can describe photos, convert spoken input into diagrams and visualizations, and coordinate with other local models for text-to-speech or image generation.
The model automatically infers user intent and activates the appropriate skill. Invoking a tool requires an internet connection, but the model itself runs entirely locally, and conversation history is never saved.
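The flow described above (infer intent, pick a skill, fall back when offline) can be sketched as a small dispatcher. This is a hypothetical illustration, not the AI Edge Gallery's real skill API: the skill names, the keyword-based `infer_skill` stand-in for the model, and the placeholder tool functions are all assumptions.

```python
from typing import Callable, Dict


def make_qr(text: str) -> str:
    # Placeholder: a real skill would render an actual QR code.
    return f"[QR code for: {text}]"


def wiki_search(query: str) -> str:
    # Placeholder: no network call is made in this sketch.
    return f"[Wikipedia results for: {query}]"


# Hypothetical registry mapping skill names to callables.
SKILLS: Dict[str, Callable[[str], str]] = {
    "qr_code": make_qr,
    "wikipedia": wiki_search,
}


def infer_skill(user_text: str) -> str:
    """Stand-in for the model's intent inference (simple keyword rules)."""
    lowered = user_text.lower()
    if "qr" in lowered:
        return "qr_code"
    if lowered.startswith(("who ", "what ")):
        return "wikipedia"
    return "none"


def handle(user_text: str, online: bool) -> str:
    """Route a request to a skill, or answer locally if none applies."""
    name = infer_skill(user_text)
    skill = SKILLS.get(name)
    if skill is None:
        return "answered locally without a tool"
    if not online:
        # Per the article, tool invocation needs connectivity.
        return f"skill '{name}' unavailable offline"
    return skill(user_text)


print(handle("Make a QR code for example.com", online=True))
print(handle("what is an MoE?", online=False))
print(handle("hello", online=False))
```

In a real app, the keyword rules would be replaced by the model's own decision about which skill to call, with the registry and the offline check staying much the same.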
Developers can create custom skills via GitHub and share them with the community. The app requires Android 12 or iOS 17 and has already reached fourth place among the most-downloaded free productivity apps in the iOS App Store, behind Claude, Gemini, and ChatGPT.
Google's demos show improvements in optical character recognition and time-aware reasoning, capabilities important for calendar, reminder, and alarm functionality.
Licensing and platform strategy
Gemma 4 is released under the commercially friendly Apache 2.0 license. Google built the models on the research underlying its proprietary Gemini 3 system but made them freely available as open source.
The E2B and E4B variants serve as the foundation for Gemini Nano 4, the next generation of Android's system-wide on-device model. Code written for Gemma 4 will work with Gemini Nano 4 when it ships on flagship devices later in 2026. Gemini Nano currently runs on over 140 million Android devices, powering features like Smart Replies and audio summaries.
In December 2025, Google previewed a related approach with FunctionGemma, a 270-million-parameter model that translates natural language into structured function calls for phone tasks such as toggling the flashlight, creating contacts, managing calendars, and opening settings.
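A structured function call is useful mainly because the app can check it before acting on it. The sketch below shows that validation step under stated assumptions: the JSON shape, the function names `set_flashlight` and `create_contact`, and their argument types are hypothetical and do not reflect FunctionGemma's actual output format.

```python
import json

# Hypothetical function schemas: name -> {argument name: expected type}.
FUNCTIONS = {
    "set_flashlight": {"on": bool},
    "create_contact": {"name": str, "phone": str},
}


def parse_call(raw: str):
    """Parse and validate a model-emitted call; raise on anything unexpected."""
    call = json.loads(raw)
    name = call["name"]
    args = call.get("arguments", {})
    if name not in FUNCTIONS:
        raise ValueError(f"unknown function: {name}")
    expected = FUNCTIONS[name]
    if set(args) != set(expected):
        raise ValueError(f"bad arguments for {name}: {sorted(args)}")
    for key, expected_type in expected.items():
        if not isinstance(args[key], expected_type):
            raise TypeError(f"{name}.{key} must be {expected_type.__name__}")
    return name, args


# Assumed model output for the request "turn on the flashlight".
model_output = '{"name": "set_flashlight", "arguments": {"on": true}}'
name, args = parse_call(model_output)
print(name, args)
```

Rejecting unknown functions and mistyped arguments before execution keeps a small model's occasional malformed output from triggering unintended actions on the phone.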
What this means
Gemma 4 marks a significant shift in on-device AI strategy. By combining permissive Apache 2.0 licensing with agentic capabilities that run on the phone, Google lets developers build privacy-preserving applications in which inference does not depend on the cloud. The reported speedups of up to 4x and battery savings of up to 60 percent make the technology practical for mainstream phones. The integration path into Gemini Nano signals Google's commitment to making on-device AI standard across Android. For users, this means AI assistance whose model runs locally instead of sending conversations to a server for processing, which is both a direct response to privacy concerns and a competitive move against cloud-dependent systems.