google-deepmind
50 articles tagged with google-deepmind
Apple integrates Google Gemini into Siri, limits availability to select regions
Apple announced Siri AI integration with Google Gemini at its WWDC 2026 event at Apple Park. The update represents Apple's latest AI push, though regional restrictions reportedly limit availability for many users globally.
Google DeepMind releases Gemma 4 12B: encoder-free multimodal model runs on 16GB RAM
Google DeepMind has released Gemma 4 12B, a 12-billion parameter multimodal model that runs locally on laptops with 16GB of RAM. The model eliminates separate vision and audio encoders, processing raw inputs directly through its language model backbone, and is released under an Apache 2.0 license.
Google DeepMind Releases Gemma 4: Encoder-Free Multimodal Models from 2.3B to 30.7B Parameters
Google DeepMind released Gemma 4, a family of open-weight multimodal models ranging from 2.3B to 30.7B parameters. The flagship 12B Unified model eliminates separate encoders, processing text, images, audio, and video directly through a single decoder-only transformer with a context window of up to 256K tokens.
Google DeepMind releases Gemma 4 12B Unified: encoder-free multimodal model with 256K context window
Google DeepMind has released Gemma 4 12B Unified, an encoder-free multimodal model that processes text, images, and audio through a single decoder-only transformer. The model features 11.95 billion parameters, a 256K token context window, and achieves 77.2% on MMLU Pro and 72.0% on LiveCodeBench v6.
Google's Gemini Spark AI agent uses personal data to plan trips, raising privacy concerns
Google's Gemini Spark, an AI agent rolling out to the company's $99/month AI Ultra plan, demonstrates advanced capabilities by mining users' Gmail, calendar, photos, and location data to create detailed trip itineraries. The agent can perform actions across apps and operate computers, though third-party services like Airbnb currently block its booking attempts.
Google caps single-prompt quota for Gemini 3.1 Pro, makes Flash-Lite free after usage limit complaints
Google has modified Gemini's compute-based usage limits introduced at I/O 2026 after users reported depleting quotas too quickly. The company is now capping how much quota a single Gemini 3.1 Pro prompt can consume and making all 3.1 Flash-Lite prompts free.
Google launches Gemini Omni, multimodal AI video generator with avatar cloning and physics modeling
Google has released Gemini Omni, a multimodal AI video generation tool that accepts text, images, audio, and video as inputs. The first tier, Gemini Omni Flash, includes avatar cloning that creates digital versions of users and incorporates physics modeling for realistic motion.
Google triples Gemini usage limits in Antigravity coding tool twice in one week after user complaints
Google has raised Gemini usage limits in its Antigravity coding tool by 3x twice within one week, responding to developers who hit new compute-based quotas within hours. The company also reset weekly quotas for all paid users twice, though limits remain lower than pre-restriction levels.
Google announces Spark AI agent, Information agents, and Android Halo at I/O 2026—all paywalled behind $100/month Ultra
Google announced multiple AI agent products at I/O 2026, including Spark for managing digital tasks, Information agents for 24/7 topic monitoring, and Android Halo for notifications. All features remain paywalled behind the $100/month Gemini Ultra plan, and Google has given no timeline for free access.
Google cuts AI Ultra plan to $200/month, launches new $100 developer tier
Google announced pricing changes to its Gemini AI subscription tiers at I/O 2026, cutting its top AI Ultra plan from $250 to $200 per month while introducing a new $100/month developer-focused tier. All plans now get access to Gemini 3.5 Flash and the new Gemini Omni video generation model.
Google releases Gemini Omni Flash video generation model with conversational editing, withholds speech synthesis
Google DeepMind released Gemini Omni Flash, the first model in its new Omni family that generates and edits video from image, audio, video, and text inputs. The model is rolling out to Gemini app subscribers and YouTube Shorts with a 10-second clip limit, while speech-editing capabilities remain withheld pending safety testing.
llm-gemini Plugin Adds Support for Google's Gemini 3.5 Flash Model
Developer Simon Willison released version 0.32 of the llm-gemini plugin, which adds support for Google's Gemini 3.5 Flash model. The plugin enables command-line access to Google's Gemini model family through the LLM tool.
Google DeepMind connects Genie world model to 280 billion Street View images, Waymo already using it for self-driving training
Google DeepMind has integrated its Genie world model with Street View's 280 billion images spanning 110 countries, enabling users to explore AI-generated simulations of real locations. Waymo is already using Genie 3 to train self-driving cars on rare scenarios like tornadoes and unexpected obstacles.
Google releases Gemini 3.5 Flash with 4x faster output and agentic capabilities, 3.5 Pro coming June
Google released Gemini 3.5 Flash today with 4x faster output token generation than competing frontier models while surpassing Gemini 3.1 Pro on coding, agentic, and multimodal benchmarks. The company announced Gemini 3.5 Pro will launch next month and introduced Gemini Omni, a new multimodal series that outputs video.
Google switches Gemini to compute-based limits, cuts AI Ultra to $100/month
Google is replacing Gemini's daily prompt limits with a compute-based system that factors in prompt complexity, features used, and chat length. Limits refresh every five hours until reaching a weekly cap. AI Ultra, aimed at developers and technical leads, now starts at $100/month—down from its previous entry point—with 5x higher usage limits than the Pro plan.
Google DeepMind Integrates Street View With Genie 3 World Model for Real-World Environment Simulation
Google DeepMind launched Street View integration with its Genie 3 world model at I/O 2026, allowing users to simulate real-world locations from 280 billion images across 110 countries. The feature enables environment modification including weather changes and supports robotics training, with initial access for U.S. Ultra subscribers expanding globally.
Google Search adds AI agents, generative UI, and conversational search box powered by Gemini 3.5 Flash
Google announced major Search updates at I/O 2026, including AI Mode now powered by Gemini 3.5 Flash serving over 1 billion monthly users. The company is launching background information agents that monitor the web 24/7 and generate custom mini-apps, both features reserved for Google AI Pro and Ultra subscribers.
Google releases Gemini 3.5 Flash with autonomous coding and agent capabilities, claims 4x speed boost
Google released Gemini 3.5 Flash, positioning it as an agent-first model designed for autonomous coding and multi-hour workflows. The company claims the model outperforms its 3.1 Pro predecessor on coding and agentic benchmarks while running 4x faster than competing frontier models, with an optimized version achieving 12x speed gains.
Google Search deploys Gemini 3.5 Flash for AI-generated interfaces, agent-based information gathering
Google announced a fundamental restructuring of Search, replacing traditional ranked links with AI-generated interfaces powered by Gemini 3.5 Flash. The update introduces information agents that monitor the web 24/7 and custom mini-apps built through natural language, with the generative UI rolling out free to all users this summer.
Google releases Gemini 3.5 Flash at half the price of frontier models, announces Omni world model
Google released Gemini 3.5 Flash, priced at half to one-third the cost of comparable frontier models, and announced it will become the default model in the Gemini app globally. The company also unveiled Omni, a world model for simulating physical environments, and Gemini Spark, an AI agent in beta testing.
Google DeepMind launches 'Magic Pointer' AI feature for context-aware interactions across web pages
Google DeepMind has detailed Magic Pointer, an AI feature that interprets visual and semantic context around cursor position to enable natural language interactions. The capability is rolling out to Gemini in Chrome and includes two public demos in AI Studio for image editing and map search.
Google removes alcohol content filter blocking cocktail recipes on Gemini for Google Home
Google has updated Gemini for Home to remove content filters that previously blocked adult users from accessing cocktail recipes. The update also adds faster alarm setting, personalized security camera searches, and thumbs-up/down feedback buttons on smart displays.
Google DeepMind Releases Gemma 4 26B A4B Assistant Model for 2x Faster Inference via Multi-Token Prediction
Google DeepMind has released a Multi-Token Prediction assistant model for Gemma 4 26B A4B that achieves up to 2x decoding speedup through speculative decoding. The model uses 3.8B active parameters from a 25.2B total parameter MoE architecture with 128 experts and a 256K token context window.
Google tests Remy AI agent internally, designed to act autonomously across Gemini services
Google is testing Remy, an AI personal agent for Gemini that can take actions on users' behalf across Google services, according to Business Insider. The tool is currently in employee-only testing with no confirmed public release date.
Google DeepMind releases Gemma 4 with 31B dense model, 256K context window, and speculative decoding drafters
Google DeepMind has released Gemma 4, a family of open-weight multimodal models including a 31B dense model with 256K context window and four size variants ranging from 2.3B to 30.7B effective parameters. The release includes Multi-Token Prediction (MTP) draft models that achieve up to 2x decoding speedup through speculative decoding while maintaining identical output quality.
Google Opens Gemini Notebooks to Free Users with 50-Source Limit
Google has expanded its Notebooks feature in the Gemini app to free users, allowing them to organize chats and files with up to 50 sources per notebook. The feature, which integrates with NotebookLM, was previously available only to Google AI subscribers.
Google adds Nano Banana image generation to Gemini Personal Intelligence, using Gmail and Photos data
Google has integrated its Nano Banana image generation system with Gemini's Personal Intelligence feature, enabling the AI to create images informed by user data from Gmail, Photos, Calendar, Drive, and other Google apps. The feature rolls out to Plus, Pro, and Ultra subscribers in the US first, with Europe excluded from the initial launch.
Google's Gemini now generates personalized images using your Google Photos library
Google's Gemini can now generate personalized images by pulling data from users' Google Photos libraries through its Personal Intelligence feature. The integration uses Google Photos labels to identify people and objects, then generates images via the Nano Banana 2 model that reflect users' tastes and lifestyle.
Google DeepMind releases Gemini 3.1 Flash TTS with audio tags for precise speech control across 70+ languages
Google DeepMind launched Gemini 3.1 Flash TTS, a text-to-speech model that achieved an Elo score of 1,211 on the Artificial Analysis TTS leaderboard. The model introduces audio tags that allow developers to control vocal style, pace, and delivery through natural language commands embedded in text input, with support for 70+ languages.
Google Home April 2026 update reduces Gemini interruptions, improves speech recognition in noisy environments
Google Home's April 2026 update addresses Gemini voice assistant reliability issues. The update improves speech detection to reduce mid-sentence interruptions, speeds up responses to simple queries, and enhances music playlist recognition even when names are misspoken or in noisy environments.
Google DeepMind releases Gemma 4 with four model sizes, up to 256K context, multimodal support
Google DeepMind released Gemma 4, an open-weights multimodal model family in four sizes (2.3B to 31B parameters) with context windows up to 256K tokens. All models support text and image input, with audio native to E2B and E4B variants. The Gemma 4 31B dense model scores 85.2% on MMLU Pro, 89.2% on AIME 2026, and 80.0% on LiveCodeBench—significant improvements over Gemma 3.
Google releases Gemma 4 26B with 256K context and multimodal support, free to use
Google DeepMind has released Gemma 4 26B A4B, a free instruction-tuned Mixture-of-Experts model with a 262,144-token context window and multimodal capabilities including text, image, and video input. Although the model has 25.2B total parameters, only 3.8B activate per token, delivering performance comparable to larger 31B models at reduced compute cost.
Google releases Gemma 4 31B free model with 256K context and multimodal support
Google DeepMind has released Gemma 4 31B Instruct, a free 30.7-billion parameter model with a 256K token context window, multimodal text and image input capabilities, and native function calling. The model supports a configurable reasoning mode and 140+ languages, performs strongly on coding and document understanding tasks, and is released under the Apache 2.0 license.
Google DeepMind releases Gemma 4 family: multimodal models from 2.3B to 31B parameters with 256K context
Google DeepMind released the Gemma 4 family of open-weights multimodal models in four sizes: E2B (2.3B effective parameters), E4B (4.5B effective), 26B A4B (3.8B active parameters), and 31B dense. All models support text and image input with 128K-256K context windows; E2B and E4B add native audio capabilities. Models feature reasoning modes, function calling, and multilingual support across 140+ languages.
NVIDIA releases Gemma 4 31B quantized model with 256K context, multimodal capabilities
NVIDIA has released a quantized version of Google DeepMind's Gemma 4 31B IT model, compressed to NVFP4 format for efficient inference on consumer GPUs. The 30.7B-parameter multimodal model supports 256K token context windows, handles text and image inputs with video frame processing, and maintains near-baseline performance across reasoning and coding benchmarks.
Google DeepMind releases Gemma 4 with multimodal reasoning and up to 256K context window
Google DeepMind released Gemma 4, a multimodal model family supporting text, images, video, and audio with context windows up to 256K tokens. The release includes four sizes (E2B, E4B, 26B A4B, and 31B) designed for deployment from mobile devices to servers. The 31B dense model achieves 85.2% on MMLU Pro and 89.2% on AIME 2026.
Gemma 4 success hinges on tooling and fine-tuning ease, not benchmark scores
Google's Gemma 4 release marks a shift in open model strategy with Apache 2.0 licensing and competitive benchmarks, but real success depends on factors rarely measured: tooling stability, fine-tuning ease, and ecosystem adoption. The open model landscape is now crowded with alternatives like Qwen 3.5, Nemotron 3, and others—a maturation that changes what separates winners from the field.
Google DeepMind releases Gemma 4 with four models up to 31B parameters, 256K context window
Google DeepMind released Gemma 4, an open-weights multimodal model family in four sizes (E2B, E4B, 26B A4B, 31B) with context windows up to 256K tokens and native reasoning capabilities. The 26B A4B variant uses a Mixture-of-Experts architecture with 3.8B active parameters for efficient inference. All models support text and image input, handle 140+ languages, and are released under the Apache 2.0 license.
Google DeepMind releases Gemma 4, open multimodal models with 256K context and reasoning
Google DeepMind has released Gemma 4, a family of open-weights multimodal models ranging from 2.3B to 31B parameters with support for text, images, video, and audio. The models feature context windows up to 256K tokens, built-in reasoning modes, and native function calling for agentic workflows.
Google DeepMind releases Gemma 4 open models with up to 256K context and multimodal reasoning
Google DeepMind has released Gemma 4, an open-weights model family in four sizes (2.3B to 31B parameters) with multimodal capabilities handling text, images, video, and audio. The 26B A4B variant uses mixture-of-experts to achieve 4B active parameters while supporting 256K token context windows and native reasoning modes.
Google DeepMind releases Gemma 4 family with 256K context window and multimodal capabilities
Google DeepMind released the Gemma 4 family of open-weights models in four sizes (2.3B to 31B parameters) with multimodal support for text, images, video, and audio. The flagship 31B model achieves 85.2% on MMLU Pro and 89.2% on AIME 2024, with context windows up to 256K tokens. All models feature configurable reasoning modes, are optimized for deployment from mobile devices to servers, and are released under the Apache 2.0 license.
Google launches Gemma 4 open-weights models with Apache 2.0 license to compete with Chinese LLMs
Google released Gemma 4, a new line of open-weights models available in sizes from 2 billion to 31 billion parameters, under a permissive Apache 2.0 license. The release includes multimodal capabilities, support for 140+ languages, native function calling, and a 256,000-token context window for the larger variants.
Google DeepMind releases Gemma 4 with 4 model sizes, 256K context, and multimodal reasoning
Google DeepMind released Gemma 4, a family of open-weights multimodal models in four sizes: E2B (2.3B effective), E4B (4.5B effective), 26B A4B (3.8B active), and 31B (30.7B parameters). All models support text and image input with 128K-256K context windows, while E2B and E4B add native audio capabilities and reasoning modes across 140+ languages.
Google DeepMind releases Gemma 4 open models with multimodal capabilities and 256K context window
Google DeepMind released the Gemma 4 family of open-source models with multimodal capabilities (text, image, audio, video) and context windows up to 256K tokens. Four distinct model sizes—E2B (2.3B effective parameters), E4B (4.5B effective), 26B A4B (3.8B active), and 31B—are available under the Apache 2.0 license, with instruction-tuned and pre-trained variants.
Google DeepMind releases Gemma 4: multimodal models up to 31B parameters with 256K context
Google DeepMind released the Gemma 4 family of open-weights multimodal models in four sizes: E2B (2.3B effective), E4B (4.5B effective), 26B A4B (25.2B total, 3.8B active), and 31B dense. All models support text and image input with 128K-256K context windows, reasoning modes, and native function calling for agentic workflows.
Google releases Gemma 4 31B with 256K context and configurable reasoning mode
Google DeepMind has released Gemma 4 31B, a 30.7-billion-parameter multimodal model supporting text and image input. The model features a 262,144-token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages under Apache 2.0 license.
Google releases Gemma 4 family with 31B model, 256K context, multimodal capabilities
Google DeepMind released the Gemma 4 family of open-weights models ranging from 2.3B to 31B parameters, featuring up to 256K token context windows and native support for text, image, video, and audio inputs. The flagship 31B model scores 85.2% on MMLU Pro and 89.2% on AIME 2026, with a smaller 26B MoE variant requiring only 3.8B active parameters for faster inference.
NVIDIA Optimizes Google Gemma 4 for Local Agentic AI on RTX and Spark
NVIDIA has optimized Google's Gemma 4 models for local deployment on RTX and Spark platforms, targeting the emerging wave of on-device agentic AI. The optimization enables small, efficient models to access real-time local context for autonomous decision-making without cloud dependency.
Google DeepMind releases Gemma 4: open models ranking #3 and #6 on Arena AI leaderboard
Google DeepMind released Gemma 4, a family of four open models ranging from 2B to 31B parameters, all licensed under Apache 2.0. The 31B dense model ranks #3 on Arena AI's text leaderboard and the 26B mixture-of-experts variant ranks #6, outperforming closed models significantly larger in size.
Google DeepMind identifies six attack categories that can hijack autonomous AI agents
A Google DeepMind paper introduces the first systematic framework for 'AI agent traps': attacks that exploit the vulnerabilities autonomous agents acquire through external tools and internet access. The researchers identify six attack categories targeting perception, reasoning, memory, actions, multi-agent networks, and human supervisors, with proof-of-concept demonstrations for each.