Apple releases AFM 3 lineup: 20B-parameter on-device model and cloud AI running on Google's Nvidia infrastructure

TL;DR

Apple announced five third-generation foundation models at WWDC26, headlined by AFM 3 Core Advanced, a 20-billion-parameter sparse model that runs on-device by activating only 1-4 billion parameters per request. Apple also extended Private Cloud Compute to third-party infrastructure for the first time, with AFM 3 Cloud Pro running on Nvidia GPUs in Google Cloud.

June 12, 2026 · 2:36 AM · 3 min read

Apple announced five third-generation Apple Foundation Models (AFM) at WWDC26, including the first Apple AI model to run on third-party cloud infrastructure. The lineup spans on-device models and server-based systems, with AFM 3 Cloud Pro running on Nvidia GPUs hosted in Google Cloud.

The five models

According to Apple, the AFM 3 lineup comprises:

AFM 3 Core: 3-billion-parameter dense model for on-device processing
AFM 3 Core Advanced: 20-billion-parameter sparse model that activates 1-4 billion parameters per request and runs on-device on capable Apple silicon
AFM 3 Cloud: Server-based model optimized for speed and efficiency on Apple silicon servers
ADM 3 Cloud (Image): Diffusion-based image generation model running on Apple silicon
AFM 3 Cloud Pro: Most capable server model, built for complex reasoning and agentic tool use, running on Nvidia GPUs in Google Cloud

Sparse architecture enables 20B parameters on-device

AFM 3 Core Advanced uses a sparse architecture based on Apple's "Instruction-Following Pruning" research from 2025. Unlike dense models that activate all parameters for every request, the sparse design selectively activates 1-4 billion of its 20 billion parameters depending on the prompt. Apple claims this approach differs from standard Mixture of Experts architectures.
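
To put the reported figures in proportion, the short sketch below works out what share of the 20 billion parameters is active for a single request. It only restates the arithmetic from the numbers above; the function name `active_fraction` is hypothetical and is not part of any Apple API, and the sketch says nothing about how Apple selects which parameters to activate.

```python
# Illustrative arithmetic only: figures come from the article,
# the function name is a hypothetical helper, not an Apple API.
TOTAL_PARAMS = 20e9  # reported total parameter count


def active_fraction(active_params: float, total_params: float = TOTAL_PARAMS) -> float:
    """Return the share of parameters that take part in one request."""
    if not 0 < active_params <= total_params:
        raise ValueError("active_params must be in (0, total_params]")
    return active_params / total_params


low = active_fraction(1e9)   # lower end of the reported 1-4 billion range
high = active_fraction(4e9)  # upper end of the reported range
print(f"Active share per request: {low:.0%} to {high:.0%}")
```

By these numbers, between one twentieth and one fifth of the model's weights participate in any given request.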

The model is natively multimodal, handling audio and images alongside text. Apple restricts it to "our most capable Apple silicon systems," though specific device requirements were not disclosed.

Private Cloud Compute expands to Google infrastructure

AFM 3 Cloud Pro marks the first time Apple extended Private Cloud Compute beyond its own data centers. The model runs on Nvidia GPUs in Google Cloud while maintaining what Apple describes as "powerful security and privacy protections."

According to Apple's Security blog, the implementation includes:

Cryptographically verifiable, append-only ledger of all Google Cloud hardware in the Private Cloud Compute fleet
Software attestation rooted in at least two independent vendor roots of trust
Dedicated process isolation for initial network data parsing
Short time-to-live duration for shared inference software
Separate confidential VM for attested keys
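
Apple has not published the format of its ledger in the material summarized here, so the following is only a generic sketch of how an append-only, verifiable log can work: each entry's hash covers the previous entry's hash, so altering an earlier record breaks every later link. The class name `AppendOnlyLedger`, the record fields, and the node names are all hypothetical.

```python
import hashlib
import json

# Generic hash-chain sketch; NOT Apple's ledger format, which is an
# unknown here. All names and record fields are hypothetical.
GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


class AppendOnlyLedger:
    def __init__(self) -> None:
        self.entries: list[tuple[dict, str]] = []  # (record, chained hash)

    def append(self, record: dict) -> str:
        """Add a record, chaining it to the current head of the ledger."""
        prev = self.entries[-1][1] if self.entries else GENESIS
        digest = entry_hash(prev, record)
        self.entries.append((record, digest))
        return digest

    def verify(self) -> bool:
        """Recompute every link; any edited record breaks the chain."""
        prev = GENESIS
        for record, digest in self.entries:
            if entry_hash(prev, record) != digest:
                return False
            prev = digest
        return True


ledger = AppendOnlyLedger()
ledger.append({"node": "example-node-1"})
ledger.append({"node": "example-node-2"})
ok_before = ledger.verify()

# Simulate tampering with the first record while keeping its old hash.
ledger.entries[0] = ({"node": "tampered"}, ledger.entries[0][1])
ok_after = ledger.verify()
print(ok_before, ok_after)
```

A production transparency log would typically add signatures and a Merkle tree so that auditors can check inclusion without replaying the whole history; the sketch omits both.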

Apple states it does not rely solely on confidential computing but treats "every component—from firmware through the host and guest OS stacks to application code—to be part of our trusted computing base."

Training and evaluation

Apple trained all five models starting from a common foundation before specializing for respective use cases. Training data included publicly available information, licensed third-party data, open-sourced data, dedicated studies, and synthetic data. The company states no user data or interactions were used in training, and web publishers can opt out.

Apple conducted human evaluations comparing AFM 3 models against previous generations across instruction following, truthfulness, presentation, and image understanding. The company published preference rates across different locale groups but did not release standard benchmark scores like MMLU or HumanEval.

What this means

Apple's deployment of a flagship model on third-party cloud infrastructure represents a significant shift from its historically closed approach, driven by the computational demands of frontier AI. The 20-billion-parameter on-device model with sparse activation is among the largest models designed for consumer devices, though performance benchmarks against competing on-device models remain undisclosed. The expansion of Private Cloud Compute to Google Cloud sets a precedent for privacy-preserving AI deployment across multiple cloud providers, though independent verification of these security guarantees will be critical.

Source: 9to5mac.com ↗
