Apple ships 20-billion-parameter model that runs from iPhone flash storage using expert pruning

TL;DR

Apple detailed its third-generation Foundation Models family: five models including AFM 3 Core Advanced, a 20-billion-parameter on-device model that keeps most parameters in flash storage and loads only 1-4 billion at a time into memory. The models were custom-built with Google and trained on Google's TPUs.

June 9, 2026 · 1:21 PM · 2 min read

Apple released technical details on its third-generation Apple Foundation Models (AFM 3), a family of five models that includes a 20-billion-parameter model running entirely on-device despite being far larger than typical on-device models.

The model lineup

The AFM 3 family consists of:

AFM 3 Core: 3-billion-parameter on-device model for everyday tasks
AFM 3 Core Advanced: 20-billion-parameter on-device model, Apple's most powerful local model
AFM 3 Cloud: Server-based workhorse model
ADM 3 Cloud: Image generation model powering Image Playground and Genmoji
AFM 3 Cloud Pro: Largest model for agentic tool use and complex reasoning

All models were custom-built in collaboration with Google and trained on Google's TPUs, according to Apple's technical post published alongside WWDC.

Flash-based inference architecture

The engineering focus is AFM 3 Core Advanced. The 20-billion-parameter multimodal model stores its full weight set in flash storage rather than RAM. Using what Apple calls "Instruction-Following Pruning," the model makes routing decisions once per prompt, then loads only 1-4 billion parameters into memory at a time.
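To put the 20-billion versus 1-4 billion split in rough terms, the arithmetic below estimates weight storage at two bit widths. Apple's post, as summarized here, does not state the precision it uses, so the 16-bit and 4-bit figures are assumptions chosen only for illustration, and the function name is hypothetical. The estimate covers weights only and ignores activations and any cache.

```python
def weight_gigabytes(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    # billions of parameters * bits per weight / 8 bits per byte = gigabytes
    return params_billions * bits_per_weight / 8


# Assumed bit widths; the article does not say which precision Apple uses.
for bits in (16, 4):
    full = weight_gigabytes(20, bits)   # whole model, kept in flash
    low = weight_gigabytes(1, bits)     # smallest resident slice (1B)
    high = weight_gigabytes(4, bits)    # largest resident slice (4B)
    print(f"{bits}-bit: full model {full:.1f} GB, resident {low:.1f}-{high:.1f} GB")
```

Under these assumptions the full weight set is several times larger than the slice held in memory at any moment, which is the gap the flash-based design is meant to bridge.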

A core set of shared expert parameters remains in memory continuously while task-specific experts are swapped in from flash as needed. This lets Apple "scale the model far beyond traditional DRAM limits," the company says. The architecture powers the improved voice synthesis and dictation in iOS 27.
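The pattern described above (route once per prompt, keep shared experts resident, swap task experts in from flash) can be sketched as a toy simulation. This is not Apple's implementation: the class names, the keyword-matching router, the expert sizes, and the memory budget are all hypothetical, and the "flash store" only tracks parameter counts rather than real weights.

```python
from dataclasses import dataclass, field


@dataclass
class FlashExpertStore:
    """Stands in for flash storage: expert name -> size in billions of parameters."""
    experts: dict[str, float]

    def load(self, name: str) -> float:
        # A real system would read weights from flash; here we only return the size.
        return self.experts[name]


@dataclass
class PromptRoutedModel:
    store: FlashExpertStore
    shared_params_b: float    # shared experts, always resident in memory
    memory_budget_b: float    # cap on resident parameters, in billions
    resident: dict[str, float] = field(default_factory=dict)

    def route(self, prompt: str) -> list[str]:
        """Toy router (an assumption): pick experts whose name appears in the prompt."""
        words = prompt.lower().split()
        return [name for name in self.store.experts if name in words]

    def prepare(self, prompt: str) -> float:
        """Route once for the whole prompt, then load the chosen experts."""
        chosen = self.route(prompt)
        self.resident = {}               # evict the previous prompt's task experts
        total = self.shared_params_b     # shared experts never leave memory
        for name in chosen:
            size = self.store.load(name)
            if total + size > self.memory_budget_b:
                break                    # stop before exceeding the memory budget
            self.resident[name] = size
            total += size
        return total                     # billions of parameters now in memory


# Illustrative sizes only; none of these numbers come from Apple.
store = FlashExpertStore({"code": 1.5, "vision": 1.0, "speech": 1.0, "math": 1.5})
model = PromptRoutedModel(store, shared_params_b=1.0, memory_budget_b=4.0)
print(model.prepare("describe this vision input"))  # prints 2.0
```

Because the routing decision is made once per prompt rather than once per token, the expensive flash reads happen up front and the resident set stays fixed while the response is generated.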

Google partnership details

Apple's relationship with Google is multilayered. The AFM models are Apple's own architecture, trained on Google Cloud TPUs. For the Cloud Pro model handling complex reasoning, Apple reportedly uses a large custom Google model. Apple extended its Private Cloud Compute privacy architecture onto Nvidia GPUs in Google Cloud for these server-based models.

Apple claims Private Cloud Compute keeps user data from being stored or shared, including with Apple itself.

Developer access and model flexibility

Apple introduced a Foundation Models framework that lets developers call the on-device model directly. A new model-abstraction layer allows apps to swap in third-party models like Claude or Gemini without code changes. iOS 27 will let users set rival assistants as system defaults.
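The idea behind a model-abstraction layer can be illustrated with a small interface sketch. Apple's framework is a Swift API and its actual types are not described in this article, so everything below (the `TextModel` interface, the two backend classes, and the `summarize` helper) is a hypothetical stand-in written in Python purely to show the pattern.

```python
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical interface; not Apple's actual API."""

    def respond(self, prompt: str) -> str: ...


class OnDeviceModel:
    """Placeholder for a local model backend."""

    def respond(self, prompt: str) -> str:
        return f"[on-device] {prompt}"


class ThirdPartyModel:
    """Placeholder for a remote provider such as Claude or Gemini."""

    def __init__(self, provider: str) -> None:
        self.provider = provider

    def respond(self, prompt: str) -> str:
        return f"[{self.provider}] {prompt}"


def summarize(model: TextModel, text: str) -> str:
    """App code depends only on the interface, so the backend can be swapped."""
    return model.respond(f"Summarize: {text}")


print(summarize(OnDeviceModel(), "meeting notes"))
print(summarize(ThirdPartyModel("Claude"), "meeting notes"))
```

The calling code in `summarize` is identical for both backends, which is the sense in which a swap requires no changes on the app side.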

Performance claims

Apple's internal evaluations show AFM 3 Cloud preferred over the previous generation on 64.7% of prompts. Expressive voices scored 4.15 versus 3.87 on a 5-point scale. These are Apple's own human evaluations, not independent benchmarks. The models remain in beta, and Apple says a fuller technical report will arrive later this summer.

What this means

Apple's flash-based inference approach addresses a core constraint in on-device AI: models have grown far larger than available RAM. By keeping the bulk of parameters in flash and loading only the active experts into memory, Apple can run a 20-billion-parameter model locally while maintaining its privacy guarantees. The Google partnership shows that Apple still depends on outside infrastructure for training and high-end inference, even as it builds proprietary model architectures. Whether real-world performance matches Apple's internal evaluations remains to be tested independently once the full technical report ships.

Source: thenextweb.com
