Apple ships 20-billion-parameter model that runs from iPhone flash storage using expert pruning

TL;DR

Apple detailed its third-generation Foundation Models family: five models including AFM 3 Core Advanced, a 20-billion-parameter on-device model that keeps most parameters in flash storage and loads only 1-4 billion at a time into memory. The models were custom-built with Google and trained on Google's TPUs.

Apple released technical details on its third-generation Apple Foundation Models (AFM 3), a family of five models that includes a 20-billion-parameter model running entirely on-device despite being far larger than typical on-device models.

The model lineup

The AFM 3 family consists of:

  • AFM 3 Core: 3-billion-parameter on-device model for everyday tasks
  • AFM 3 Core Advanced: 20-billion-parameter on-device model, Apple's most powerful local model
  • AFM 3 Cloud: Server-based workhorse model
  • ADM 3 Cloud: Image generation model powering Image Playground and Genmoji
  • AFM 3 Cloud Pro: Largest model for agentic tool use and complex reasoning

All models were custom-built in collaboration with Google and trained on Google's TPUs, according to Apple's technical post published alongside WWDC.

Flash-based inference architecture

The engineering focus is AFM 3 Core Advanced. The 20-billion-parameter multimodal model stores its full weight set in flash storage rather than RAM. Using what Apple calls "Instruction-Following Pruning," the model makes routing decisions once per prompt, then loads only 1-4 billion parameters into memory at a time.
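A rough footprint estimate shows why the full weight set is kept out of RAM. The sketch below assumes a hypothetical 4 bits per weight; the article does not state how the weights are quantized, so the gigabyte figures are illustrative only. The function name `weight_footprint_gb` is made up for this example.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# ASSUMPTION: 4-bit weights. The bit width is not given in the source.
BITS = 4

full_model = weight_footprint_gb(20, BITS)   # all 20B parameters, kept in flash
active_low = weight_footprint_gb(1, BITS)    # smallest reported active set (1B)
active_high = weight_footprint_gb(4, BITS)   # largest reported active set (4B)

print(f"full: {full_model:.1f} GB, active: {active_low:.1f}-{active_high:.1f} GB")
```

Under that assumption the full model needs about 10 GB of storage, while the active set needs between 0.5 and 2 GB of memory. A different bit width scales both numbers proportionally.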

A core set of shared expert parameters remains in memory continuously while task-specific experts are swapped in from flash as needed. This lets Apple "scale the model far beyond traditional DRAM limits," the company says. The architecture powers the improved voice synthesis and dictation in iOS 27.
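The load pattern described here can be illustrated with a toy sketch: one routing decision per prompt, shared weights that never leave memory, and task experts read from flash only when they are not already resident. Everything in it is hypothetical, including the class name, the keyword-based router, and the expert names. It is not Apple's implementation, and a real router would be learned rather than keyword-matched.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FlashBackedModel:
    """Toy model: shared weights stay resident, task experts load from 'flash'."""

    flash: dict[str, list[float]]   # stands in for expert weights stored in flash
    shared: list[float]             # shared parameters that are always in memory
    resident: dict[str, list[float]] = field(default_factory=dict)
    flash_reads: int = 0

    def route(self, prompt: str) -> list[str]:
        # HYPOTHETICAL router: pick experts whose name appears in the prompt.
        # Falls back to a "general" expert, which must exist in `flash`.
        wanted = [name for name in self.flash if name in prompt.lower()]
        return wanted or ["general"]

    def prepare(self, prompt: str) -> list[str]:
        """Make one routing decision for the whole prompt, then swap experts."""
        chosen = self.route(prompt)
        # Evict experts this prompt does not need; shared weights are untouched.
        self.resident = {k: v for k, v in self.resident.items() if k in chosen}
        for name in chosen:
            if name not in self.resident:
                self.resident[name] = self.flash[name]  # simulated flash read
                self.flash_reads += 1
        return chosen

    def resident_param_count(self) -> int:
        return len(self.shared) + sum(len(w) for w in self.resident.values())


model = FlashBackedModel(
    flash={"general": [0.0] * 4, "translate": [0.0] * 4, "summarize": [0.0] * 4},
    shared=[0.0] * 2,
)
first = model.prepare("Translate this note")     # reads "translate" from flash
second = model.prepare("Translate another one")  # already resident, no read
third = model.prepare("Summarize the page")      # evicts "translate", reads "summarize"
print(first, second, third, model.flash_reads, model.resident_param_count())
```

In this toy, three prompts trigger only two flash reads, and the number of resident parameters stays constant even though the total stored in flash is larger.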

Google partnership details

Apple's relationship with Google spans training hardware, models, and hosting. The AFM models are Apple's own architecture, trained on Google Cloud TPUs. For the Cloud Pro model handling complex reasoning, Apple reportedly uses a large custom Google model. Apple extended its Private Cloud Compute privacy architecture onto Nvidia GPUs in Google Cloud for these server-based models.

Apple claims Private Cloud Compute keeps user data from being stored or shared, including with Apple itself.

Developer access and model flexibility

Apple introduced a Foundation Models framework that lets developers call the on-device model directly. A new model-abstraction layer allows apps to swap in third-party models like Claude or Gemini without code changes. iOS 27 will let users set rival assistants as system defaults.
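The idea behind a model-abstraction layer can be sketched in a few lines: app code depends on a small interface, and any backend that satisfies it can be substituted. The example below is a generic Python illustration, not Apple's Foundation Models API. The names `TextModel`, `OnDeviceModel`, `ThirdPartyModel`, and `summarize` are all hypothetical, and the backends are stubs that echo their input instead of calling a real model.

```python
from typing import Protocol


class TextModel(Protocol):
    """HYPOTHETICAL interface; not Apple's actual API."""

    def respond(self, prompt: str) -> str: ...


class OnDeviceModel:
    """Stub standing in for a local model."""

    def respond(self, prompt: str) -> str:
        return f"[on-device] {prompt}"


class ThirdPartyModel:
    """Stub standing in for an external provider's model."""

    def __init__(self, provider: str) -> None:
        self.provider = provider

    def respond(self, prompt: str) -> str:
        return f"[{self.provider}] {prompt}"


def summarize(model: TextModel, text: str) -> str:
    # App code depends only on the interface, so the backend can be swapped
    # without touching this function.
    return model.respond(f"Summarize: {text}")


print(summarize(OnDeviceModel(), "meeting notes"))
print(summarize(ThirdPartyModel("Claude"), "meeting notes"))
```

The `summarize` function is identical for both calls; only the object passed in changes, which is the property the article attributes to Apple's layer.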

Performance claims

Apple's internal evaluations show AFM 3 Cloud preferred over the previous generation on 64.7% of prompts. Expressive voices scored 4.15 versus 3.87 on a 5-point scale. These are Apple's own human evaluations, not independent benchmarks. The models remain in beta, and Apple says a fuller technical report will arrive later this summer.
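A short calculation puts these figures in context. The sketch below computes the relative gain in the voice score and shows how the uncertainty around a 64.7% preference rate depends on sample size. The prompt counts are hypothetical, since the article gives none, and the interval uses a simple normal approximation.

```python
import math

preferred = 0.647                 # share of prompts where AFM 3 Cloud was preferred
new_score, old_score = 4.15, 3.87  # the two voice ratings on a 5-point scale

relative_gain = (new_score - old_score) / old_score
print(f"voice score gain: {new_score - old_score:.2f} points ({relative_gain:.1%})")


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


# ASSUMPTION: these prompt counts are hypothetical; the source gives none.
for n in (100, 1000):
    print(f"n={n}: 64.7% ± {margin_of_error(preferred, n):.1%}")
```

The 0.28-point difference is a relative gain of about 7.2%. How much weight the 64.7% figure can bear depends on how many prompts were rated, which is one reason the fuller technical report matters.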

What this means

Apple's flash-based inference approach addresses a core constraint in on-device AI: models have grown far larger than available RAM. By keeping the bulk of parameters in flash and loading only the active experts into memory, Apple can run a 20-billion-parameter model locally, so those requests are handled on the device. The Google partnership shows that Apple depends on outside infrastructure for training and high-end inference, even as it builds its own model architectures. Whether real-world performance matches Apple's internal evaluations remains to be tested independently; the fuller technical report is due later this summer.
