Liquid AI Releases LFM2.5-8B: 8-Billion Parameter Hybrid Model Optimized for Edge Deployment

TL;DR

Liquid AI has released LFM2.5-8B-A1B, an 8-billion parameter hybrid model designed specifically for edge AI and on-device deployment. The model is available in multiple GGUF quantized formats ranging from 4-bit (4.84 GB) to 16-bit (16.9 GB), optimized for memory efficiency.

May 29, 2026 · 4:21 AM · 2 min read

Liquid AI has released LFM2.5-8B-A1B, an 8-billion parameter hybrid model designed specifically for edge AI and on-device deployment. The company claims the model "sets a new standard in terms of quality, speed, and memory efficiency."

Technical Specifications

The model features an LFM2MoE architecture and is available in multiple GGUF quantized formats:

4-bit Q4_0: 4.84 GB
4-bit Q4_K_M: 5.16 GB
5-bit Q5_K_M: 6.03 GB
6-bit Q6_K: 6.96 GB
8-bit Q8_0: 9.01 GB
16-bit BF16/F16: 16.9 GB

The quantization range lets developers trade file size and memory use against output quality, depending on deployment constraints. The smallest 4-bit version requires under 5 GB of storage, making it viable for mobile and edge devices.
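As a rough illustration of that trade-off, the sketch below picks the largest listed format that fits a given memory budget. The file sizes are the ones listed above; the `overhead_gb` headroom for the runtime and context cache is a hypothetical placeholder, not a figure published by Liquid AI, and real memory use will vary by device and context length.

```python
# File sizes in GB, as listed in the article.
QUANT_SIZES_GB = {
    "Q4_0": 4.84,
    "Q4_K_M": 5.16,
    "Q5_K_M": 6.03,
    "Q6_K": 6.96,
    "Q8_0": 9.01,
    "BF16/F16": 16.9,
}


def pick_quant(budget_gb, overhead_gb=1.0):
    """Return the largest listed format whose file fits the budget, or None.

    overhead_gb is an assumed allowance for runtime and cache memory;
    it is illustrative only and should be measured on the target device.
    """
    usable_gb = budget_gb - overhead_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= usable_gb}
    if not fitting:
        return None
    # Larger files keep more precision, so prefer the biggest one that fits.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (5.0, 6.0, 8.0, 12.0, 32.0):
        print(f"{budget:>5.1f} GB budget -> {pick_quant(budget)}")
```

Preferring the largest file that fits is only a heuristic: it assumes higher-bit formats preserve more quality, which generally holds but has not been benchmarked for this model in the source.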

Deployment and Availability

The model runs via llama.cpp, the widely used inference framework for quantized language models. According to Hugging Face data, the model has been downloaded 42 times in its first month of availability.
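For readers who want to try a local run, here is a minimal sketch that assembles a command line for llama.cpp's `llama-cli` tool. The GGUF file name is a hypothetical placeholder (the article does not give exact file names), and the prompt, token limit, and context size are arbitrary example values. The script only builds and prints the command; it does not download or run anything.

```python
import shlex


def build_llama_cli_command(model_path, prompt, max_tokens=128, ctx_size=4096):
    """Build an argument list for llama.cpp's llama-cli.

    -m: path to the GGUF file, -p: prompt text,
    -n: number of tokens to generate, -c: context size.
    """
    return [
        "llama-cli",
        "-m", model_path,
        "-p", prompt,
        "-n", str(max_tokens),
        "-c", str(ctx_size),
    ]


# Hypothetical local file name; substitute the file actually downloaded.
cmd = build_llama_cli_command(
    "LFM2.5-8B-A1B-Q4_K_M.gguf",
    "Summarize edge AI in one sentence.",
)

if __name__ == "__main__":
    # Print a shell-safe version of the command for copy and paste.
    print(shlex.join(cmd))
```

Building the arguments as a list rather than a single string keeps prompts with spaces or quotes intact if the command is later passed to `subprocess.run`.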

LFM2.5-8B-A1B is a fine-tuned version built on top of LiquidAI/LFM2.5-8B-A1B-Base. The model is part of a collection of 33 post-trained and base LFM2.5 models released by Liquid AI.

No pricing information has been disclosed for API access. The model is not currently deployed by any inference provider on Hugging Face's platform.

Architecture Details

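The LFM2MoE architecture name suggests a mixture-of-experts approach, though specific architectural details beyond the 8-billion parameter count have not been published. The "A1B" suffix is not explained either; under a common mixture-of-experts naming convention it would denote roughly 1 billion active parameters per token, but that reading is an assumption. The "2" in LFM2 indicates this is the second generation of Liquid AI's hybrid models.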

What This Means

Liquid AI is targeting the growing edge AI market with a model sized between typical small models (1-3B parameters) and larger general-purpose models (70B+). The 8B parameter count and aggressive quantization options suggest the company is prioritizing deployment flexibility over raw capability. However, without published benchmark scores or detailed performance comparisons, it is unclear how LFM2.5-8B-A1B compares to established edge-optimized models like Llama 3.2 3B or Phi-3.5-mini. The availability of GGUF formats makes the model immediately compatible with the broader llama.cpp ecosystem.

Source: huggingface.co
