Liquid AI Releases LFM2.5-8B-A1B: 8-Billion Parameter Hybrid Model Optimized for Edge Deployment
Liquid AI has released LFM2.5-8B-A1B, an 8-billion parameter hybrid model designed specifically for edge AI and on-device deployment. The model is available in multiple GGUF quantized formats ranging from 4-bit (4.84 GB) to 16-bit (16.9 GB), optimized for memory efficiency.
Liquid AI claims the model "sets a new standard in terms of quality, speed, and memory efficiency."
Technical Specifications
The model features an LFM2MoE architecture and is available in multiple GGUF quantized formats:
- 4-bit Q4_0: 4.84 GB
- 4-bit Q4_K_M: 5.16 GB
- 5-bit Q5_K_M: 6.03 GB
- 6-bit Q6_K: 6.96 GB
- 8-bit Q8_0: 9.01 GB
- 16-bit BF16/F16: 16.9 GB
The quantization range lets developers trade file size and memory footprint against output quality, depending on deployment constraints. The smallest 4-bit file is under 5 GB, which makes it viable for mobile and edge devices with enough memory to hold the weights plus runtime overhead.
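As a rough illustration of that trade-off, the sketch below picks the largest listed quantization that fits a given memory budget. The file sizes are the ones listed above; the function names, the 1 GB headroom default (a stand-in for KV cache and runtime buffers), and the assumption that 1 GB means 10^9 bytes are illustrative choices, not figures from Liquid AI.

```python
# File sizes in GB, as listed in the article.
QUANT_SIZES_GB = {
    "Q4_0": 4.84,
    "Q4_K_M": 5.16,
    "Q5_K_M": 6.03,
    "Q6_K": 6.96,
    "Q8_0": 9.01,
    "BF16/F16": 16.9,
}


def largest_quant_that_fits(budget_gb, headroom_gb=1.0):
    """Return the name of the largest listed quant whose file fits in
    (budget_gb - headroom_gb), or None if none fits.

    headroom_gb is an assumed allowance for KV cache and runtime buffers;
    the real overhead depends on context length and the runtime.
    """
    usable_gb = budget_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= usable_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def implied_bits_per_weight(size_gb, n_params=8e9):
    """Rough bits per parameter implied by a file size.

    Assumes exactly n_params parameters and 1 GB = 1e9 bytes; the
    article only gives the rounded "8 billion" figure.
    """
    return size_gb * 1e9 * 8 / n_params


if __name__ == "__main__":
    for budget in (5.0, 8.0, 12.0, 24.0):
        print(budget, "GB ->", largest_quant_that_fits(budget))
```

With an 8 GB budget and the default headroom, the function returns Q6_K; with a 5 GB budget it returns None, since even the 4.84 GB file leaves no room for the assumed overhead. The second helper is only a sanity check: at exactly 8 billion parameters the 16-bit file would imply about 16.9 bits per weight, so the true parameter count is presumably somewhat above 8 billion, or the file carries additional data.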
Deployment and Availability
The model runs via llama.cpp, the widely used inference framework for quantized language models. According to Hugging Face data, the model has been downloaded 42 times in its first month of availability.
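For readers who want to try a local run, here is a minimal sketch that assembles a command line for llama.cpp's `llama-cli` binary using its `-m` (model file), `-p` (prompt), and `-n` (tokens to generate) options. The GGUF file name and the helper function are hypothetical; substitute the path of whichever quantized file you actually downloaded. The script only prints the command and does not execute it.

```python
import shlex


def build_llama_cli_command(model_path, prompt, max_tokens=128):
    """Build an argv list for llama.cpp's llama-cli binary.

    model_path is a local GGUF file; the name used in the example
    below is a placeholder, not a confirmed file name.
    """
    return [
        "llama-cli",
        "-m", model_path,        # path to the GGUF model file
        "-p", prompt,            # prompt text
        "-n", str(max_tokens),   # number of tokens to generate
    ]


if __name__ == "__main__":
    cmd = build_llama_cli_command(
        "models/LFM2.5-8B-A1B-Q4_K_M.gguf",  # hypothetical local path
        "Summarize edge AI in one sentence.",
    )
    # Print a shell-safe version of the command for copy and paste.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps the prompt intact when it contains spaces or quotes, and the same list can be passed to `subprocess.run` once `llama-cli` is installed and on the PATH.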
LFM2.5-8B-A1B is a fine-tuned version built on top of LiquidAI/LFM2.5-8B-A1B-Base. The model is part of a collection of 33 post-trained and base LFM2.5 models released by Liquid AI.
No pricing information has been disclosed for API access. The model is not currently deployed by any inference provider on Hugging Face's platform.
Architecture Details
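The LFM2MoE architecture name suggests a mixture-of-experts design, and the "A1B" suffix likely follows the common naming convention for such models, in which it would denote roughly 1 billion parameters active per token. Specific architectural details beyond the 8-billion parameter count have not been published. The "2" in LFM2 presumably marks the second generation of Liquid AI's hybrid models.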
What This Means
Liquid AI is targeting the growing edge AI market with a model sized between typical small models (1-3B parameters) and larger general-purpose models (70B+). The 8B parameter count and the range of quantization options suggest the company is prioritizing deployment flexibility over raw capability. However, without published benchmark scores or detailed performance comparisons, it is unclear how LFM2.5-8B-A1B compares to established edge-optimized models like Llama 3.2 3B or Phi-3.5-mini. The availability of GGUF formats makes the model immediately compatible with the broader llama.cpp ecosystem.