Liquid AI releases LFM2.5-230M, a 230M parameter edge model running at 213 tok/s on Galaxy S25 Ultra
Liquid AI has released LFM2.5-230M, a 230M parameter hybrid model trained on 19 trillion tokens with a 32,768 token context window. The model achieves 213 tok/s decode speed on Galaxy S25 Ultra and 42 tok/s on Raspberry Pi 5, with support for function calling and data extraction tasks.
Liquid AI Releases LFM2.5-230M Edge Model
Liquid AI has released LFM2.5-230M, a 230 million parameter hybrid model designed for on-device deployment. The model was trained on 19 trillion tokens with a 32,768 token context window and knowledge cutoff of mid-2024.
Architecture and Performance
LFM2.5-230M uses a hybrid architecture with 14 layers: 8 double-gated LIV convolution blocks and 6 GQA blocks. The model supports a vocabulary size of 65,536 tokens and is available in multiple formats including native, GGUF, ONNX, and MLX for Apple Silicon.
According to Liquid AI, the model achieves 213 tokens per second decode speed on Samsung Galaxy S25 Ultra and 42 tok/s on Raspberry Pi 5. Benchmark scores include 25.41 on GPQA Diamond, 20.25 on MMLU-Pro, and 71.71 on IFEval.
Tool Use and Function Calling
The model supports function calling through a four-step process using special tokens (<|tool_call_start|> and <|tool_call_end|>). By default, it outputs Pythonic function calls, with optional JSON format support. Liquid AI claims the model was distilled from LFM2.5-350M and refined with multi-stage reinforcement learning for tool use and data extraction tasks.
On specialized benchmarks, the model scores 43.26 on BFCLv3 (function calling), 21.03 on BFCLv4, and 22.51 on CaseReportBench (medical data extraction).
Comparison with Competing Models
LFM2.5-230M outperforms IBM's Granite 4.0-350M (25.91 GPQA Diamond vs 25.41) and Google's Gemma 3 1B IT (23.89) on certain benchmarks, while Qwen3.5-0.8B (Instruct) leads on MMLU-Pro with 37.42 compared to LFM2.5-230M's 20.25.
The model is available for commercial use with pricing not yet disclosed. It supports 10 languages including English, Arabic, Chinese, French, German, Italian, Japanese, Korean, Portuguese, and Spanish.
Deployment and Integration
LFM2.5-230M is compatible with Transformers (≥5.0.0), vLLM, llama.cpp, SGLang, and LM Studio. The model uses a ChatML-like format and can be deployed using standard inference frameworks.
Liquid AI recommends the model for data extraction and lightweight agentic pipelines, but notes it is not suitable for reasoning-heavy workloads such as advanced math, code generation, or creative writing.
What This Means
LFM2.5-230M represents a focused effort to create viable edge AI models that can run on consumer devices without cloud connectivity. The 230M parameter count positions it below the typical small language model threshold while claiming competitive performance through architectural innovations. The real test will be whether its hybrid convolution-attention architecture delivers sustained advantages in production deployments versus pure transformer models like Qwen3.5-0.8B, which shows stronger general knowledge scores but may have different computational profiles. Function calling support in a model this size could enable new on-device agentic applications if the accuracy claims hold up in practice.
Related Articles
White House Orders OpenAI to Limit GPT-5.6 Release to Approved Partners Only
The Trump administration has instructed OpenAI to release its newest model, GPT-5.6, only to a select group of government-approved partners rather than the general public. The Office of the National Cyber Director and Office of Science and Technology Policy will approve access customer by customer during a preview period.
OpenAI's ChatGPT 5.6 release restricted to government-approved customers initially
OpenAI will release ChatGPT 5.6 first to customers approved by the federal government, according to a staff memo from CEO Sam Altman. The company plans a broader release "a couple of weeks later," marking a significant departure from typical model rollouts.
OpenAI delays GPT-5.6 release after Trump administration mandates case-by-case customer approval
OpenAI CEO Sam Altman told employees the company will release GPT-5.6 in limited preview form only, with the Trump administration approving customer access on a case-by-case basis. The move follows stricter export controls imposed on Anthropic earlier this month.
China's Z.ai releases GLM-5.2, open-source model matching Claude and GPT-5.5 in cybersecurity tasks
Z.ai's GLM-5.2 performs on par with Claude Opus 4.8 and OpenAI's GPT-5.5 in cybersecurity benchmarks while costing roughly half as much to run. Security evaluations from Graphistry and Semgrep confirm the open-weight model's capabilities in vulnerability discovery and cyber investigation, raising concerns about accessibility of advanced hacking tools.
Comments
Loading...