
PrismML releases 1-bit Bonsai 8B model, claims 14x smaller and 5x more energy efficient than full-precision peers

TL;DR

PrismML, a Caltech-founded startup, has released Bonsai 8B, a 1-bit quantized large language model that the company claims is 14x smaller and 5x more energy efficient than full-precision counterparts while remaining competitive with standard 8B models. The model fits into 1.15GB of memory and uses a novel 1-bit weight representation (binary signs with shared scale factors per weight group) instead of traditional 16-bit or 32-bit precision.


PrismML Releases 1-Bit Bonsai 8B Model

PrismML, an AI venture founded by Caltech electrical engineering professor Babak Hassibi, has released Bonsai 8B, a 1-bit quantized large language model designed to run on edge devices with minimal power requirements.

Model Specifications

Bonsai 8B achieves aggressive compression through a 1-bit weight representation where each neural network weight is encoded as only its sign ({−1, +1}) with a shared scale factor stored for each group of weights. According to PrismML's claims:

  • Memory footprint: 1.15GB
  • Size reduction: 14x smaller than full-precision equivalents
  • Inference speed: 8x faster on edge hardware
  • Energy efficiency: 5x more efficient than full-precision models
  • Performance: Competitive with other 8B parameter models on standard benchmarks
  • Intelligence density (PrismML's custom metric): 1.06/GB, compared to 0.10/GB for Qwen3 8B

PrismML also released smaller variants: Bonsai 4B and Bonsai 1.7B, all under the Apache 2.0 License.
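The sign-plus-scale encoding described at the start of this section can be illustrated with a short sketch. This is not PrismML's implementation: the function names are hypothetical, and the choice of scale (the mean absolute value of each group), the group size of 128, and the 16-bit scale width are assumptions made for illustration only.

```python
from typing import List, Tuple


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Encode one group of weights as signs in {-1, +1} plus one shared scale."""
    # Assumption: the shared scale is the group's mean absolute value.
    scale = sum(abs(w) for w in weights) / len(weights)
    # Zero is mapped to +1 so that every weight fits in a single bit.
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale


def dequantize_group(signs: List[int], scale: float) -> List[float]:
    """Reconstruct approximate weights: every value becomes +scale or -scale."""
    return [s * scale for s in signs]


def quantize(weights: List[float], group_size: int) -> List[Tuple[List[int], float]]:
    """Split a flat weight list into consecutive groups and quantize each one."""
    return [
        quantize_group(weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    ]


def bits_per_weight(group_size: int, scale_bits: int = 16) -> float:
    """Effective storage cost: 1 sign bit plus the scale amortized over the group."""
    return 1 + scale_bits / group_size


if __name__ == "__main__":
    signs, scale = quantize_group([0.5, -1.5, 1.0, -1.0])
    print(signs, scale)
    print(dequantize_group(signs, scale))
    print(bits_per_weight(128))
```

Under these assumed settings, storage works out to 1.125 bits per weight, or roughly 1.1GB for 8 billion weights. That is in the neighborhood of the reported 1.15GB footprint, though the actual format and parameter count may differ.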

Technical Approach

The 1-bit architecture builds on years of quantization research, including the 2017 paper "BitNet: Bit-Regularized Deep Neural Networks" and the 2024 work "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits." Hassibi and colleagues developed mathematical theory to compress models without degrading reasoning capabilities, according to the company.

PrismML claims its approach avoids historical tradeoffs of low-bit quantization—specifically poor instruction following, faulty multi-step reasoning, and unreliable tool use—though independent verification of these claims is not yet available.

Deployment and Availability

The company reports that Bonsai 8B runs natively on:

  • Apple devices (Mac, iPhone, iPad) via the MLX framework
  • Nvidia GPUs via llama.cpp's CUDA backend
  • Other edge hardware platforms

Model weights are available immediately under the Apache 2.0 License.

Market Context

While standard benchmark comparisons show Qwen3 8B slightly ahead on MMLU Redux, MuSR, and GSM8K, PrismML argues that traditional metrics miss the efficiency dimension critical for on-device deployment. The company proposes "intelligence density" (the negative log of a model's average error rate, divided by its size in GB) as a superior metric for edge AI viability.
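As a rough illustration of how such a metric behaves, the definition above can be written as a small function. The function name and the input values are hypothetical, and the natural logarithm is an assumption, since the log base is not stated here.

```python
import math


def intelligence_density(avg_error_rate: float, size_gb: float) -> float:
    """Negative log of the average error rate, divided by model size in GB.

    Assumption: natural logarithm; the source definition does not give the base.
    """
    if not 0.0 < avg_error_rate <= 1.0:
        raise ValueError("avg_error_rate must be in (0, 1]")
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    return -math.log(avg_error_rate) / size_gb


if __name__ == "__main__":
    # Hypothetical inputs, not measured results for any real model.
    print(intelligence_density(0.25, 2.0))
    # Same error rate at half the size: the score doubles.
    print(intelligence_density(0.25, 1.0))
```

Because size sits in the denominator, halving a model's footprint at a fixed error rate doubles its score. This is why the metric strongly favors heavily compressed models.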

Hassibi positioned 1-bit quantization not as a final approach but as a foundational shift toward measuring AI in terms of "intelligence per unit of compute and energy," drawing parallels to how the industry adopted performance-per-watt as a standard metric.

Intended Use Cases

PrismML targets applications requiring on-device execution due to latency, privacy, or compliance constraints:

  • On-device AI agents
  • Real-time robotics systems
  • Enterprise systems with strict data residency requirements
  • Mobile and IoT devices with power limitations

What This Means

Bonsai 8B represents a practical milestone in 1-bit quantization, moving from academic research to deployable models. If the claimed efficiency gains hold under real-world conditions, this could significantly expand viable use cases for LLMs on edge devices—particularly mobile and embedded systems where bandwidth and power are bottlenecks. However, the company's custom "intelligence density" metric warrants scrutiny; it's designed to showcase 1-bit models favorably and shouldn't replace independent third-party benchmarking. Real-world inference quality on instruction-following and reasoning tasks remains to be independently validated.
