model release

PrismML releases 1-bit Bonsai 8B model, claims 14x smaller and 5x more energy efficient than full-precision peers

TL;DR

PrismML, a startup founded by a Caltech professor, has released Bonsai 8B, a 1-bit quantized large language model that the company claims is 14x smaller and 5x more energy efficient than full-precision counterparts while remaining competitive with standard 8B models. The model fits into 1.15GB of memory and stores each weight as a single sign bit, with one shared scale factor per group of weights, instead of using traditional 16-bit or 32-bit precision.

April 4, 2026 · 8:20 AM · 2 min read

PrismML Releases 1-Bit Bonsai 8B Model

PrismML, an AI venture founded by Caltech electrical engineering professor Babak Hassibi, has released Bonsai 8B, a 1-bit quantized large language model designed to run on edge devices with minimal power requirements.

Model Specifications

Bonsai 8B achieves aggressive compression through a 1-bit weight representation where each neural network weight is encoded as only its sign ({−1, +1}) with a shared scale factor stored for each group of weights. According to PrismML's claims:

Memory footprint: 1.15GB
Size reduction: 14x smaller than full-precision equivalents
Inference speed: 8x faster on edge hardware
Energy efficiency: 5x more efficient than full-precision models
Performance: Competitive with other 8B parameter models on standard benchmarks
Intelligence density (PrismML's custom metric): 1.06/GB, compared to 0.10/GB for Qwen3 8B
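
The sign-plus-scale encoding described above can be sketched in a few lines of Python. This is a minimal illustration, not PrismML's implementation: the function names are hypothetical, the choice of the mean absolute value as the shared scale is an assumption (it minimizes squared reconstruction error for a fixed sign pattern), and the group size of 128 and 16-bit scale are assumed values that the article does not state.

```python
from typing import List, Tuple


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Encode one group of weights as signs in {-1, +1} plus one shared scale."""
    if not weights:
        raise ValueError("group must not be empty")
    # Assumption: use the mean absolute value as the shared scale factor.
    scale = sum(abs(w) for w in weights) / len(weights)
    # Zero is mapped to +1 so that every weight needs exactly one bit.
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale


def dequantize_group(signs: List[int], scale: float) -> List[float]:
    """Reconstruct approximate weights from the signs and the shared scale."""
    return [s * scale for s in signs]


def bits_per_weight(group_size: int, scale_bits: int = 16) -> float:
    """Effective storage cost: 1 sign bit plus the amortized scale bits."""
    return 1.0 + scale_bits / group_size


def footprint_gb(n_params: int, group_size: int, scale_bits: int = 16) -> float:
    """Estimated weight storage in GB (10^9 bytes) for n_params weights."""
    return n_params * bits_per_weight(group_size, scale_bits) / 8 / 1e9


if __name__ == "__main__":
    signs, scale = quantize_group([0.5, -0.25, 0.75, -1.0])
    print(signs, scale)
    print(dequantize_group(signs, scale))
    # Hypothetical layout: 8 billion weights, groups of 128, 16-bit scales.
    print(bits_per_weight(128), footprint_gb(8_000_000_000, 128))
```

Under those assumed values the estimate comes to 1.125 bits per weight, or roughly 1.1GB for 8 billion weights, which is in the same range as the 1.15GB PrismML reports. The actual parameter count and storage layout may differ.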

PrismML also released two smaller variants, Bonsai 4B and Bonsai 1.7B. All three models are licensed under Apache 2.0.

Technical Approach

The 1-bit architecture builds on years of quantization research, including the 2017 paper "BitNet: Bit-Regularized Deep Neural Networks" and the 2024 work "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits." Hassibi and colleagues developed mathematical theory to compress models without degrading reasoning capabilities, according to the company.

PrismML claims its approach avoids the historical tradeoffs of low-bit quantization (poor instruction following, faulty multi-step reasoning, and unreliable tool use), though these claims have not yet been independently verified.

Deployment and Availability

The company reports that Bonsai 8B runs natively on:

Apple devices (Mac, iPhone, iPad) via MLX framework
Nvidia GPUs via llama.cpp CUDA
Other edge hardware platforms

Model weights are available now under the Apache 2.0 license.

Market Context

Standard benchmark comparisons show Qwen3 8B slightly ahead on MMLU Redux, MuSR, and GSM8K, but PrismML argues that traditional metrics miss the efficiency dimension critical for on-device deployment. The company proposes "intelligence density," defined as the negative logarithm of a model's average benchmark error rate divided by its size in GB, as a superior metric for edge AI viability.
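
As a rough illustration of the metric as described, the sketch below computes intelligence density and inverts it to recover the error rate implied by a published density. The function names are hypothetical, and the use of the natural logarithm is an assumption, since the description here does not specify the log base.

```python
import math


def intelligence_density(avg_error_rate: float, size_gb: float) -> float:
    """Negative log of the average error rate, divided by model size in GB.

    Assumption: natural logarithm; the log base is not stated in the article.
    """
    if not 0.0 < avg_error_rate <= 1.0:
        raise ValueError("avg_error_rate must be in (0, 1]")
    if size_gb <= 0.0:
        raise ValueError("size_gb must be positive")
    return -math.log(avg_error_rate) / size_gb


def implied_error_rate(density_per_gb: float, size_gb: float) -> float:
    """Invert the metric: the average error rate implied by a given density."""
    return math.exp(-density_per_gb * size_gb)


if __name__ == "__main__":
    # Figures quoted in the article for Bonsai 8B: 1.06/GB at 1.15GB.
    err = implied_error_rate(1.06, 1.15)
    print(f"implied average error rate: {err:.3f}")
    # Round trip back to the quoted density.
    print(f"density: {intelligence_density(err, 1.15):.2f}/GB")
```

Under the natural-log assumption, the quoted 1.06/GB at 1.15GB corresponds to an average error rate of roughly 0.30. Because size sits in the denominator, a model that is many times smaller can score far higher on this metric even when its raw benchmark accuracy is lower.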

Hassibi positioned 1-bit quantization not as a final approach but as a foundational shift toward measuring AI in terms of "intelligence per unit of compute and energy," drawing parallels to how the industry adopted performance-per-watt as a standard metric.

Intended Use Cases

PrismML targets applications requiring on-device execution due to latency, privacy, or compliance constraints:

On-device AI agents
Real-time robotics systems
Enterprise systems with strict data residency requirements
Mobile and IoT devices with power limitations

What This Means

Bonsai 8B represents a practical milestone in 1-bit quantization, moving from academic research to deployable models. If the claimed efficiency gains hold under real-world conditions, this could significantly expand viable use cases for LLMs on edge devices, particularly mobile and embedded systems where bandwidth and power are bottlenecks. However, the company's custom "intelligence density" metric warrants scrutiny: because it divides by model size, it favors 1-bit models by construction and shouldn't replace independent third-party benchmarking. Real-world inference quality on instruction-following and reasoning tasks remains to be independently validated.

Source: go.theregister.com
