H Company Ships Holo3.1 with Local Inference, Mobile Support, and 79.3% AndroidWorld Score

TL;DR

H Company released Holo3.1, a family of computer-use agent models ranging from 0.8B to 35B parameters. The 35B-A3B variant scores 79.3% on AndroidWorld, up from 67% for Holo3. For the first time, H Company is also shipping quantized checkpoints (FP8, Q4 GGUF, NVFP4) for local inference. On DGX Spark, the NVFP4 checkpoint reaches 1.74× the token throughput of BF16, and end-to-end agent step time falls to 3.3 seconds.

June 2, 2026 · 2:21 PM · 2 min read

H Company released Holo3.1, an updated family of computer-use agent models designed for cross-environment deployment and local inference. The release includes four model sizes (0.8B, 4B, 9B, and 35B-A3B parameters) and marks H Company's first shipment of quantized checkpoints for on-device execution.

Performance Gains Across Mobile and Desktop

Holo3.1-35B-A3B achieves 79.3% on AndroidWorld, a 12.3 percentage point improvement over Holo3's 67%. Smaller variants also show gains: the 4B and 9B models improve from 58% to 72% on the same benchmark.
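
Percentage points and relative improvement are easy to conflate, so a quick check may help. The sketch below uses only the two AndroidWorld scores reported above; the function name is illustrative and not part of any H Company tooling.

```python
def improvement(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent)."""
    points = new_pct - old_pct
    relative = points / old_pct * 100.0
    return points, relative


# AndroidWorld scores reported for Holo3 and Holo3.1-35B-A3B.
points, relative = improvement(67.0, 79.3)
print(f"{points:.1f} points, {relative:.1f}% relative")
# prints "12.3 points, 18.4% relative"
```

The 12.3-point gain therefore corresponds to roughly an 18% relative improvement over Holo3's score.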

Built on the Qwen model family, Holo3.1 adds native function-calling support alongside the structured JSON outputs available in Holo3. According to H Company, function-calling and native execution now achieve near-parity performance across OSWorld and internal benchmarks covering e-commerce and business software workflows.
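
To make the two output interfaces concrete, here is a minimal sketch of how an agent harness might normalize a structured JSON action and a function call into the same internal representation. Both payload shapes, the `Action` class, and the `click` action are hypothetical illustrations; H Company's actual schemas are not described in the source.

```python
import json
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    args: dict


def from_structured_json(text: str) -> Action:
    # Hypothetical structured-output shape: {"action": ..., "args": {...}}
    payload = json.loads(text)
    return Action(payload["action"], payload.get("args", {}))


def from_function_call(call: dict) -> Action:
    # Hypothetical function-calling shape, in the common style where
    # the arguments arrive as a JSON-encoded string.
    return Action(call["name"], json.loads(call["arguments"]))


json_mode = '{"action": "click", "args": {"x": 120, "y": 48}}'
tool_mode = {"name": "click", "arguments": '{"x": 120, "y": 48}'}

# Both interfaces resolve to the same action for the harness to execute.
print(from_structured_json(json_mode) == from_function_call(tool_mode))
```

If both paths produce identical actions, any remaining score gap between the two modes comes from the model's behavior rather than from the harness.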

The company reports a 25% improvement over Holo3 when evaluated inside its Holotab product harness, though absolute scores were not disclosed.

Quantized Checkpoints for Local Deployment

H Company ships FP8, Q4 GGUF, and NVFP4 quantized weights for the 35B-A3B model. The NVFP4 checkpoint uses NVIDIA's Model Optimizer in a W4A16 configuration (4-bit weights, 16-bit activations).

On DGX Spark hardware, NVFP4 W4A16 delivers 1.41× the token throughput of FP8 and 1.74× that of BF16, according to H Company's benchmarks. The FP8 and NVFP4 checkpoints score the same as each other on OSWorld, both trailing the BF16 checkpoint by approximately two points.
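
The two reported ratios also imply how FP8 compares with BF16, a figure the source does not state directly. The following sketch derives it, assuming all three measurements come from the same DGX Spark setup; the variable names are illustrative.

```python
# Ratios reported by H Company for NVFP4 W4A16 on DGX Spark.
nvfp4_over_fp8 = 1.41
nvfp4_over_bf16 = 1.74

# Implied FP8 throughput relative to BF16 (derived, not reported).
fp8_over_bf16 = nvfp4_over_bf16 / nvfp4_over_fp8

# Normalize everything to BF16 = 1.00 for a side-by-side view.
relative_throughput = {
    "BF16": 1.0,
    "FP8": fp8_over_bf16,
    "NVFP4": nvfp4_over_bf16,
}
for name, ratio in relative_throughput.items():
    print(f"{name:>5}: {ratio:.2f}x")
```

Under that assumption, FP8 comes out at roughly 1.23× BF16, so most of the gain over BF16 appears to come from the step down to 4-bit weights.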

End-to-end agent step time drops from 6.8 seconds to 3.3 seconds on Spark when NVFP4 quantization is combined with agent harness optimizations, a compound speedup of roughly 2×. H Company states these optimizations will ship in an upcoming desktop agent harness.
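
As a rough illustration of what the step-time change means in practice, the sketch below works from the two reported figures. The 50-step task length is a hypothetical value chosen only to make the difference concrete.

```python
baseline_s = 6.8   # reported step time before the optimizations
optimized_s = 3.3  # reported step time with NVFP4 plus harness changes

speedup = baseline_s / optimized_s
steps_per_min_before = 60 / baseline_s
steps_per_min_after = 60 / optimized_s

# Hypothetical 50-step task, not a figure from H Company.
task_steps = 50
saved_s = task_steps * (baseline_s - optimized_s)

print(f"speedup: {speedup:.2f}x")
print(f"steps/min: {steps_per_min_before:.1f} -> {steps_per_min_after:.1f}")
print(f"time saved over {task_steps} steps: {saved_s:.0f} s")
```

The exact ratio works out to about 2.06×, which moves the agent from roughly 9 steps per minute to roughly 18.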

The Q4 GGUF checkpoints target consumer hardware including Apple Silicon, enabling fully local execution where both the agent and model run on the same machine or local network.
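
A back-of-the-envelope estimate shows why 4-bit weights matter for consumer hardware. This sketch assumes 35 billion total parameters (read off the model name) and counts weight storage only. It ignores quantization scales, the KV cache, activations, and any layers kept at higher precision, so real checkpoint files will typically be somewhat larger.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 35e9  # assumption: taken from the "35B" in the model name

estimates = {
    label: weight_gb(TOTAL_PARAMS, bits)
    for label, bits in [("BF16", 16), ("FP8", 8), ("4-bit", 4)]
}
for label, gb in estimates.items():
    print(f"{label:>5}: ~{gb:.1f} GB")
```

On these assumptions the weights shrink from about 70 GB in BF16 to about 17.5 GB at 4 bits per weight, which is within reach of higher-memory Apple Silicon machines.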

Model Specifications

The Holo3.1 family includes:

Holo3.1-0.8B: Ultra-lightweight local agents
Holo3.1-4B: Cost-efficient deployment
Holo3.1-9B: Balanced performance and latency
Holo3.1-35B-A3B: State-of-the-art performance

Pricing, context window size, and training data cutoff date were not disclosed. Models are available through H Company's API and Hugging Face.

What This Means

Holo3.1's quantized checkpoints address a practical deployment constraint: running computer-use agents on local hardware without cloud dependencies. The roughly 2× speedup on DGX Spark and sub-4-second step times make interactive agent workflows more viable. The 12.3-point AndroidWorld improvement suggests H Company expanded its training or fine-tuning data to cover mobile interfaces more thoroughly. However, H Company disclosed neither absolute scores on standard desktop benchmarks such as OSWorld nor pricing, so it remains unclear whether the quantization tradeoffs favor local deployment over cloud inference for most use cases.

Source: huggingface.co
