H Company Ships Holo3.1 with Local Inference, Mobile Support, and 79.3% AndroidWorld Score
H Company released Holo3.1, a computer-use agent model family ranging from 0.8B to 35B parameters. The 35B-A3B variant scores 79.3% on AndroidWorld, up from 67% in Holo3. For the first time, H Company ships quantized checkpoints (FP8, Q4 GGUF, NVFP4) enabling local inference with 1.74× throughput gains and sub-4-second agent step times.
H Company Ships Holo3.1 with Local Inference, Mobile Support, and 79.3% AndroidWorld Score
H Company released Holo3.1, an updated family of computer-use agent models designed for cross-environment deployment and local inference. The release includes four model sizes (0.8B, 4B, 9B, and 35B-A3B parameters) and marks H Company's first shipment of quantized checkpoints for on-device execution.
Performance Gains Across Mobile and Desktop
Holo3.1-35B-A3B achieves 79.3% on AndroidWorld, a 12.3 percentage point improvement over Holo3's 67%. Smaller variants also show gains: the 4B and 9B models improve from 58% to 72% on the same benchmark.
Built on the Qwen model family, Holo3.1 adds native function-calling support alongside the structured JSON outputs available in Holo3. According to H Company, function-calling and native execution now achieve near-parity performance across OSWorld and internal benchmarks covering e-commerce and business software workflows.
The company reports a 25% improvement over Holo3 when evaluated inside its Holotab product harness, though absolute scores were not disclosed.
Quantized Checkpoints for Local Deployment
H Company ships FP8, Q4 GGUF, and NVFP4 quantized weights for the 35B-A3B model. The NVFP4 checkpoint uses NVIDIA's Model Optimizer in a W4A16 configuration.
On DGX Spark hardware, NVFP4 W4A16 delivers 1.41× the token throughput of FP8 and 1.74× that of BF16, according to H Company's benchmarks. FP8 and NVFP4 match OSWorld scores, trailing the BF16 checkpoint by approximately two points.
End-to-end agent step time drops from 6.8 seconds to 3.3 seconds on Spark with NVFP4 quantization and agent harness optimizations, representing a 2× compound speedup. H Company states these optimizations will ship in an upcoming desktop agent harness.
The Q4 GGUF checkpoints target consumer hardware including Apple Silicon, enabling fully local execution where both the agent and model run on the same machine or local network.
Model Specifications
The Holo3.1 family includes:
- Holo3.1-0.8B: Ultra-lightweight local agents
- Holo3.1-4B: Cost-efficient deployment
- Holo3.1-9B: Balanced performance and latency
- Holo3.1-35B-A3B: State-of-the-art performance
Pricing, context window size, and training data cutoff date were not disclosed. Models are available through H Company's API and Hugging Face.
What This Means
Holo3.1's quantized checkpoints address a practical deployment constraint: running computer-use agents on local hardware without cloud dependencies. The 2× speedup on DGX Spark and sub-4-second step times make interactive agent workflows more viable. The 12-point AndroidWorld improvement suggests H Company expanded its training or fine-tuning data to cover mobile interfaces more thoroughly. However, without disclosed benchmark scores on standard desktop tasks like OSWorld or pricing details, it's unclear whether the quantization tradeoffs favor local deployment over cloud inference for most use cases.
Related Articles
StepFun Releases Step-3.7-Flash: 198B-Parameter Sparse MoE Model With 256K Context in GGUF Format
StepFun has released Step-3.7-Flash, a 198B-parameter sparse Mixture-of-Experts vision-language model that activates approximately 11B parameters per token. The model supports a 256K context window, native image understanding via a 1.8B-parameter vision encoder, and offers three selectable reasoning levels.
Liquid AI Releases LFM2.5-8B: 8-Billion Parameter Hybrid Model Optimized for Edge Deployment
Liquid AI has released LFM2.5-8B-A1B, an 8-billion parameter hybrid model designed specifically for edge AI and on-device deployment. The model is available in multiple GGUF quantized formats ranging from 4-bit (4.84 GB) to 16-bit (16.9 GB), optimized for memory efficiency.
Anthropic invites 150 more organizations to Claude Mythos preview, citing cybersecurity risks
Anthropic has invited approximately 150 additional organizations to Project Glasswing, its restricted preview program for Claude Mythos. The company continues to withhold public release of the frontier model due to its advanced capability to find and exploit software vulnerabilities, which Anthropic claims can surpass all but the most skilled human security researchers.
Anthropic expands Claude Mythos cybersecurity program to 150 new partners, promises public release in weeks
Anthropic is expanding its Project Glasswing cybersecurity initiative to approximately 150 new organizations, including Samsung and NATO, bringing the total to around 200 partners across 15+ countries. The company says it will release Mythos-class models to all customers within weeks, following the April unveiling of Claude Mythos which was initially limited to select partners like Apple.
Comments
Loading...