
Holo3 scores 78.85% on the OSWorld benchmark with only 10B active parameters

TL;DR

H Company unveiled Holo3, a computer use model that scores 78.85% on OSWorld-Verified, which the company says is the highest reported result on this leading desktop automation benchmark. The model achieves this with only 10B active parameters (122B total), and H Company positions it as a lower-cost alternative to proprietary models such as GPT 5.4 and Opus 4.6.

April 1, 2026 · 4:50 PM

Holo3 Achieves New State-of-the-Art on OSWorld Computer Use Benchmark

H Company announced Holo3, a specialized model for autonomous computer use that scores 78.85% on OSWorld-Verified, a leading benchmark for desktop automation. The company claims this is a new state-of-the-art result on the benchmark.

Model Specifications and Availability

The flagship variant, Holo3-122B-A10B, uses a mixture-of-experts (MoE) architecture with 122B total parameters, of which only 10B are active for each token. Because only a subset of the experts runs per token, the design aims to reduce compute cost relative to a dense model with the same total parameter count.
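H Company has not disclosed how Holo3's router works (number of experts, experts per token, or gating function). As a generic illustration of why only a fraction of an MoE model's parameters are active for each token, here is a minimal top-k routing sketch; the eight toy experts, the router logits, and k=2 are arbitrary assumptions, not Holo3's configuration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector, experts: Sequence[Expert], router_logits: Sequence[float], k: int
) -> Tuple[Vector, List[int]]:
    """Route one token to its top-k experts and mix their outputs (generic sketch)."""
    # Rank experts by router score and keep only the k best for this token.
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Renormalise gate weights over the selected experts only.
    gates = softmax([router_logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, top):
        y = experts[idx](x)  # unselected experts are never evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top


# Eight toy "experts" (hypothetical), each scaling its input by a different constant.
toy_experts = [lambda v, s=s: [s * vi for vi in v] for s in range(1, 9)]
toy_logits = [0.1, 1.5, -1.0, 0.5, 2.0, 0.0, -0.5, 0.3]
mixed, chosen = moe_forward([1.0, 2.0], toy_experts, toy_logits, k=2)
print(chosen, mixed)  # only experts 4 and 1 ran for this token
```

With k=2 of 8 experts, only a quarter of the expert weights are touched per token; real MoE models apply the same idea inside each transformer layer.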

H Company offers two tiers of access:

Holo3-35B-A3B: Weights openly available on Hugging Face under the Apache 2.0 license, with free-tier access through H Company's Inference API
Holo3-122B-A10B: Available exclusively through H Company's Inference API

H Company has not yet disclosed per-token pricing, although it positions both models as lower-cost alternatives to GPT 5.4 and Opus 4.6.

Training Approach: The Agentic Learning Flywheel

Holo3 uses a three-stage training pipeline:

Synthetic Navigation Data: Scenario-specific navigation examples generated from both human-written and automatically generated instructions
Out-of-Domain Augmentation: Programmatic extension of these scenarios so the model learns to handle unexpected variations
Curated Reinforcement Learning: Data filtering and reinforcement learning (RL) optimization applied to all training samples
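The pipeline is described only at a high level. The sketch below shows one plausible data flow through the three stages, assuming episodes are simple records and that curation means scoring each sample and keeping those above a threshold before RL. Every name here (`Episode`, `synthesize`, `augment`, `curate`, the perturbation labels) is hypothetical, and the RL update itself is not shown.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Episode:
    instruction: str
    source: str            # "human" or "automated" (hypothetical labels)
    variant: str = "base"  # which out-of-domain perturbation, if any
    reward: float = 0.0    # filled in during curation


# Hypothetical perturbations standing in for "unexpected variations".
PERTURBATIONS = ("renamed_button", "moved_sidebar", "unexpected_popup")


def synthesize(instructions: Iterable[Tuple[str, str]]) -> List[Episode]:
    """Stage 1: one navigation example per (instruction, source) pair."""
    return [Episode(text, source) for text, source in instructions]


def augment(episodes: Sequence[Episode],
            perturbations: Sequence[str] = PERTURBATIONS) -> List[Episode]:
    """Stage 2: keep every original and add one programmatic variant per perturbation."""
    out = list(episodes)
    for ep in episodes:
        out.extend(replace(ep, variant=p) for p in perturbations)
    return out


def curate(episodes: Sequence[Episode], score: Callable[[Episode], float],
           threshold: float = 0.5) -> List[Episode]:
    """Stage 3: score every sample and keep those at or above the threshold for RL."""
    scored = [replace(ep, reward=score(ep)) for ep in episodes]
    return [ep for ep in scored if ep.reward >= threshold]


def toy_score(ep: Episode) -> float:
    # Toy scorer (assumption): pretend the popup variants failed verification.
    return 0.0 if ep.variant == "unexpected_popup" else 1.0


instructions = [("Export the invoice as PDF", "human"),
                ("Open display settings", "automated")]
dataset = curate(augment(synthesize(instructions)), toy_score)
print(len(dataset))  # 6
```

Two instructions grow to eight episodes after augmentation, and the toy scorer drops the two popup variants before the (unshown) RL step.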

The company calls this methodology the "agentic learning flywheel" and says it focuses on two core capabilities: perception (visual grounding on UI elements) and decision-making (action sequencing).
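H Company has not published Holo3's action format. Many GUI agents express grounding as a predicted click point on a screenshot; assuming normalized (x, y) coordinates, a minimal sketch of checking which UI element such a point lands on (a hit test that prefers the innermost, smallest element) could look like this. The `UIElement` type, the coordinate convention, and the element names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class UIElement:
    name: str
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom

    @property
    def area(self) -> int:
        return (self.right - self.left) * (self.bottom - self.top)


def to_pixels(point: Tuple[float, float], width: int, height: int) -> Tuple[int, int]:
    """Map a normalized (0..1) point to pixel coordinates, clamped to the screen."""
    nx, ny = point
    return min(int(nx * width), width - 1), min(int(ny * height), height - 1)


def ground(point: Tuple[float, float], elements: Sequence[UIElement],
           width: int, height: int) -> Optional[UIElement]:
    """Return the smallest element containing the point, or None if it hits nothing."""
    x, y = to_pixels(point, width, height)
    hits = [e for e in elements if e.contains(x, y)]
    return min(hits, key=lambda e: e.area) if hits else None


screen = [UIElement("window", 0, 0, 1920, 1080),
          UIElement("toolbar", 0, 0, 1920, 60),
          UIElement("save_button", 100, 10, 180, 50)]
print(ground((0.07, 0.03), screen, 1920, 1080).name)  # save_button
```

Preferring the smallest enclosing box means a click on a button counts as hitting the button rather than the toolbar or window behind it; decision-making would then chain such grounded actions into a sequence.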

Internal Benchmarking: H Corporate Benchmarks

Beyond OSWorld, H Company developed a proprietary suite, the H Corporate Benchmarks, containing 486 multi-step tasks across four categories:

E-commerce workflows
Business software operations
Collaboration tools
Multi-application workflows requiring cross-system coordination

Tasks range from single-application work to complex multi-app scenarios, such as retrieving equipment prices from PDFs, cross-referencing employee budgets, and sending personalized approval emails. According to the company, Holo3 outperforms larger base models (including Qwen 3.5 variants) on the single-application tasks despite having significantly fewer parameters.

H Company built these benchmarks using a "Synthetic Environment Factory" that automatically generates websites and enterprise applications via coding agents, then validates task completion with verification scripts.
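The verification scripts themselves are not public. Assuming a synthetic app whose final state can be read directly, a checker for a task like the approval-email example above might look like the following sketch; the `EnvState` schema, the rule that requests at or under budget should be approved, and the personalization check (the recipient's id appears in the body) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Email:
    to: str       # employee id, e.g. "alice" (hypothetical schema)
    subject: str
    body: str


@dataclass
class EnvState:
    """Final state of a hypothetical synthetic expense-approval app."""
    budgets: Dict[str, float]   # employee -> remaining equipment budget
    requests: Dict[str, float]  # employee -> price of requested equipment
    outbox: List[Email] = field(default_factory=list)


def verify_approval_task(state: EnvState) -> bool:
    """Pass iff exactly the in-budget requesters got one personalised approval email."""
    expected = {emp for emp, price in state.requests.items()
                if price <= state.budgets.get(emp, 0.0)}
    approvals = [m for m in state.outbox if "approved" in m.subject.lower()]
    # Exactly one approval per expected employee, and none for anyone else.
    one_each = sorted(m.to for m in approvals) == sorted(expected)
    # Each approval must mention its recipient, a crude personalisation check.
    personalised = all(m.to.lower() in m.body.lower() for m in approvals)
    return one_each and personalised


state = EnvState({"alice": 1200.0, "bob": 500.0}, {"alice": 999.0, "bob": 650.0})
state.outbox.append(Email("alice", "Request approved", "Hi Alice, your laptop request is approved."))
print(verify_approval_task(state))  # True
```

Checking the final application state, rather than the agent's click trace, lets any valid sequence of actions pass while still catching extra or generic emails.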

What This Means

Holo3 suggests that specialized training for computer use tasks can rival or exceed larger proprietary models at a much smaller active parameter count. The 78.85% OSWorld-Verified score is competitive with publicly disclosed results from other vendors, though a direct comparison requires checking each vendor's evaluation methodology and benchmark version.

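The mixture-of-experts architecture is operationally significant: with only 10B parameters active per token, Holo3 should need substantially less compute per generated token than a dense 122B model, which could translate into lower latency and infrastructure costs. All 122B parameters must still be held in memory, however, so the savings come from compute rather than from a smaller memory footprint.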
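A common rule of thumb estimates a forward pass at roughly 2 FLOPs per active parameter per token, while weight memory scales with total parameters. The sketch below applies both to the parameter counts in the model names; it assumes bf16 weights (2 bytes per parameter) and ignores attention cost that grows with context length, KV-cache memory, and quantization, so treat the numbers as rough orders of magnitude.

```python
def flops_per_token(active_params: float) -> float:
    """Rule of thumb: ~2 FLOPs (one multiply, one add) per active parameter per token.

    Ignores attention's context-length-dependent cost and other overheads.
    """
    return 2.0 * active_params


def weight_memory_gb(total_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory needed to hold all weights; 2 bytes/param assumes bf16, no quantization."""
    return total_params * bytes_per_param / 1e9


MODELS = {
    # name: (total params, active params), as stated in the model names
    "Holo3-122B-A10B": (122e9, 10e9),
    "Holo3-35B-A3B": (35e9, 3e9),
}

for name, (total, active) in MODELS.items():
    # Compute ratio versus a hypothetical dense model with the same total size.
    compute_ratio = flops_per_token(total) / flops_per_token(active)
    print(f"{name}: ~{flops_per_token(active) / 1e9:.0f} GFLOPs/token "
          f"(~{compute_ratio:.1f}x less than a dense model of the same size), "
          f"~{weight_memory_gb(total):.0f} GB of bf16 weights")
```

Under these assumptions, Holo3-122B-A10B needs about 20 GFLOPs per token versus about 244 for a dense 122B model (roughly 12x less), yet its bf16 weights still occupy about 244 GB.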

The release of Holo3-35B-A3B under the permissive Apache 2.0 license lets developers use, modify, and redistribute a computer use model with minimal licensing restrictions, although H Company has not disclosed this smaller variant's OSWorld score. The company's investment in synthetic environments and internal benchmarking signals confidence that the model generalizes beyond these controlled settings, but it has not published performance data from real-world enterprise deployments.

The company's stated next frontier, "Adaptive Agency," in which models autonomously learn new enterprise software in real time, remains a claim rather than a demonstrated capability.

Source: huggingface.co
