LLM News | TPS

Researchers achieve 141% relative improvement on computer-use tasks with just 312 human demonstrations

Researchers at GAIR-NLP have published PC Agent-E, an agent training framework that achieves a 141% relative improvement on computer-use tasks using only 312 human-annotated trajectories. The method uses Claude 3.7 Sonnet to synthesize alternative action decisions for those trajectories, and the resulting model outperforms Claude 3.7 Sonnet by 10% on WindowsAgentArena-V2.

March 5, 2026 · 1:07 AM · 2 min read

agent-training computer-use data-synthesis

via arxiv.org ↗