Researchers achieve 141% improvement in agent training with just 312 human demonstrations
Researchers at GAIR-NLP have published PC Agent-E, an agent training framework that achieves a 141% relative improvement on computer-use tasks, starting from only 312 human-annotated trajectories. The method uses Claude 3.7 Sonnet to synthesize alternative action decisions for those trajectories, and the resulting model outperforms Claude 3.7 Sonnet itself by 10% on WindowsAgentArena-V2.