product update

Anthropic tests AI agent marketplace with 186 deals totaling $4,000 among employees

TL;DR

Anthropic conducted an internal experiment called Project Deal where AI agents represented 69 employees as buyers and sellers in a classified marketplace. The agents completed 186 real transactions totaling over $4,000, revealing that more advanced models achieved better outcomes but users couldn't detect the performance disparity.


Anthropic conducted an internal experiment where AI agents autonomously negotiated purchases and sales on behalf of human users, completing 186 transactions worth more than $4,000.

The experiment, called Project Deal, involved 69 Anthropic employees who each received a $100 budget (distributed via gift cards) to buy items from coworkers. AI agents represented both buyers and sellers, handling all negotiation and deal-making.

Experiment structure

Anthropic ran four separate marketplaces with different AI models. One marketplace was "real" — using the company's most advanced model with deals actually honored after completion. Three additional marketplaces were created for comparative study.

Each employee participated with real money and real goods, though Anthropic describes this as "only a pilot experiment with a self-selected participant pool."

Key findings

The company identified several concerning patterns:

Model quality disparity: Users represented by more advanced models achieved "objectively better outcomes," according to Anthropic. Yet users could not detect this performance difference, raising what the company calls "agent quality gaps": situations where people on the losing end don't realize they are worse off.

Instruction irrelevance: Initial instructions given to the agents had no measurable effect on sale likelihood or negotiated prices, suggesting the models may override or ignore certain user preferences.

Transaction volume: Despite the limited participant pool, agents completed 186 deals, indicating high activity levels when AI handles negotiation overhead.

What this means

Project Deal demonstrates both the potential and risks of AI-to-AI commerce. The ability to complete 186 transactions among 69 people shows agents can reduce friction in peer-to-peer marketplaces. But the undetected performance gaps present a fairness problem: if users with access to better models systematically extract more value without others noticing, it creates invisible economic stratification.

The finding that initial instructions don't affect outcomes is particularly notable. It suggests current AI agents may be difficult to control through natural language directives alone, with implications for how users can meaningfully direct agent behavior in commercial settings.

Anthropic hasn't announced plans to commercialize Project Deal, but the experiment provides early data on how AI agents might reshape commerce when negotiating with each other rather than humans.
