OpenAI releases GPT-5.5 with 82.7% Terminal-Bench score, API priced at $5/$30 per million tokens
OpenAI released GPT-5.5 on April 23, its first retrained base model since GPT-4.5, scoring 82.7% on Terminal-Bench 2.0 versus GPT-5.4's 75.1% and Claude Opus 4.7's 69.4%. API pricing is set at $5 per million input tokens and $30 per million output tokens, exactly double GPT-5.4 rates.
GPT-5.5 — Quick Specs
OpenAI positions GPT-5.5 as "a new class of intelligence for real work and powering agents." Its headline result is 82.7% on Terminal-Bench 2.0, a benchmark testing command-line workflows that require planning and tool coordination, ahead of GPT-5.4's 75.1% and Claude Opus 4.7's 69.4%.
Performance and benchmarks
GPT-5.5 is OpenAI's first retrained base model since GPT-4.5, co-designed with NVIDIA's GB200 and GB300 NVL72 rack-scale systems. On SWE-Bench Pro, which evaluates GitHub issue resolution, GPT-5.5 reaches 58.6%. On Expert-SWE, an internal benchmark where tasks carry a median estimated human completion time of 20 hours, it scores 73.1%, up from GPT-5.4's 68.5%.
In long-context reasoning at one million tokens on MRCR v2, GPT-5.5 scores 74.0% versus GPT-5.4's 36.6%. However, on MCP Atlas, Scale AI's Model Context Protocol tool-use benchmark, Claude Opus 4.7 leads at 79.1%, and OpenAI's own benchmark table lists no GPT-5.5 score.
API pricing and token efficiency
API access is priced at $5 per million input tokens and $30 per million output tokens—exactly double GPT-5.4's rates of $2.50 and $15. According to OpenAI, GPT-5.5 completes the same tasks with fewer tokens than GPT-5.4, making effective costs roughly 20% higher once efficiency is factored in. Independent testing lab Artificial Analysis validated this claim.
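A back-of-envelope sketch of what those two figures imply together. If the per-token price doubles but effective cost rises only about 20%, the implied token usage is roughly 60% of GPT-5.4's; that ratio is an inference from the stated numbers, not a figure OpenAI has published:

```python
# Effective-cost sketch: GPT-5.5 charges 2x per output token but, per
# OpenAI, finishes the same work with fewer tokens. The implied token
# ratio below (0.6) is derived from the stated ~20% effective increase,
# not from any published usage data.

GPT54_OUTPUT_PRICE = 15.0   # $ per million output tokens
GPT55_OUTPUT_PRICE = 30.0   # $ per million output tokens (2x)

def effective_cost(tokens_millions: float, price_per_million: float) -> float:
    """Dollar cost for a given output-token volume."""
    return tokens_millions * price_per_million

baseline_tokens = 1.0  # 1M output tokens to finish a task on GPT-5.4

# price doubles, effective cost +20%  =>  implied tokens = 1.20 / 2.00
implied_ratio = 1.20 * GPT54_OUTPUT_PRICE / GPT55_OUTPUT_PRICE  # 0.6

old_cost = effective_cost(baseline_tokens, GPT54_OUTPUT_PRICE)
new_cost = effective_cost(baseline_tokens * implied_ratio, GPT55_OUTPUT_PRICE)

print(f"GPT-5.4: ${old_cost:.2f}")                      # $15.00
print(f"GPT-5.5, same task: ${new_cost:.2f}")           # $18.00
print(f"Effective increase: {new_cost / old_cost - 1:.0%}")  # 20%
```

The takeaway is that the sticker-price doubling overstates the real cost delta only if the token-efficiency claim holds for your workload.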
GPT-5.5 Pro, available to Pro, Business, and Enterprise users, costs $30 per million input tokens and $180 per million output tokens. It applies additional parallel test-time compute and scores 90.1% on BrowseComp, OpenAI's agentic web-browsing benchmark.
At 10 million output tokens per month, GPT-5.5 standard costs $300 compared to Claude Opus 4.7's $250, a 20% premium that pays off only if superior agentic performance reduces task iterations and retries.
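That trade-off can be made concrete with a simple retry model. The per-million prices come from the figures above; the retry rates are purely illustrative assumptions, not benchmark data:

```python
# Break-even sketch for the 20% premium at 10M output tokens/month.
# Prices come from the article; attempts-per-task values are
# illustrative assumptions, not measured retry rates.

GPT55_PRICE = 30.0  # $ per million output tokens
OPUS_PRICE = 25.0   # $ per million output tokens (implied by $250 / 10M)

def monthly_cost(base_tokens_m: float, attempts_per_task: float,
                 price_per_million: float) -> float:
    """Cost when each task may need several attempts; retries re-spend tokens."""
    return base_tokens_m * attempts_per_task * price_per_million

base = 10.0  # 10M output tokens/month of first-attempt work

# With no retries on either side, GPT-5.5 carries a flat 20% premium.
print(monthly_cost(base, 1.0, GPT55_PRICE))  # 300.0
print(monthly_cost(base, 1.0, OPUS_PRICE))   # 250.0

# Illustrative: if the cheaper model averages 1.3 attempts per task
# while GPT-5.5 averages 1.0, the premium flips into a saving.
print(monthly_cost(base, 1.3, OPUS_PRICE))   # 325.0
```

Under these assumed numbers, roughly a 1.2x retry rate on the cheaper model is the break-even point ($300 vs $250 × 1.2).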
Availability and deployment
The model rolled out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex on April 23, with API access following on April 24. According to OpenAI, more than 85% of employees now use Codex weekly across departments. In one example, the communications team used GPT-5.5 to process six months of speaking request data, building a scoring and risk framework to automate low-risk approvals.
Greg Brockman called the release "a real step forward towards the kind of computing that we expect in the future," while chief scientist Jakub Pachocki noted the last two years of model progress had felt "surprisingly slow."
What this means
The Terminal-Bench lead positions GPT-5.5 strongly for unattended terminal agents and DevOps automation, while the 58.6% SWE-Bench Pro score suggests improved GitHub issue resolution. The MCP Atlas gap raises questions for teams building heavily on tool-use orchestration. The 2x API price increase makes GPT-5.5 a premium option where task completion rates and reduced iteration matter more than raw per-token costs. OpenAI claims GPT-5.5 matches GPT-5.4's per-token latency despite higher capability.