OpenAI releases GPT-5.5 with 82.7% Terminal-Bench score, API priced at $5/$30 per million tokens

TL;DR

OpenAI released GPT-5.5 on April 23, its first retrained base model since GPT-4.5. It scores 82.7% on Terminal-Bench 2.0, versus 75.1% for GPT-5.4 and 69.4% for Claude Opus 4.7. API pricing is set at $5 per million input tokens and $30 per million output tokens, exactly double GPT-5.4's rates.

GPT-5.5 — Quick Specs

Context window: 1M tokens
Input: $5/1M tokens
Output: $30/1M tokens

OpenAI released GPT-5.5 on April 23, positioning it as "a new class of intelligence for real work and powering agents." The model scores 82.7% on Terminal-Bench 2.0, a benchmark testing command-line workflows requiring planning and tool coordination, compared to GPT-5.4's 75.1% and Claude Opus 4.7's 69.4%.

Performance and benchmarks

GPT-5.5 is OpenAI's first retrained base model since GPT-4.5, co-designed with NVIDIA's GB200 and GB300 NVL72 rack-scale systems. On SWE-Bench Pro, which evaluates GitHub issue resolution, GPT-5.5 reaches 58.6%. On Expert-SWE, an internal benchmark where tasks carry a median estimated human completion time of 20 hours, it scores 73.1%, up from GPT-5.4's 68.5%.

In long-context reasoning on MRCR v2 at one million tokens, GPT-5.5 scores 74.0% versus GPT-5.4's 36.6%. On MCP Atlas, Scale AI's Model Context Protocol tool-use benchmark, Claude Opus 4.7 leads at 79.1%, and no score is reported for GPT-5.5. The blank entry appears in OpenAI's own benchmark table.

API pricing and token efficiency

API access is priced at $5 per million input tokens and $30 per million output tokens, exactly double GPT-5.4's rates of $2.50 and $15. According to OpenAI, GPT-5.5 completes the same tasks with fewer tokens than GPT-5.4, so the effective cost per task rises by roughly 20% rather than 100%. Independent testing lab Artificial Analysis validated this claim.
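The article gives the price multiplier (2x) and the net effect (about 20% higher) but not the token saving that links them. A minimal sketch that back-calculates it, assuming effective cost scales as price multiplier times token fraction and that the saving applies equally to input and output tokens. The function names are illustrative, not part of any API.

```python
# Per-million-token list prices quoted in the article (USD).
GPT_5_4 = {"input": 2.50, "output": 15.00}
GPT_5_5 = {"input": 5.00, "output": 30.00}


def price_multiplier(new: dict, old: dict) -> float:
    """Return the per-token price multiplier, requiring input and output to agree."""
    ratios = {kind: new[kind] / old[kind] for kind in old}
    if len(set(ratios.values())) != 1:
        raise ValueError(f"input and output multipliers differ: {ratios}")
    return ratios["input"]


def implied_token_fraction(price_mult: float, effective_cost_mult: float) -> float:
    """Solve effective_cost_mult = price_mult * token_fraction for token_fraction.

    Assumption: the token savings apply equally to input and output tokens.
    """
    return effective_cost_mult / price_mult


mult = price_multiplier(GPT_5_5, GPT_5_4)
frac = implied_token_fraction(mult, 1.20)  # 1.20 = "roughly 20% higher"
print(f"price x{mult:.1f}; tokens ~{frac:.0%} of GPT-5.4's (~{1 - frac:.0%} fewer)")
```

With the article's figures this works out to about 60% of GPT-5.4's token count, or roughly 40% fewer tokens. That is an inference from the two published numbers, not a figure OpenAI stated.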

GPT-5.5 Pro, available to Pro, Business, and Enterprise users, costs $30 per million input tokens and $180 per million output tokens. It applies additional parallel test-time compute and scores 90.1% on BrowseComp, OpenAI's agentic web-browsing benchmark.
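Because Pro's rates are six times the standard rates on both input and output, any request costs six times as much at equal token counts. A small sketch of per-request cost; the 200K-input / 20K-output request is a hypothetical workload, and the sketch assumes both tiers would consume the same number of billed tokens, which may not hold in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """List-price cost in USD for one request."""
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


GPT_5_5 = Pricing(5.00, 30.00)       # standard rates from the article
GPT_5_5_PRO = Pricing(30.00, 180.00)  # Pro rates from the article

# Hypothetical request: 200K input tokens, 20K output tokens.
std = GPT_5_5.cost(200_000, 20_000)
pro = GPT_5_5_PRO.cost(200_000, 20_000)
print(f"standard ${std:.2f}, Pro ${pro:.2f}, ratio {pro / std:.1f}x")
```

For this hypothetical request the standard tier costs $1.60 and Pro costs $9.60.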

At 10 million output tokens per month, GPT-5.5 standard costs $300 in output charges, compared with $250 for Claude Opus 4.7. That 20% premium pays off only if GPT-5.5's stronger agentic performance cuts task iterations and retries enough to offset it.
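The monthly comparison is simple arithmetic, and the retry argument can be made concrete with a break-even check. A sketch under stated assumptions: the Opus 4.7 rate of $25 per million output tokens is back-calculated from the article's $250 figure rather than quoted directly, every attempt is assumed to consume the same number of tokens, and the 1.5-attempt figure and `break_even_attempts` helper are hypothetical.

```python
OUTPUT_TOKENS_PER_MONTH = 10_000_000

# USD per 1M output tokens. GPT-5.5's rate is from the article; the Opus 4.7
# rate is back-calculated from the article's $250 figure (an assumption).
GPT_5_5_OUTPUT = 30.00
OPUS_4_7_OUTPUT = 250 / (OUTPUT_TOKENS_PER_MONTH / 1_000_000)


def monthly_output_cost(rate_per_m: float, tokens: int) -> float:
    """Output-token cost in USD for a month's usage at a per-million rate."""
    return rate_per_m * tokens / 1_000_000


def break_even_attempts(opus_attempts: float, premium: float) -> float:
    """Average attempts per task GPT-5.5 must stay under to match Opus on cost.

    Hypothetical model: cost per finished task = per-attempt cost * attempts.
    """
    return opus_attempts / (1 + premium)


gpt = monthly_output_cost(GPT_5_5_OUTPUT, OUTPUT_TOKENS_PER_MONTH)
opus = monthly_output_cost(OPUS_4_7_OUTPUT, OUTPUT_TOKENS_PER_MONTH)
premium = gpt / opus - 1  # about 0.20

print(f"GPT-5.5 ${gpt:.0f} vs Opus 4.7 ${opus:.0f} ({premium:.0%} premium)")
print(f"If Opus needs 1.5 attempts per task, GPT-5.5 must average "
      f"< {break_even_attempts(1.5, premium):.2f}")
```

Under this simplified model, a 20% per-attempt premium is recovered once GPT-5.5 needs about one-sixth fewer attempts per task (1 / 1.2 of Opus's count).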

Availability and deployment

The model rolled out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex on April 23, with API access following on April 24. According to OpenAI, more than 85% of its employees, across departments, now use Codex weekly. In one example, OpenAI's communications team used GPT-5.5 to process six months of speaking-request data, building a scoring and risk framework to automate approval of low-risk requests.

Greg Brockman called the release "a real step forward towards the kind of computing that we expect in the future," while chief scientist Jakub Pachocki noted the last two years of model progress had felt "surprisingly slow."

What this means

The Terminal-Bench lead positions GPT-5.5 strongly for unattended terminal agents and DevOps automation, while the 58.6% SWE-Bench Pro score points to capable GitHub issue resolution, though no GPT-5.4 baseline is given for that benchmark. The missing MCP Atlas score leaves an open question for teams that rely heavily on tool-use orchestration, where Claude Opus 4.7 holds the only reported result. The doubled API price makes GPT-5.5 a premium option, suited to workloads where task completion rates and fewer iterations matter more than raw per-token cost, even if OpenAI's efficiency claims hold and the effective increase is closer to 20%. OpenAI also claims GPT-5.5 matches GPT-5.4's per-token latency despite its higher capability.