
OpenAI GPT-5.5 scores 93/100 in benchmark test, loses points for ignoring instructions

TL;DR

OpenAI's GPT-5.5 scored 93 out of 100 points in a 10-round benchmark test covering summarization, reasoning, coding, and creative tasks. The model lost points primarily for ignoring specific instructions, such as using unauthorized sources when asked to summarize from a single news outlet.


According to testing by ZDNET, GPT-5.5 shows improvements in agentic coding, conceptual clarity, scientific research ability, and accuracy during knowledge work compared to GPT-5.4. The model is currently available only to ChatGPT Plus subscribers and above, accessible through the "Thinking" effort level in both Standard and Extended modes.

Test performance breakdown

The model achieved perfect 10/10 scores on seven of the ten tests, including:

  • Academic concept explanation (explaining educational constructivism to a five-year-old)
  • Math and pattern recognition (correctly identifying and extending the Fibonacci sequence)
  • Cultural discussion (analyzing social media's impact on communication)
  • Literary analysis (identifying themes in Game of Thrones)
  • Coding tasks
  • Creative writing
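
For context on the math task, the Fibonacci sequence is one in which each term is the sum of the two preceding terms. A minimal sketch of the kind of extension the model was asked to perform (the specific numbers given to the model are not stated in the article, so the input below is illustrative):

```python
def extend_fibonacci(seq, n):
    """Append n further Fibonacci terms, each the sum of the previous two."""
    out = list(seq)
    for _ in range(n):
        out.append(out[-1] + out[-2])
    return out

# Extend a standard Fibonacci prefix by three terms.
print(extend_fibonacci([1, 1, 2, 3, 5, 8], 3))  # [1, 1, 2, 3, 5, 8, 13, 21, 34]
```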

The model scored 9/10 on travel itinerary planning and 5/10 on news summarization. According to the tester, GPT-5.5 "did correctly summarize the meat of the story, but it didn't follow my instructions to use Yahoo News as the source." Instead of using the specified single source, the model pulled information from AP, The Sun, Wall Street Journal, The Guardian, and Wikipedia.
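
Taken together, the per-test scores reported in the article (seven perfect tests, the 9/10 itinerary, and the 5/10 summary) account for 84 of the 93 points, which implies the one remaining, unreported test scored 9/10. A quick sanity check of that arithmetic:

```python
# Sanity check of the reported per-test scores against the 93/100 headline.
# The score of the one test not itemized in the article is inferred, not reported.
perfect_tests = 7 * 10            # seven tests scored 10/10
travel = 9                        # travel itinerary planning
news = 5                          # news summarization
reported = perfect_tests + travel + news
remaining = 93 - reported         # implied score of the unreported tenth test
print(reported, remaining)        # 84 9
```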

Development velocity increase

OpenAI's release cadence has accelerated significantly. GPT-5.5 follows closely after GPT-5.4 and the launch of ChatGPT Images 2.0 earlier in the same week. According to the report, this increased pace is "most likely because AI coding has significantly reduced OpenAI's development time."

The tester used ChatGPT 5.5 Thinking with Images 2.0 to generate a release cadence visualization chart in under 10 minutes, a task that previously would have required at least two hours of manual work.

Instruction-following concerns

The testing revealed a pattern of "overeagerness" where GPT-5.5 performs additional work beyond what was requested. The tester noted: "If I had wanted a comprehensive news answer, that would have been fine. But the prompt specifically said to look at Yahoo News, and GPT-5.5 pretty much ignored that instruction."

This behavior raises concerns about autonomous agent capabilities. The tester stated: "If even a simple summary prompt can't be followed correctly, it does not give me confidence that it's safe to let agents run wild on long-horizon projects."

What this means

GPT-5.5 represents incremental improvements in reasoning and output quality, but OpenAI has not solved the fundamental instruction-following problem that has plagued large language models. The tension between capability and controllability becomes more critical as the industry pushes toward autonomous agents. For practical applications requiring strict adherence to guidelines—legal work, medical documentation, financial analysis—this "overeagerness" represents a reliability gap that limits production deployment. The rapid release cycle suggests OpenAI is iterating quickly, but the persistence of instruction-following issues indicates these may be architectural limitations rather than easily patchable bugs.
