OpenAI GPT-5.5 scores 93/100 in benchmark test, loses points for ignoring instructions
OpenAI's GPT-5.5 scored 93 out of 100 points in a 10-round benchmark test covering summarization, reasoning, coding, and creative tasks. The model lost points primarily for ignoring specific instructions, such as using unauthorized sources when asked to summarize from a single news outlet.
According to testing by ZDNET, GPT-5.5 shows improvements in agentic coding, conceptual clarity, scientific research ability, and accuracy during knowledge work compared to GPT-5.4. The model is currently available only to ChatGPT Plus subscribers and above, accessible through the "Thinking" effort level in both Standard and Extended modes.
Test performance breakdown
The model achieved perfect 10/10 scores on seven of ten tests:
- Academic concept explanation (explaining educational constructivism to a five-year-old)
- Math and pattern recognition (correctly identifying and extending the Fibonacci sequence)
- Cultural discussion (analyzing social media's impact on communication)
- Literary analysis (identifying themes in Game of Thrones)
- Travel itinerary planning (creating a week-long Boston vacation focused on technology and history)
- Coding tasks
- Creative writing
The model scored 9/10 on travel itinerary planning and 5/10 on news summarization. According to the tester, GPT-5.5 "did correctly summarize the meat of the story, but it didn't follow my instructions to use Yahoo News as the source." Instead of using the specified single source, the model pulled information from AP, The Sun, Wall Street Journal, The Guardian, and Wikipedia.
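A single-source instruction like the one in the news summarization test can be checked mechanically. The sketch below is illustrative and is not the tester's method: the function name `off_source_domains`, the example URLs, and the choice of `yahoo.com` as the allowed domain are all assumptions. It flags any cited hostname that falls outside the one permitted outlet.

```python
from urllib.parse import urlparse


def off_source_domains(cited_urls, allowed_domain):
    """Return the sorted hostnames in cited_urls that fall outside allowed_domain.

    A hostname passes if it equals allowed_domain or is a subdomain of it.
    """
    allowed = allowed_domain.lower()
    offending = set()
    for url in cited_urls:
        # hostname is None for URLs without a scheme; treat that as an empty host
        host = (urlparse(url).hostname or "").lower()
        if host != allowed and not host.endswith("." + allowed):
            offending.add(host)
    return sorted(offending)


# Hypothetical citation list for illustration only; not the URLs from the test.
citations = [
    "https://news.yahoo.com/example-story",
    "https://apnews.com/example-story",
    "https://en.wikipedia.org/wiki/Example",
]

violations = off_source_domains(citations, "yahoo.com")
print(violations)
```

An empty result would mean every citation stayed within the requested outlet. A non-empty result is the kind of deviation the tester penalized, whatever the quality of the summary itself.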
Development velocity increase
OpenAI's release cadence has accelerated significantly. GPT-5.5 follows closely after GPT-5.4, and it arrived in the same week as the launch of ChatGPT Images 2.0. According to the report, this increased pace is "most likely because AI coding has significantly reduced OpenAI's development time."
The tester used ChatGPT 5.5 Thinking with Images 2.0 to generate a release cadence visualization chart in under 10 minutes, a task that previously would have required at least two hours of manual work.
Instruction-following concerns
The testing revealed a pattern of "overeagerness" where GPT-5.5 performs additional work beyond what was requested. The tester noted: "If I had wanted a comprehensive news answer, that would have been fine. But the prompt specifically said to look at Yahoo News, and GPT-5.5 pretty much ignored that instruction."
This behavior raises concerns about autonomous agent capabilities. The tester stated: "If even a simple summary prompt can't be followed correctly, it does not give me confidence that it's safe to let agents run wild on long-horizon projects."
What this means
GPT-5.5 delivers incremental improvements in reasoning and output quality, but OpenAI has not solved the instruction-following problem that has long affected large language models. The tension between capability and controllability becomes more critical as the industry pushes toward autonomous agents. For practical applications that require strict adherence to guidelines, such as legal work, medical documentation, and financial analysis, this "overeagerness" is a reliability gap that limits production deployment. The rapid release cycle suggests OpenAI is iterating quickly, but the persistence of instruction-following issues suggests they may be architectural limitations rather than easily patchable bugs.