ChatGPT Images 2.0 scores 97% in head-to-head image generation benchmark against Google's Gemini Nano Banana at 85%
OpenAI's ChatGPT Images 2.0 scored 97% versus Google's Gemini Nano Banana at 85% in a nine-test image generation benchmark conducted by ZDNET. The tests measured capabilities including image restoration, text rendering, and prompt adherence, with Nano Banana losing points primarily for fabricating details and text errors.
OpenAI's ChatGPT Images 2.0 scored 97% in a head-to-head image generation benchmark against Google's Gemini Nano Banana, which scored 85%, according to testing conducted by ZDNET's David Gewirtz.
The nine-test benchmark evaluated both models on image restoration, text rendering, prompt adherence, and creative generation. ChatGPT Images 2.0, released last week alongside GPT-5.5, improved significantly on the 74% that ChatGPT scored in the previous December 2025 benchmark.
Test Results Breakdown
In the admiral photo recontextualization test (15 points possible), ChatGPT Images 2.0 scored 14 while Nano Banana scored 12. Both models generated accurate backgrounds and naval uniforms but made errors in uniform details. Nano Banana lost additional points for altering the test subject's facial features, including a modified beard and a "wacky grin."
The restoration tests split. Both models scored 15/15 on black-and-white image restoration, but in the colorization test (20 points possible) ChatGPT Images 2.0 scored 19 while Nano Banana scored only 10.
Text Rendering Remains Weak Point for Gemini
Nano Banana's most significant failures occurred with text generation. When restoring a 1970s New Jersey emergency vehicle photo, Nano Banana:
- Misrendered "RADIOLOGICAL DEFENSE" as "FOIN LENN - C.OD"
- Fabricated door text crediting New York instead of New Jersey
- Invented a brass hose fitting not present in the original image
ChatGPT Images 2.0 correctly placed "RADIOLOGICAL DEFENSE" on the vehicle's side but misspelled it as "DEFNSE" on the back, resulting in a single point deduction.
Both models achieved perfect scores (15/15) on logo creation, correctly rendering the "Space Coast Studios" text. Both also scored 15/15 on a fantasy librarian scene generation test.
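The per-test scores reported above can be tallied to see where the gap comes from. The following is a minimal sketch in Python. The dictionary layout and the names REPORTED_SCORES, tally, and largest_gap are illustrative assumptions, not part of ZDNET's methodology. It covers only the five tests whose scores are listed in this article, not the full nine-test benchmark, so its totals will not reproduce the 97% and 85% headline figures.

```python
# Per-test scores as reported in this article:
# (points possible, ChatGPT Images 2.0, Nano Banana).
# Only five of the nine tests have scores listed here, so totals are partial.
REPORTED_SCORES = {
    "admiral recontextualization": (15, 14, 12),
    "black-and-white restoration": (15, 15, 15),
    "colorization": (20, 19, 10),
    "logo creation": (15, 15, 15),
    "fantasy librarian scene": (15, 15, 15),
}


def tally(scores):
    """Return (points possible, ChatGPT total, Nano Banana total)."""
    possible = sum(p for p, _, _ in scores.values())
    chatgpt = sum(c for _, c, _ in scores.values())
    nano = sum(n for _, _, n in scores.values())
    return possible, chatgpt, nano


def largest_gap(scores):
    """Return the test name where ChatGPT's lead over Nano Banana is largest."""
    return max(scores, key=lambda name: scores[name][1] - scores[name][2])


if __name__ == "__main__":
    possible, chatgpt, nano = tally(REPORTED_SCORES)
    print(f"ChatGPT Images 2.0: {chatgpt}/{possible}")
    print(f"Nano Banana:        {nano}/{possible}")
    print(f"Largest gap:        {largest_gap(REPORTED_SCORES)}")
```

On these five tests, most of the difference between the two models comes from the colorization test, with the admiral test accounting for the rest.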
Context and Capability Claims
OpenAI claims ChatGPT Images 2.0 "goes beyond basic image generation" with abilities to include text and context derived from real data. The model was released simultaneously with GPT-5.5, described as a "better-and-faster spec bump" from GPT-5.4.
The previous December 2025 benchmark showed Nano Banana at 93% compared to ChatGPT's 74%, with ChatGPT's poor performance attributed to refusals on pop-culture test prompts.
Privacy Concern Noted
The ZDNET article also reports a "freaky and uncool" result in the final test, described as "Gemini's personalization surprise," that "raised privacy concerns." Specific details were not provided in the source material.
What This Means
ChatGPT Images 2.0's 23-percentage-point improvement demonstrates substantial progress in OpenAI's image generation capabilities, particularly in text rendering and prompt adherence. Gemini Nano Banana's decline from 93% to 85% suggests either more stringent testing criteria in the updated benchmark or regression in Google's model performance. Text rendering remains a critical differentiator, with Nano Banana's tendency to fabricate details presenting accuracy concerns for production use cases. The 12-percentage-point gap reverses the 19-point lead Nano Banana held in December 2025 and represents significant competitive ground for OpenAI in the image generation market.
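The percentage-point figures in this section follow directly from the overall scores quoted in the article. As a small worked check in Python (the OVERALL table and the helper names point_change and chatgpt_lead are hypothetical, chosen only for illustration):

```python
# Overall benchmark percentages quoted in this article.
OVERALL = {
    "ChatGPT": {"dec_2025": 74, "current": 97},
    "Nano Banana": {"dec_2025": 93, "current": 85},
}


def point_change(model):
    """Percentage-point change from the December 2025 round to the current one."""
    scores = OVERALL[model]
    return scores["current"] - scores["dec_2025"]


def chatgpt_lead(round_key):
    """ChatGPT's lead over Nano Banana in a round (negative means it trailed)."""
    return OVERALL["ChatGPT"][round_key] - OVERALL["Nano Banana"][round_key]


if __name__ == "__main__":
    for model in OVERALL:
        print(f"{model}: {point_change(model):+d} points")
    print(f"ChatGPT lead, December 2025: {chatgpt_lead('dec_2025'):+d}")
    print(f"ChatGPT lead, current round: {chatgpt_lead('current'):+d}")
```

These are differences in percentage points, not relative changes, and the two rounds may not have used identical tests or scoring criteria.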