ChatGPT Images 2.0 scores 97% in head-to-head image generation benchmark against Google's Gemini Nano Banana at 85%

TL;DR

OpenAI's ChatGPT Images 2.0 scored 97% versus Google's Gemini Nano Banana at 85% in a nine-test image generation benchmark conducted by ZDNET. The tests measured capabilities including image restoration, text rendering, and prompt adherence, with Nano Banana losing points primarily for fabricated details and text errors.

ChatGPT Images 2.0 Scores 97% Against Gemini Nano Banana's 85% in Image Generation Tests

OpenAI's ChatGPT Images 2.0 scored 97% in a head-to-head image generation benchmark against Google's Gemini Nano Banana, which scored 85%, according to testing conducted by ZDNET's David Gewirtz.

The nine-test benchmark evaluated both models on image restoration, text rendering, prompt adherence, and creative generation. ChatGPT Images 2.0, released last week alongside GPT-5.5, improved markedly on the 74% that ChatGPT scored in the December 2025 benchmark.

Test Results Breakdown

In the admiral photo recontextualization test (15 points possible), ChatGPT Images 2.0 scored 14 while Nano Banana scored 12. Both models generated accurate backgrounds and naval uniforms but made errors in uniform details. Nano Banana lost additional points for altering the test subject's facial features, including a modified beard and a "wacky grin".

The restoration tests showed mixed results. Both models achieved 15/15 on black-and-white image restoration. In the colorization test (20 points possible), however, ChatGPT Images 2.0 scored 19 while Nano Banana scored only 10.

Text Rendering Remains Weak Point for Gemini

Nano Banana's most significant failures occurred with text generation. When restoring a 1970s New Jersey emergency vehicle photo, Nano Banana:

  • Misrendered "RADIOLOGICAL DEFENSE" as "FOIN LENN - C.OD"
  • Fabricated door text crediting New York instead of New Jersey
  • Invented a brass hose fitting not present in the original image

ChatGPT Images 2.0 correctly placed "RADIOLOGICAL DEFENSE" on the vehicle's side but misspelled it as "DEFNSE" on the back, resulting in a single-point deduction.

Both models achieved perfect scores (15/15) on logo creation, correctly rendering the "Space Coast Studios" text. Both also scored 15/15 on a fantasy librarian scene generation test.
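As a rough check on the itemized results, the five tests with reported scores can be tallied as in the sketch below. The names `reported_scores` and `subtotal` are illustrative only, and because the other four of the nine tests are not itemized in this article, these subtotals are partial and are not the overall 97% and 85% figures.

```python
# Per-test scores itemized in this article:
# (points possible, ChatGPT Images 2.0, Nano Banana).
# Only five of the nine tests are itemized, so these are partial subtotals.
reported_scores = {
    "admiral photo recontextualization": (15, 14, 12),
    "black-and-white restoration": (15, 15, 15),
    "colorization": (20, 19, 10),
    "logo creation": (15, 15, 15),
    "fantasy librarian scene": (15, 15, 15),
}


def subtotal(scores):
    """Sum points possible and each model's points across the listed tests."""
    possible = sum(p for p, _, _ in scores.values())
    chatgpt = sum(c for _, c, _ in scores.values())
    nano_banana = sum(n for _, _, n in scores.values())
    return possible, chatgpt, nano_banana


possible, chatgpt, nano_banana = subtotal(reported_scores)
print(f"ChatGPT Images 2.0: {chatgpt}/{possible}")
print(f"Nano Banana: {nano_banana}/{possible}")
```

On these five tests alone the tally is 78/80 for ChatGPT Images 2.0 and 67/80 for Nano Banana; the published overall percentages also reflect the tests not itemized here.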

Context and Capability Claims

OpenAI claims ChatGPT Images 2.0 "goes beyond basic image generation" with the ability to incorporate text and context derived from real data. The model was released simultaneously with GPT-5.5, described as a "better-and-faster spec bump" from GPT-5.4.

The previous December 2025 benchmark showed Nano Banana at 93% compared to ChatGPT's 74%, with ChatGPT's poor performance attributed to refusals on pop-culture test prompts.

Privacy Concern Noted

The ZDNET article mentions a "freaky and uncool" result in the final test involving "Gemini's personalization surprise" that "raised privacy concerns." No further details of that result were available.

What This Means

ChatGPT Images 2.0's 23-percentage-point improvement over the December 2025 result (74% to 97%) points to substantial progress in OpenAI's image generation capabilities, particularly in text rendering and prompt adherence. Gemini Nano Banana's decline from 93% to 85% suggests either more stringent criteria in the updated benchmark or a regression in Google's model. Text rendering remains a critical differentiator, and Nano Banana's tendency to fabricate details is an accuracy concern for production use cases. The 12-percentage-point gap marks a significant competitive gain for OpenAI in the image generation market.
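The figures in this summary are differences in percentage points, which is not the same as relative change. The short sketch below recomputes them from the overall scores reported above; the helper names `point_change` and `relative_change` are illustrative, not part of any benchmark tooling.

```python
# Overall benchmark percentages as reported in the article.
december_2025 = {"ChatGPT": 74, "Nano Banana": 93}
current = {"ChatGPT": 97, "Nano Banana": 85}


def point_change(before, after):
    """Change in percentage points (a plain difference of two percentages)."""
    return after - before


def relative_change(before, after):
    """Change expressed as a percentage of the earlier score."""
    return (after - before) / before * 100


for model in current:
    pts = point_change(december_2025[model], current[model])
    rel = relative_change(december_2025[model], current[model])
    print(f"{model}: {pts:+d} points ({rel:+.1f}% relative)")

gap = current["ChatGPT"] - current["Nano Banana"]
print(f"Current gap: {gap} points")
```

A 23-point gain from a base of 74% is roughly a 31% relative improvement, which is why the two measures should not be quoted interchangeably.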
