
ChatGPT Images 2.0 scores 97% in head-to-head image generation benchmark against Google's Gemini Nano Banana at 85%

TL;DR

OpenAI's ChatGPT Images 2.0 scored 97% versus 85% for Google's Gemini Nano Banana in a nine-test image generation benchmark conducted by ZDNET. The tests measured capabilities including image restoration, text rendering, and prompt adherence. Nano Banana lost points primarily for fabricated details and text errors.



OpenAI's ChatGPT Images 2.0 scored 97% in a head-to-head image generation benchmark against Google's Gemini Nano Banana, which scored 85%, according to testing conducted by ZDNET's David Gewirtz.

The nine-test benchmark evaluated both models on image restoration, text rendering, prompt adherence, and creative generation. ChatGPT Images 2.0, released last week alongside GPT-5.5, improved substantially on the 74% that ChatGPT scored in the December 2025 round of the same benchmark.

Test Results Breakdown

In the admiral photo recontextualization test (15 points possible), ChatGPT Images 2.0 scored 14 while Nano Banana scored 12. Both models generated accurate backgrounds and naval uniforms but made errors in uniform details. Nano Banana lost additional points for altering the test subject's facial features, including a modified beard and a "wacky grin."

The restoration tests showed mixed results. Both models achieved 15/15 on black-and-white image restoration. However, in the colorization test (20 points possible), ChatGPT Images 2.0 scored 19 while Nano Banana dropped to 10 points.

Text Rendering Remains Weak Point for Gemini

Nano Banana's most significant failures occurred with text generation. When restoring a 1970s New Jersey emergency vehicle photo, Nano Banana:

  • Misrendered "RADIOLOGICAL DEFENSE" as "FOIN LENN - C.OD"
  • Fabricated door text crediting New York instead of New Jersey
  • Invented a brass hose fitting not present in the original image

ChatGPT Images 2.0 correctly placed "RADIOLOGICAL DEFENSE" on the vehicle's side but misspelled it as "DEFNSE" on the back, resulting in a single-point deduction.

Both models achieved perfect scores (15/15) on logo creation, correctly rendering the "Space Coast Studios" text. Both also scored 15/15 on the fantasy librarian scene generation test.
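For readers who want to check the arithmetic, the per-test scores quoted above can be tallied in a few lines. This is only a partial subtotal: the article states point values for five of the nine tests, so the result will not match the overall 97% and 85% figures. The test labels and the `subtotal` helper are illustrative names, not part of ZDNET's methodology.

```python
# Scores as quoted in this article:
# (test label, points possible, ChatGPT Images 2.0, Nano Banana).
# Only five of the nine tests have stated point values; the other four are omitted.
REPORTED = [
    ("admiral recontextualization", 15, 14, 12),
    ("black-and-white restoration", 15, 15, 15),
    ("colorization", 20, 19, 10),
    ("logo creation", 15, 15, 15),
    ("fantasy librarian scene", 15, 15, 15),
]


def subtotal(rows):
    """Return (points possible, ChatGPT points, Nano Banana points) summed over rows."""
    possible = sum(row[1] for row in rows)
    chatgpt = sum(row[2] for row in rows)
    nano = sum(row[3] for row in rows)
    return possible, chatgpt, nano


possible, chatgpt, nano = subtotal(REPORTED)
print(f"ChatGPT Images 2.0: {chatgpt}/{possible} on the five itemized tests")
print(f"Nano Banana: {nano}/{possible} on the five itemized tests")
```

On these five tests the two models differ only in the admiral and colorization results, and colorization accounts for most of the difference.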

Context and Capability Claims

OpenAI claims ChatGPT Images 2.0 "goes beyond basic image generation," citing its ability to include text and context derived from real data. The model was released simultaneously with GPT-5.5, which has been described as a "better-and-faster spec bump" over GPT-5.4.

The previous December 2025 benchmark showed Nano Banana at 93% compared to ChatGPT's 74%, with ChatGPT's poor performance attributed to refusals on pop-culture test prompts.

Privacy Concern Noted

The ZDNET article describes a "freaky and uncool" result in the final test, a "personalization surprise" from Gemini that "raised privacy concerns." Specific details of that result were not provided in the source material.

What This Means

ChatGPT Images 2.0's 23-percentage-point improvement over ChatGPT's December 2025 score demonstrates substantial progress in OpenAI's image generation capabilities, particularly in text rendering and prompt adherence. Gemini Nano Banana's decline from 93% to 85% suggests either more stringent testing criteria in the updated benchmark or a regression in Google's model. Text rendering remains a critical differentiator, and Nano Banana's tendency to fabricate details raises accuracy concerns for production use cases. The 12-point gap marks a significant competitive gain for OpenAI in the image generation market.
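The percentage-point figures in this section follow directly from the four overall scores reported above. The short sketch below recomputes them. The dictionary keys and function names are illustrative. Because the testing criteria may have changed between rounds, the comparison across rounds is only indicative.

```python
# Overall benchmark scores (percent) as reported in this article.
SCORES = {
    "ChatGPT Images 2.0": {"dec_2025": 74, "current": 97},
    "Gemini Nano Banana": {"dec_2025": 93, "current": 85},
}


def point_change(model):
    """Percentage-point change from the December 2025 round to the current round."""
    rounds = SCORES[model]
    return rounds["current"] - rounds["dec_2025"]


def gap(round_key):
    """ChatGPT score minus Nano Banana score in the given round (positive means ChatGPT leads)."""
    return SCORES["ChatGPT Images 2.0"][round_key] - SCORES["Gemini Nano Banana"][round_key]


for model in SCORES:
    print(f"{model}: {point_change(model):+d} points since December 2025")
print(f"Gap in December 2025: {gap('dec_2025'):+d} points")
print(f"Gap in current round: {gap('current'):+d} points")
```

A negative gap means Nano Banana led in that round, so the output shows the lead changing hands between the two rounds.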
