
Gemini handles video analysis across YouTube and 1.65GB local files, Claude fails entirely

TL;DR

In direct testing by ZDNET, Google's Gemini successfully analyzed video content from YouTube links and local files up to 1.65GB, accurately understanding context even without audio or metadata. Anthropic's Claude cannot process video at all, while OpenAI's ChatGPT faces a 500MB file size limit unless paired with its Codex tool.

Google's Gemini can process video content directly through its web interface, handling YouTube URLs, MP4 files up to 625MB, and MOV files up to 1.65GB, according to testing by ZDNET. Anthropic's Claude cannot process video in any format, while OpenAI's ChatGPT is limited to files under 500MB without using its Codex tool.

The tests used three videos: a YouTube video about annealing, a silent MP4 demonstration of drone gesture control, and a 1.65GB walk-and-talk MOV file. Each AI was prompted with "Can you watch this video?" to assess direct video understanding.

Claude's complete video limitations

Claude explicitly stated across all test cases: "I can't watch video content directly. I don't have the ability to process video or audio content." This applies to both the app and web interface, across YouTube links, MP4, and MOV formats.

The $100-per-month Claude Max plan showed no video processing capability in testing.

Gemini's video processing capabilities

Gemini's web interface processed all video formats without requiring a standalone app. In the most challenging test, a silent drone-control video in which the drone itself never appears on screen, Gemini accurately identified the testing scenario:

"In the video, you're testing out some hand gestures—raising your palm to the camera as if signaling it to stop or move. The camera follows your lead, changing its angle and distance as you guide it through the yard."

The AI correctly understood the video showed drone gesture control despite the drone being behind the camera and not visible in frame.

For the annealing video, Gemini identified specific sections and verbal points. For the walk-and-talk MOV file, it recognized the location and commentary without YouTube metadata or transcripts.

Gemini's image generation, handled by its Nano Banana model, failed to create accurate thumbnails: it generated fictional people instead of using actual video frames and misspelled text overlays.

ChatGPT requires workarounds

ChatGPT Plus ($20/month) cannot read YouTube links directly and has a 500MB file size limit for direct video processing. Both test files exceeded this limit.

When combined with OpenAI Codex, ChatGPT gained video analysis capabilities. Codex processed both local files and understood their content. For the drone test, Codex reported: "A person stands in a residential backyard and faces the camera/drone. They gesture a few times. The camera viewpoint moves around them over time."
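The article does not describe how Codex inspected the silent clip; one common approach, offered here only as a hypothetical sketch, is to sample frames at fixed intervals and pass them to a vision-capable model. A minimal version of the sampling step, assuming OpenCV and a made-up file name:

```python
# Hypothetical illustration (not described in the article) of how an agent
# could analyze a silent video: decode it with OpenCV and keep roughly one
# frame per second for downstream visual analysis.

import cv2  # pip install opencv-python


def sample_frames(path: str, every_sec: float = 1.0) -> list:
    """Return one decoded frame per `every_sec` seconds of video."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unreadable
    step = max(1, int(fps * every_sec))
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames


frames = sample_frames("drone-demo.mp4")  # hypothetical file name
print(f"sampled {len(frames)} frames for visual analysis")
```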

For the MOV file, Codex initially required permission to install Python libraries for audio transcription before processing the video.
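The article does not name the libraries Codex installed. A plausible minimal pipeline for that step, assuming the ffmpeg CLI for audio extraction and OpenAI's open-source Whisper package for transcription (both assumptions, along with the file names), might look like this:

```python
# A sketch of the kind of transcription setup an agent like Codex might
# assemble for a video it cannot ingest directly. ffmpeg and openai-whisper
# are assumptions; the article does not specify the actual libraries.
#
#   pip install openai-whisper   (also requires the ffmpeg binary on PATH)

import subprocess

import whisper

VIDEO = "walk-and-talk.mov"  # hypothetical local file name
AUDIO = "walk-and-talk.wav"

# Step 1: extract the audio track from the MOV container as 16 kHz mono WAV.
subprocess.run(
    ["ffmpeg", "-y", "-i", VIDEO, "-vn", "-ac", "1", "-ar", "16000", AUDIO],
    check=True,
)

# Step 2: transcribe the extracted audio with a local Whisper model.
model = whisper.load_model("base")
result = model.transcribe(AUDIO)
print(result["text"])
```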

Testing methodology

The test compared ChatGPT Plus ($20/month), Gemini Pro ($20/month), and Claude Max ($100/month). The "watch this video" prompt proved more effective than "understand" or "summarize," which caused the AIs to search for metadata rather than process the video content directly.

What this means

Gemini currently leads in native video understanding across consumer AI assistants, processing files up to 1.65GB through a web interface without additional tools. Claude's complete inability to process video represents a significant capability gap at any price point. ChatGPT's 500MB limit and need for Codex integration create friction for large-file analysis, though the combination delivers understanding comparable to Gemini's when properly configured. For users needing video analysis, Gemini provides the most straightforward solution at $20/month, matching ChatGPT's price while avoiding file size restrictions.

