OpenAI releases GPT-4o mini with 128K context at $0.15/$0.60 per 1M tokens
OpenAI released GPT-4o mini on July 18, 2024, a compact multimodal model with 128,000 token context window priced at $0.15 per million input tokens and $0.60 per million output tokens. The model achieves 82% on MMLU and claims to rank higher than GPT-4 on chat preference leaderboards while costing 60% less than GPT-3.5 Turbo.
OpenAI introduced GPT-4o mini on July 18, 2024, positioning it as the company's most capable small model and direct successor to GPT-3.5 Turbo. The model arrives with significant cost reduction and expanded context handling.
Model Specifications
GPT-4o mini supports multimodal inputs, accepting both text and images while producing text outputs. The model features a 128,000 token context window—an 8x increase over GPT-3.5 Turbo's 16K limit.
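The mixed text-and-image input described above maps onto the Chat Completions message schema, where a user message's `content` can be a list of typed parts. A minimal sketch of such a request body (the image URL is a placeholder, and no request is actually sent here):

```python
# Sketch of a multimodal Chat Completions request body for GPT-4o mini.
# The "content" parts format (text + image_url) follows OpenAI's documented
# message schema; sending it would additionally require an API key.
payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
}
```

Text-only requests use the simpler `"content": "..."` string form; the list-of-parts form is only needed when images are attached.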
Pricing is set at $0.15 per million input tokens and $0.60 per million output tokens. OpenAI claims this represents a 60% cost reduction compared to GPT-3.5 Turbo, making it significantly cheaper than other recent frontier models.
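To make the per-token rates concrete, a small helper can translate token counts into dollars at the launch pricing quoted above (the function name is illustrative, not from any SDK):

```python
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a GPT-4o mini request's cost in USD at launch pricing."""
    INPUT_PER_M = 0.15   # $ per 1M input tokens
    OUTPUT_PER_M = 0.60  # $ per 1M output tokens
    return (input_tokens / 1_000_000) * INPUT_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PER_M

# Filling the entire 128K context and generating a 1K-token reply
# costs roughly two cents:
print(round(estimate_cost(128_000, 1_000), 4))  # 0.0198
```

At these rates, even context-heavy workloads stay in the sub-cent-to-cents range per request, which is the economic point of the launch.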
Performance Claims
GPT-4o mini achieves 82% on MMLU, a standard benchmark of broad knowledge and reasoning. According to the company, the model presently ranks higher than GPT-4 on common chat-preference leaderboards, though specific leaderboard names and methodologies are not detailed in the launch materials.
OpenAI characterizes GPT-4o mini as maintaining "SOTA intelligence"—state-of-the-art reasoning—while delivering dramatic cost efficiency gains. The model represents a clear positioning strategy: maintain competitive performance on standard benchmarks while underpricing alternatives in the small-to-medium model category.
Market Context
GPT-4o mini arrives as major AI labs compete for developer adoption through aggressive pricing. The model sits between older budget models like GPT-3.5 Turbo and OpenAI's flagship offerings, addressing the significant market segment where cost sensitivity and capability requirements intersect.
By July 2024, this pricing tier had become increasingly crowded. The aggressive unit economics suggest OpenAI prioritizes market share and API adoption over near-term margin optimization in this segment.
Deployment Status
GPT-4o mini is available through OpenAI's API and multiple third-party providers including OpenRouter, which routes requests across multiple backends for redundancy.
What This Means
GPT-4o mini signals OpenAI's confidence in its ability to scale down multimodal models efficiently while retaining much of its flagship systems' capability. The 128K context window and 60% cost reduction versus GPT-3.5 Turbo create a compelling value proposition for production applications where both capability and cost matter. That said, the 82% MMLU score alone does not definitively prove superiority over competitors' models at similar price points; additional benchmarks such as HumanEval, GPQA, or math-specific tests would provide clearer differentiation. The claim that it ranks higher than GPT-4 on chat preferences also warrants scrutiny regarding methodology and whether those preference leaderboards correlate with real-world application quality.
Related Articles
OpenAI releases GPT-5.4 mini and nano with 3-4x price increases but major performance gains
OpenAI has released GPT-5.4 mini and GPT-5.4 nano, compact models optimized for coding and subagent tasks. The new models deliver significant performance improvements—GPT-5.4 mini reaches 54.4% on SWE-Bench Pro versus 45.7% for GPT-5 mini—but cost 3-4x more per input token than their predecessors.
OpenAI's GPT-5.4 mini now available in GitHub Copilot
OpenAI has released GPT-5.4 mini, the lightweight variant of its agentic coding model GPT-5.4, in GitHub Copilot. The model represents OpenAI's highest-performing mini offering to date for code generation and completion tasks.
Google's Gemini Embedding 2 unifies text, image, video, and audio in single vector space
Google has released Gemini Embedding 2, its first native multimodal embedding model that represents text, images, video, audio, and documents in a unified vector space. The model eliminates the need for separate embedding models across different modalities in AI pipelines.
OpenAI plans to integrate Sora video generator directly into ChatGPT
OpenAI plans to integrate its Sora video generator as a built-in feature within ChatGPT, according to The Information. Currently available only on a standalone website and app, the integration would let users generate videos directly in the chatbot, similar to how image generation was added last year.