OpenAI releases GPT-4o mini with 128K context at $0.15/$0.60 per 1M tokens
OpenAI released GPT-4o mini on July 18, 2024, a compact multimodal model with 128,000 token context window priced at $0.15 per million input tokens and $0.60 per million output tokens. The model achieves 82% on MMLU and claims to rank higher than GPT-4 on chat preference leaderboards while costing 60% less than GPT-3.5 Turbo.
OpenAI introduced GPT-4o mini on July 18, 2024, positioning it as the company's most capable small model and direct successor to GPT-3.5 Turbo. The model arrives with significant cost reduction and expanded context handling.
Model Specifications
GPT-4o mini supports multimodal inputs, accepting both text and images while producing text outputs. The model features a 128,000 token context window—an 8x increase over GPT-3.5 Turbo's 16K limit.
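The mixed text-and-image input described above maps onto the Chat Completions message schema, where a user message's `content` can be a list of typed parts. A minimal sketch of such a request body (the image URL is a placeholder, and no request is actually sent here):

```python
# Sketch of a multimodal Chat Completions request body for GPT-4o mini.
# The "content" parts format (text + image_url) follows OpenAI's documented
# message schema; sending it would additionally require an API key.
payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
}
```

Text-only requests use the simpler `"content": "..."` string form; the list-of-parts form is only needed when images are attached.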
Pricing is set at $0.15 per million input tokens and $0.60 per million output tokens. OpenAI claims this represents a 60% cost reduction compared to GPT-3.5 Turbo, making it significantly cheaper than other recent frontier models.
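To make the per-token rates concrete, a small helper can translate token counts into dollars at the launch pricing quoted above (the function name is illustrative, not from any SDK):

```python
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a GPT-4o mini request's cost in USD at launch pricing."""
    INPUT_PER_M = 0.15   # $ per 1M input tokens
    OUTPUT_PER_M = 0.60  # $ per 1M output tokens
    return (input_tokens / 1_000_000) * INPUT_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PER_M

# Filling the entire 128K context and generating a 1K-token reply
# costs roughly two cents:
print(round(estimate_cost(128_000, 1_000), 4))  # 0.0198
```

At these rates, even context-heavy workloads stay in the sub-cent-to-cents range per request, which is the economic point of the launch.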
Performance Claims
GPT-4o mini achieves 82% on MMLU, a standard benchmark of broad knowledge and reasoning. According to the company, the model presently ranks higher than GPT-4 on common chat-preference leaderboards, though specific leaderboard names and methodologies are not detailed in the launch materials.
OpenAI characterizes GPT-4o mini as maintaining "SOTA intelligence"—state-of-the-art reasoning—while delivering dramatic cost efficiency gains. The model represents a clear positioning strategy: maintain competitive performance on standard benchmarks while underpricing alternatives in the small-to-medium model category.
Market Context
GPT-4o mini arrives as major AI labs compete for developer adoption through aggressive pricing. The model sits between older budget models like GPT-3.5 Turbo and OpenAI's flagship offerings, addressing the significant market segment where cost sensitivity and capability requirements intersect.
By July 2024, this pricing tier had become increasingly crowded. The aggressive unit economics suggest OpenAI prioritizes market share and API adoption over near-term margin optimization in this segment.
Deployment Status
GPT-4o mini is available through OpenAI's API and multiple third-party providers including OpenRouter, which routes requests across multiple backends for redundancy.
What This Means
GPT-4o mini signals OpenAI's confidence in its ability to scale down multimodal models efficiently while retaining much of its flagship systems' capability. The 128K context window and 60% cost reduction versus GPT-3.5 Turbo create a compelling value proposition for production applications where both capability and cost matter. That said, the 82% MMLU score alone does not definitively prove superiority over competitors' models at similar price points; additional benchmarks such as HumanEval, GPQA, or math-specific tests would provide clearer differentiation. The claim that it ranks higher than GPT-4 on chat preferences also warrants scrutiny regarding methodology and whether those preference leaderboards correlate with real-world application quality.
Related Articles
OpenAI releases GPT-5.4 mini and nano with 3-4x price increases but major performance gains
OpenAI has released GPT-5.4 mini and GPT-5.4 nano, compact models optimized for coding and subagent tasks. The new models deliver significant performance improvements—GPT-5.4 mini reaches 54.4% on SWE-Bench Pro versus 45.7% for GPT-5 mini—but cost 3-4x more per input token than their predecessors.
OpenAI's GPT-5.4 mini now available in GitHub Copilot
OpenAI has released GPT-5.4 mini, the lightweight variant of its agentic coding model GPT-5.4, in GitHub Copilot. The model represents OpenAI's highest-performing mini offering to date for code generation and completion tasks.
Google's Gemini Embedding 2 unifies text, image, video, and audio in single vector space
Google has released Gemini Embedding 2, its first native multimodal embedding model that represents text, images, video, audio, and documents in a unified vector space. The model eliminates the need for separate embedding models across different modalities in AI pipelines.
OpenAI plans to integrate Sora video generator directly into ChatGPT
OpenAI plans to integrate its Sora video generator as a built-in feature within ChatGPT, according to The Information. Currently available only on a standalone website and app, the integration would let users generate videos directly in the chatbot, similar to how image generation was added last year.