model releasexAI

xAI releases Grok 4.3 reasoning model with 1M token context at $1.25/M input tokens

TL;DR

xAI has released Grok 4.3, a reasoning model with a 1 million token context window and no output token limit. The model accepts text and image inputs, has always-on reasoning that cannot be disabled, and uses tiered pricing starting at $1.25 per million input tokens and $2.50 per million output tokens.

2 min read
0

Grok 4.3 — Quick Specs

Context window1000K tokens
Input$1.25/1M tokens
Output$2.5/1M tokens

xAI releases Grok 4.3 reasoning model with 1M token context at $1.25/M input tokens

xAI has released Grok 4.3, a multimodal reasoning model with a 1 million token context window and no output token limit. Released on April 30, 2026, the model is now available through OpenRouter.

Specifications and pricing

Grok 4.3 processes text and image inputs with text output. Input tokens are priced at $1.25 per million tokens, while output tokens cost $2.50 per million tokens. According to xAI, requests exceeding 200,000 total tokens are billed at a higher rate, though the elevated pricing tier has not been disclosed.

The model features always-on reasoning that cannot be disabled or configured by effort level. This distinguishes it from other reasoning models that allow users to adjust computational intensity.

Technical capabilities

xAI positions Grok 4.3 for agentic workflows, instruction-following tasks, and applications requiring high factual accuracy. The absence of an output token limit, combined with the 1 million token context window, enables the model to handle long-document analysis and multi-step agentic tasks without truncation.

The model supports multimodal input, accepting both text and images, but outputs text only.

API access

Grok 4.3 is accessible through OpenRouter's API, which normalizes requests and responses across providers. The platform routes requests to available providers and includes fallback mechanisms for uptime.

OpenRouter's API supports accessing the model's reasoning process through a reasoning_details array in responses. The platform requires preserving complete reasoning details when passing messages back to the model for continued conversations.

What this means

Grok 4.3 enters a competitive reasoning model market where always-on reasoning represents a trade-off: consistent step-by-step thinking for all queries, but no ability to reduce computational cost for simpler tasks. The 1M token context window and unlimited output position it for enterprise document analysis and complex multi-turn interactions. The tiered pricing structure for requests over 200K tokens suggests xAI expects the model to be used for extended contexts, though the lack of disclosed upper-tier pricing creates uncertainty for budget planning at scale.

Related Articles

model release

Moonshot AI releases Kimi K2.7 Code with 1T parameters, 256K context window, 30% lower thinking token usage

Moonshot AI has released Kimi K2.7 Code, a 1 trillion parameter Mixture-of-Experts model designed for long-horizon coding tasks. The model features a 256K context window and reduces thinking token usage by approximately 30% compared to its predecessor K2.6.

model release

MiniMax Releases M3: 428B-Parameter Multimodal Model with 1M Context Window and 15× Decode Speedup

MiniMax has released M3, a multimodal model with approximately 428 billion parameters and 23 billion activated parameters. The model supports a 1 million token context window and uses MiniMax Sparse Attention to achieve 9× prefill and 15× decode speedups compared to its predecessor M2.

model release

Apple releases AFM 3 lineup: 20B-parameter on-device model and cloud AI running on Google's Nvidia infrastructure

Apple announced five third-generation foundation models at WWDC26, headlined by AFM 3 Core Advanced—a 20-billion-parameter sparse model that runs on-device by activating only 1-4 billion parameters at a time. For the first time, Apple extended Private Cloud Compute to third-party infrastructure, with AFM 3 Cloud Pro running on Nvidia GPUs in Google Cloud.

model release

Google DeepMind releases DiffusionGemma, a 26B parameter model generating 15-20 tokens per forward pass via discrete dif

Google DeepMind released DiffusionGemma, a 26B parameter mixture-of-experts model that generates text using discrete diffusion instead of autoregression. The model processes blocks of 256 tokens in parallel, achieving generation speeds exceeding 1100 tokens per second on H100 GPUs in low-batch settings.

Comments

Loading...