model-release

50 articles tagged with model-release

May 8, 2026
model release · Tencent

Tencent Releases Hy3 Preview: Mixture-of-Experts Model with 262K Context and Configurable Reasoning

Tencent has released Hy3 preview, a Mixture-of-Experts model with a 262,144 token context window priced at $0.066 per million input tokens and $0.26 per million output tokens. The model features three configurable reasoning modes—disabled, low, and high—designed for agentic workflows and production environments.
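At those rates, per-request cost is easy to sanity-check. A minimal sketch, using the published per-million-token prices from the announcement; the token counts in the example are hypothetical:

```python
# Published Hy3 preview rates, USD per million tokens
INPUT_RATE = 0.066
OUTPUT_RATE = 0.26

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at the published rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# A hypothetical long-context agentic request: 200K tokens in, 4K tokens out
cost = request_cost(200_000, 4_000)  # ≈ $0.0142
```

Filling most of the 262,144-token window on input still costs under two cents per call, which is the economics the agentic-workflow positioning rests on.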

May 7, 2026
model release

Google releases Gemini 3.1 Flash Lite with 1M context at $0.25 per million input tokens

Google has released Gemini 3.1 Flash Lite, a high-efficiency multimodal model with a 1,048,576 token context window priced at $0.25 per million input tokens and $1.50 per million output tokens. The model supports text, image, video, audio, and PDF inputs with four thinking levels for cost-performance optimization.

April 29, 2026
model release · Mistral AI

Mistral Releases Medium 3.5: 128B Dense Model With 256k Context and Configurable Reasoning

Mistral AI released Mistral Medium 3.5, a 128B parameter dense model with a 256k context window that unifies instruction-following, reasoning, and coding capabilities. The model features configurable reasoning effort per request and a vision encoder trained from scratch for variable image sizes.

April 27, 2026
model release · OpenAI

OpenAI Launches GPT Mini Latest with 400,000 Token Context Window

OpenAI released GPT Mini Latest on April 27, 2026, featuring a 400,000 token context window. The model automatically redirects to the latest version in the OpenAI GPT Mini family, allowing developers to stay current without manual updates.

model release

Moonshot AI Launches 'Kimi Latest' Router Model with 262K Context Window

Moonshot AI released Kimi Latest, a router endpoint that automatically redirects to the most recent model in the Kimi family. The model features a 262,144 token context window, though specific pricing and performance benchmarks have not been disclosed.

model release

Alibaba Qwen Releases Qwen3.6 Flash with 1M Context Window at $0.25 per 1M Input Tokens

Alibaba's Qwen team has released Qwen3.6 Flash, a multimodal language model supporting text, image, and video input with a 1 million token context window. The model is priced at $0.25 per 1M input tokens and $1.50 per 1M output tokens, with tiered pricing above 256K tokens.
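Tiered pricing like this is usually modeled as a piecewise rate schedule. A sketch under stated assumptions: the base rate and the 256K threshold are from the announcement, but the above-threshold multiplier is hypothetical (the article does not give the tiered rate), and marginal tiering (only tokens past the threshold pay more) is assumed:

```python
BASE_INPUT_RATE = 0.25    # USD per 1M input tokens, from the announcement
TIER_THRESHOLD = 256_000  # tokens; tiered pricing applies above this
TIER_MULTIPLIER = 2.0     # hypothetical: the tiered rate is not disclosed

def input_cost(tokens: int) -> float:
    """Input cost with a two-tier marginal schedule (tier rate assumed)."""
    base = min(tokens, TIER_THRESHOLD)
    extra = max(tokens - TIER_THRESHOLD, 0)
    return (base * BASE_INPUT_RATE + extra * BASE_INPUT_RATE * TIER_MULTIPLIER) / 1_000_000
```

Some providers instead apply the higher rate to the whole request once it crosses the threshold; which scheme Qwen3.6 Flash uses is not stated here.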

April 24, 2026
model release · DeepSeek

DeepSeek V4 Pro launches with 1.6 trillion parameters, 1M token context at $0.145 per million input tokens

Chinese AI lab DeepSeek has released preview versions of DeepSeek V4 Flash and V4 Pro, mixture-of-experts models with 1 million token context windows. The V4 Pro has 1.6 trillion total parameters (49 billion active), making it the largest open-weight model available, while both models significantly undercut frontier model pricing.

model release · DeepSeek

DeepSeek releases V4 preview, claims parity with GPT-4o and Claude 3.5 Sonnet

DeepSeek released a preview of its V4 model on April 24, 2026, claiming the open-source system matches leading closed-source models from Anthropic, Google, and OpenAI. The company emphasized improved coding capabilities and compatibility with domestic Huawei chips, but did not disclose training costs or hardware specifications.

model release · DeepSeek

DeepSeek Releases V4-Flash-Base: 292B Parameter Base Model

DeepSeek has released V4-Flash-Base, a 292 billion parameter base model now available on Hugging Face. The model uses BF16, I64, F32, and F8_E4M3 tensor types and is distributed in Safetensors format.

model release · DeepSeek

DeepSeek V4 Pro launches with 1.6T parameters at $1.74/M tokens, undercutting Claude Sonnet 4.6 by 42%

DeepSeek released two preview models: V4 Pro (1.6T total parameters, 49B active) and V4 Flash (284B total, 13B active), both with 1 million token context windows. V4 Pro is priced at $1.74/M input tokens and $3.48/M output—42% cheaper than Claude Sonnet 4.6—while V4 Flash at $0.14/$0.28 per million tokens undercuts all small frontier models.
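The 42% figure checks out on input pricing if Claude Sonnet 4.6 charges $3.00 per million input tokens; that Sonnet rate is an assumption here (the article does not state it), used only to reproduce the arithmetic behind the claim:

```python
# Published DeepSeek V4 Pro rates, USD per 1M tokens
V4_PRO_INPUT, V4_PRO_OUTPUT = 1.74, 3.48

# Assumed Claude Sonnet 4.6 input rate -- NOT stated in the article;
# chosen only to check whether it is consistent with the "42% cheaper" claim.
SONNET_INPUT = 3.00

discount = 1 - V4_PRO_INPUT / SONNET_INPUT  # ≈ 0.42, i.e. 42% cheaper on input
```

If Sonnet's output rate is materially higher than $3.48 per million tokens, the effective discount on output-heavy workloads would be larger than 42%.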

model release · DeepSeek

DeepSeek releases V4 model preview with agent optimization, pricing undisclosed

DeepSeek released a preview of its V4 large language model on April 24, 2026, available in 'pro' and 'flash' versions. The Hangzhou-based company claims the open-source model achieves strong performance on agent-based tasks and has been optimized for tools like Anthropic's Claude Code and OpenClaw.

April 22, 2026
model release · Xiaomi

Xiaomi Launches MiMo-V2.5 With 1M Context Window at $0.40 per Million Input Tokens

Xiaomi released MiMo-V2.5 on April 22, 2026, a native omnimodal model with a 1,048,576 token context window. The model is priced at $0.40 per million input tokens and $2 per million output tokens, positioning it as a cost-efficient alternative for agentic applications requiring multimodal perception across image and video understanding.

model release

Alibaba Qwen Releases 27B Parameter Model with 262K Context Window, Claims 77.2% on SWE-bench Verified

Alibaba Qwen released Qwen3.6-27B, a 27-billion parameter model with a 262,144 token context window extensible to 1,010,000 tokens. The model claims 77.2% on SWE-bench Verified and 53.5% on SWE-bench Pro, with open weights available on Hugging Face.

April 16, 2026
model release · Anthropic

Anthropic releases Claude Opus 4.7 with 1M context window for long-running agent tasks

Anthropic has released Claude Opus 4.7, the latest version of its flagship Opus family designed for long-running, asynchronous agent tasks. The model features a 1 million token context window and costs $5 per million input tokens and $25 per million output tokens.

model release · Anthropic

Anthropic releases Claude Opus 4.7 with reduced cyber capabilities compared to Mythos Preview

Anthropic released Claude Opus 4.7, a new model that the company says is 'broadly less capable' than its most powerful offering, Claude Mythos Preview. The model includes automated safeguards that detect and block prohibited or high-risk cybersecurity requests.

April 14, 2026
model release · OpenAI

OpenAI releases GPT-5.4-Cyber, a fine-tuned variant for defensive cybersecurity work

OpenAI has released GPT-5.4-Cyber, a variant of GPT-5.4 fine-tuned specifically for defensive cybersecurity use cases. The release accompanies the company's Trusted Access for Cyber program, which allows users to verify their identity via government ID to gain access to cybersecurity-focused tools.

model release · OpenAI

OpenAI releases GPT-5.4-Cyber, a fine-tuned model for vetted cybersecurity defenders with binary reverse engineering cap

OpenAI announced GPT-5.4-Cyber, a variant of GPT-5.4 fine-tuned for defensive cybersecurity work. The model features binary reverse engineering capabilities and reduced safety restrictions, but access is limited to authenticated security professionals through the company's Trusted Access for Cyber program.

April 12, 2026
model release · Anthropic

Anthropic launches Mythos AI model claiming zero-day vulnerability discovery capabilities

Anthropic has launched Mythos, an AI model the company claims is highly capable at identifying and exploiting zero-day vulnerabilities. The model has not been released publicly, with Anthropic citing security concerns. The announcement raises questions about the model's actual capabilities versus pre-IPO positioning.

model release · Arcee AI

Arcee AI releases Trinity-Large-Thinking, open reasoning model matching Claude Opus on agent tasks

Arcee AI has released Trinity-Large-Thinking, a 400-billion-parameter open-weight reasoning model with a mixture-of-experts architecture that activates only 13 billion parameters per token. The model matches Claude Opus 4.6 on agent benchmarks like Tau2 and PinchBench but lags on general reasoning tasks. The company spent approximately $20 million—roughly half its total venture capital—to train the model on 2,048 Nvidia B300 GPUs over 33 days.

model release

MiniMax releases M2.7, a 229B parameter model with self-evolving capabilities and agent teams

MiniMax has released MiniMax-M2.7, a 229-billion parameter model that uniquely participates in its own evolution during development. The model achieves 66.6% medal rate on MLE Bench Lite and 56.22% on SWE-Pro benchmarks, with native support for multi-agent collaboration and complex tool orchestration.

April 9, 2026
model release

Meta AI app jumps to No. 5 on App Store following Muse Spark launch

Meta's AI app surged from No. 57 to No. 5 on the U.S. App Store within 24 hours of launching Muse Spark, Meta's new multimodal AI model. The model accepts voice, text, and image inputs and features reasoning capabilities for science and math tasks, visual coding, and multi-agent functionality.

model release · Anthropic

Anthropic limits Mythos release to enterprises, citing security risks and blocking distillation

Anthropic announced it is limiting Mythos, its newest model, to large enterprises and critical infrastructure operators rather than releasing it publicly, claiming the model's ability to discover software security exploits poses risks. The restricted rollout strategy mirrors planned approaches by OpenAI and may serve dual purposes: managing security concerns while preventing smaller competitors from using distillation techniques to replicate frontier model capabilities.

model release · Zhipu AI

GLM-5.1 released: 754B agentic model outperforms Claude on coding benchmarks

Zhipu AI released GLM-5.1, a 754-billion parameter model optimized for agentic engineering tasks. The model scores 58.4% on SWE-Bench Pro, outperforming Claude Opus 4.6 (57.3%), and demonstrates sustained reasoning capability over hundreds of iterations.

model release · Zhipu AI

Zhipu AI's GLM-5.1 outperforms GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro through iterative strategy refinement

Zhipu AI has released GLM-5.1, a freely available open-weight model designed for long-running programming tasks that achieves 58.4% on SWE-Bench Pro, edging out GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3%). The model's core capability is iterative strategy refinement—it rethinks its approach across hundreds of iterations and thousands of tool calls, recognizing dead ends and shifting tactics without human intervention. However, GLM-5.1 trails on reasoning and knowledge benchmarks, scoring 31% on Humanity's Last Exam compared to Gemini 3.1 Pro's 45%.

April 8, 2026
model release

Meta launches Muse Spark model with private API preview and 16 integrated tools

Meta announced Muse Spark today, its first model release since Llama 4 a year ago. The hosted model is available in private API preview and on meta.ai with Instant and Thinking modes, and it benchmarks competitively against Anthropic's Opus 4.6 and Google's Gemini 3.1 Pro, though it trails on Terminal-Bench 2.0.

model release

Meta launches Muse Spark, its first model from revamped AI labs

Meta Superintelligence Labs has launched Muse Spark, its first model since Mark Zuckerberg restructured the company's AI division. The multimodal model now powers Meta AI's app and website in the US, with rollout planned for WhatsApp, Instagram, Facebook, Messenger, and Meta's smart glasses in coming weeks.

model release · Arcee AI

Arcee AI releases Trinity-Large-Thinking: 398B sparse MoE model with chain-of-thought reasoning

Arcee AI released Trinity-Large-Thinking, a 398B-parameter sparse Mixture-of-Experts model with approximately 13B active parameters per token, post-trained with extended chain-of-thought reasoning for agentic workflows. The model achieves 94.7% on τ²-Bench, 91.9% on PinchBench, and 98.2% on LiveCodeBench, generating explicit reasoning traces in <think>...</think> blocks before producing responses.
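Models that emit explicit reasoning traces in <think>...</think> blocks typically need the trace stripped before the answer is shown or stored. A generic parsing sketch, assuming only the delimiter convention the release describes (this is not Arcee's official tooling):

```python
import re

def split_reasoning(raw: str) -> tuple:
    """Split a raw completion into (reasoning_trace, final_answer).

    Assumes the <think>...</think> convention; returns an empty trace
    when no block is present.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if not match:
        return "", raw.strip()
    trace = match.group(1).strip()
    answer = raw[match.end():].strip()
    return trace, answer

trace, answer = split_reasoning("<think>Check the units first.</think>The answer is 42.")
```

The non-greedy `.*?` with `re.DOTALL` keeps the match to the first closing tag even when the trace spans multiple lines.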

model release

Alibaba's Qwen3.6 Plus reaches 78.8 on SWE-bench with 1M context window

Alibaba released Qwen3.6 Plus on April 2, 2026, featuring a 1 million token context window at $0.50 per million input tokens and $3 per million output tokens. The model combines linear attention with sparse mixture-of-experts routing to achieve a 78.8 score on SWE-bench Verified, with significant improvements in agentic coding, front-end development, and reasoning tasks.

April 7, 2026
model release

Z.ai releases GLM-5.1, 754B parameter open-weight model with improved code generation

Z.ai has released GLM-5.1, a 754-billion parameter open-weight model matching the size of its predecessor GLM-5. The model demonstrates improved ability to generate complex, multi-part outputs like HTML pages with SVG graphics and CSS animations, available via Hugging Face and OpenRouter.

model release · Anthropic

Anthropic unveils Claude Mythos model, finds thousands of OS vulnerabilities via Project Glasswing

Anthropic has unveiled Claude Mythos, a new AI model designed for cybersecurity that has already discovered thousands of high-severity vulnerabilities in every major operating system and web browser. The model is being distributed as a preview to over 40 organizations and major technology partners including Apple, Google, Microsoft, and Amazon Web Services through Project Glasswing, a coordinated cybersecurity initiative.

model release · Anthropic

Anthropic withholds Mythos Preview model due to advanced hacking capabilities

Anthropic is rolling out its Mythos Preview model only to a handpicked group of 40 tech and cybersecurity companies, withholding public release due to the model's sophisticated ability to find tens of thousands of vulnerabilities and autonomously create working exploits. The model found bugs in every major operating system and web browser during testing, including vulnerabilities decades old and undetected by human security researchers.

model release

GLM-5.1 achieves 58.4% on SWE-Bench Pro with sustained agentic reasoning over hundreds of iterations

Zhipu AI has released GLM-5.1, a 754-billion parameter model designed for agentic engineering with significantly improved coding capabilities over its predecessor. The model achieves 58.4% on SWE-Bench Pro and demonstrates sustained performance improvement over hundreds of tool calls and iterations, unlike earlier models that plateau quickly.

model release

Z.ai releases GLM-5.1 with 202K context window and 8-hour autonomous task capability

Z.ai has released GLM-5.1, a model with a 202,752 token context window and significantly improved coding capabilities. The model claims the ability to work autonomously on single tasks for over 8 hours, handling long-horizon projects with continuous planning and execution.

April 4, 2026
analysis

Tencent releases HY-OmniWeaving multimodal model as Gemma-4 variants emerge

Tencent has released HY-OmniWeaving, a new multimodal model available on Hugging Face. Concurrently, NVIDIA and Unsloth have published optimized variants of Gemma-4, including a 31B instruction-tuned version and quantized GGUF format.

April 3, 2026
model release · DeepSeek

DeepSeek V4 launching exclusively on Huawei chips, signaling progress toward China's AI independence

DeepSeek V4 is launching in the coming weeks running exclusively on Huawei chips, marking a major milestone in China's effort to reduce dependence on foreign semiconductors. Chinese tech giants including Alibaba, ByteDance, and Tencent have ordered hundreds of thousands of Huawei Ascend 950PR units to deploy the model through their cloud services.

model release · Google DeepMind

Google DeepMind releases Gemma 4 open models with up to 256K context and multimodal reasoning

Google DeepMind has released Gemma 4, an open-weights model family in four sizes (2.3B to 31B parameters) with multimodal capabilities handling text, images, video, and audio. The 26B A4B variant uses mixture-of-experts to achieve 4B active parameters while supporting 256K token context windows and native reasoning modes.

April 2, 2026
model release · Microsoft

Microsoft releases three multimodal AI models to compete with OpenAI and Google

Microsoft AI released three foundational models on April 2: MAI-Transcribe-1 for speech-to-text across 25 languages, MAI-Voice-1 for audio generation, and MAI-Image-2 for image generation. The company positions these models as cheaper alternatives to Google and OpenAI offerings. Models are available on Microsoft Foundry with pricing starting at $0.36 per hour for transcription.

model release · Microsoft

Microsoft's MAI-Transcribe-1 achieves lowest word error rate on FLEURS, costs $0.36/audio hour

Microsoft has released MAI-Transcribe-1, a speech-to-text model that achieves the lowest word error rate on the FLEURS benchmark across 25 languages, outperforming Whisper-large-V3, GPT-Transcribe, and Gemini 3.1 Flash-Lite. The model runs 2.5 times faster than Microsoft's previous Azure Fast offering and costs $0.36 per audio hour.

model release

Alibaba releases Qwen 3.6 Plus with 1M context window, free tier now available

Alibaba's Qwen division released Qwen 3.6 Plus on April 2, 2026, offering free access to a model with a 1,000,000 token context window. The model combines linear attention with sparse mixture-of-experts routing and achieves a 78.8 score on SWE-bench Verified for software engineering tasks.

March 31, 2026
model release · xAI

xAI releases Grok 4.20 Multi-Agent with 2M context window and parallel agent reasoning

xAI has released Grok 4.20 Multi-Agent, a variant designed for collaborative agent-based workflows with a 2-million-token context window. The model scales from 4 agents at low/medium reasoning effort to 16 agents at high/xhigh effort levels, priced at $2 per million input tokens and $6 per million output tokens.
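The effort-to-agent-count scaling described above maps directly to a lookup table, which also makes the worst-case cost of fan-out easy to bound. The agent counts and token rates are from the announcement; how parallel agents are actually billed is not stated, so the pessimistic assumption that every agent re-reads the full prompt is hypothetical:

```python
# Agent counts per reasoning-effort level, as described in the release
AGENTS_BY_EFFORT = {"low": 4, "medium": 4, "high": 16, "xhigh": 16}

INPUT_RATE = 2.00  # USD per 1M input tokens, from the announcement

def fanout_input_cost(effort: str, prompt_tokens: int) -> float:
    """Upper-bound input cost if every parallel agent re-reads the prompt.

    A pessimistic sketch; actual multi-agent billing is not disclosed.
    """
    return AGENTS_BY_EFFORT[effort] * prompt_tokens * INPUT_RATE / 1_000_000
```

At high effort with a fully packed 2M-token context, that bound is 16 × 2M × $2/M = $64 of input per run, which is worth knowing before enabling xhigh on large contexts.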

model release · xAI

xAI releases Grok 4.20 with 2M context window and native reasoning capabilities

xAI released Grok 4.20 on March 31, 2026, its flagship model featuring a 2 million token context window, $2 per million input tokens and $6 per million output tokens pricing, and toggleable reasoning capabilities. The model includes web search functionality at $5 per 1,000 queries and claims industry-leading speed with low hallucination rates.

model release

Google launches Veo 3.1 Lite, cutting video generation costs by half

Google announced Veo 3.1 Lite, a cost-reduced video generation model priced at less than 50% of Veo 3.1 Fast's cost. The model supports text-to-video and image-to-video generation at 720p or 1080p resolution with customizable durations of 4s, 6s, or 8s, rolling out today on the Gemini API and Google AI Studio.

March 30, 2026
model release

Google releases Lyria 3 Clip Preview for music generation via API

Google has released Lyria 3 Clip Preview, a music generation model available through the Gemini API as of March 30, 2026. The model generates 30-second audio clips from text prompts or images at $0.04 per clip, with a 1,048,576 token context window.

model release

Google releases Lyria 3 Pro Preview for full-length music generation

Google has released Lyria 3 Pro Preview, a music generation model capable of producing full-length songs with verses, choruses, bridges, vocals, and timed lyrics from text prompts or images. The model features a 1,048,576 token context window and charges $0.08 per generated song through the Gemini API.

March 28, 2026
model release · NVIDIA

NVIDIA releases gpt-oss-puzzle-88B, 88B-parameter reasoning model with 1.63× throughput gains

NVIDIA released gpt-oss-puzzle-88B on March 26, 2026, an 88-billion parameter mixture-of-experts model optimized for inference efficiency on H100 hardware. Built using the Puzzle post-training neural architecture search framework, the model achieves 1.63× throughput improvement in long-context (64K/64K) scenarios and up to 2.82× improvement on single H100 GPUs compared to its parent gpt-oss-120B, while matching or exceeding accuracy across reasoning effort levels.

March 27, 2026
model release · Cohere

Cohere releases 2B open-source speech model with 5.42% word error rate

Cohere has released Transcribe, a 2 billion parameter open-source automatic speech recognition model that the company claims tops the Hugging Face Open ASR Leaderboard with a 5.42% word error rate. The model supports 14 languages and is available under Apache 2.0 license, outperforming OpenAI's Whisper Large v3 and competing models on both accuracy and throughput metrics.

model release · Anthropic

Anthropic confirms leaked model represents major reasoning advance after security breach

A data breach at Anthropic exposed internal documents detailing an unreleased AI model the company describes as its most powerful to date. Anthropic confirmed it is already testing the model with select customers, claiming significant advances in reasoning, coding, and cybersecurity. The breach resulted from a misconfiguration in Anthropic's content management system that automatically made ~3,000 uploaded files publicly accessible.

March 26, 2026
model release

Gemini 3.1 Flash Live scores 95.9% on Big Bench Audio, Google's fastest voice model

Google has released Gemini 3.1 Flash Live, its new voice and audio AI model, scoring 95.9% on the Big Bench Audio Benchmark at high thinking levels—second only to Step-Audio R1.1 Realtime at 97.0%. Response times range from 0.96 seconds at minimal thinking to 2.98 seconds at high thinking, with pricing held at $0.35 per hour of audio input and $1.40 per hour of audio output.

model release · Mistral AI

Mistral releases Voxtral-4B-TTS-2603, open-weights text-to-speech model for production voice agents

Mistral AI released Voxtral-4B-TTS-2603, an open-weights text-to-speech model designed for production voice agents. The 4B-parameter model supports 9 languages, 20 preset voices, achieves 70ms latency at concurrency 1 on a single NVIDIA H200, and requires only 16GB GPU memory.

model release

Google releases Gemini 3.1 Flash Live, claims improved audio recognition and lower latency for voice conversations

Google announced Gemini 3.1 Flash Live as its updated audio and voice model for Gemini Live and Search Live. The model claims improved acoustic recognition, better background noise filtering, support for over 90 languages, and lower latency compared to 2.5 Flash Native Audio.