Best Cheap LLM in 2026

Active AI models ranked by input price per million tokens. Prices from official provider pages.

Updated automatically as pricing changes. Full model database →

| # | Model | Provider | Input / 1M tokens |
|---|-------|----------|-------------------|
| 1 | Gemini 2.0 Flash-Lite | Google DeepMind | $0.075 |
| 2 | GPT-4.1 nano | OpenAI | $0.10 |
| 3 | Mistral Small 3.1 | Mistral AI | $0.10 |
| 4 | Gemini 2.0 Flash | Google DeepMind | $0.10 |
| 5 | ERNIE X1 | Baidu AI | $0.14 |
| 6 | Yi-Lightning | 01.AI | $0.14 |
| 7 | Hunyuan-T1 | Tencent | $0.14 |
| 8 | Gemini 2.5 Flash | Google DeepMind | $0.15 |
| 9 | QwQ-32B | Alibaba / Qwen | $0.15 |
| 10 | GPT-4o mini | OpenAI | $0.15 |
| 11 | Qwen3 235B-A22B | Alibaba / Qwen | $0.20 |
| 12 | Claude Haiku 4.5 | Anthropic | $0.25 |
| 13 | MiniMax-M2.5 | MiniMax | $0.30 |
| 14 | Grok 3 mini | xAI | $0.30 |
| 15 | MiniMax-M2 | MiniMax | $0.30 |
| 16 | GPT-4.1 mini | OpenAI | $0.40 |
| 17 | Qwen3 72B | Alibaba / Qwen | $0.40 |
| 18 | MiniMax-M1 | MiniMax | $0.40 |
| 19 | Mistral Medium 3 | Mistral AI | $0.40 |
| 20 | Grok 4 mini | xAI | $0.50 |
| 21 | DeepSeek R1 (0528) | DeepSeek | $0.55 |
| 22 | ERNIE 4.5 | Baidu AI | $0.55 |
| 23 | Kimi K2 | Moonshot AI | $0.60 |
| 24 | Kimi K2.5 | Moonshot AI | $0.60 |
| 25 | o4-mini | OpenAI | $1.10 |
| 26 | DeepSeek-V3-0324 | DeepSeek | $1.19 |
| 27 | Gemini 2.5 Pro | Google DeepMind | $1.25 |
| 28 | Pixtral Large | Mistral AI | $2.00 |
| 29 | GPT-4.1 | OpenAI | $2.00 |
| 30 | Command R+ | Cohere | $2.50 |

Finding the best value LLM

Price alone doesn't tell the whole story. A model that costs twice as much per token but solves the task in half the calls can come out cheaper overall. When evaluating cost, consider:

  • Input vs output pricing — for chat and RAG, input is usually 80%+ of your tokens. For generation-heavy tasks (writing, summarization), output price matters more.
  • Context window — larger contexts let you process more in a single call, reducing round trips and total token usage.
  • Open-weight models — if you can self-host, models like Mistral and Llama reduce per-token cost to whatever your own compute costs at scale. Check the “open weights” column on the model database.
  • Quality vs cost — compare benchmark scores on the benchmark leaderboard to find the best performance per dollar.
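
The trade-offs above come down to one calculation: total tokens times price, times the number of calls it takes to finish the task. A minimal sketch of that arithmetic, using hypothetical prices and token counts (not figures from any specific provider page):

```python
def cost_per_task(input_price, output_price, input_tokens, output_tokens, calls=1):
    """Total cost in dollars for a task.

    Prices are in dollars per million tokens; token counts are per call;
    `calls` is how many round trips the model needs to finish the task.
    """
    per_call = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_call * calls

# Hypothetical comparison: a cheaper model that needs two attempts vs a
# pricier model that solves the task in one call. Prices are made up.
cheap = cost_per_task(0.15, 0.60, input_tokens=4_000, output_tokens=1_000, calls=2)
strong = cost_per_task(0.40, 1.60, input_tokens=4_000, output_tokens=1_000, calls=1)
print(f"cheap model: ${cheap:.6f}  stronger model: ${strong:.6f}")
```

Note how the 4,000-token prompt dominates the 1,000-token completion here, which is why input price is the headline number for chat and RAG workloads; flip the token ratio and the output price takes over.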

Also see: Best Coding LLM, Best Reasoning LLM, Compare any two models.