Best Cheap LLM in 2026

Active AI models ranked by input price per million tokens. Prices from official provider pages.

Updated automatically as pricing changes. Full model database →

| # | Model | Provider | Input / 1M tokens |
|---|-------|----------|-------------------|
| 1 | Gemini 2.0 Flash-Lite | Google DeepMind | $0.075 |
| 2 | GPT-4.1 nano | OpenAI | $0.10 |
| 3 | Mistral Small 3.1 | Mistral AI | $0.10 |
| 4 | Gemini 2.0 Flash | Google DeepMind | $0.10 |
| 5 | ERNIE X1 | Baidu AI | $0.14 |
| 6 | Yi-Lightning | 01.AI | $0.14 |
| 7 | Hunyuan-T1 | Tencent | $0.14 |
| 8 | Gemini 2.5 Flash | Google DeepMind | $0.15 |
| 9 | QwQ-32B | Alibaba / Qwen | $0.15 |
| 10 | GPT-4o mini | OpenAI | $0.15 |
| 11 | Qwen3 235B-A22B | Alibaba / Qwen | $0.20 |
| 12 | Claude Haiku 4.5 | Anthropic | $0.25 |
| 13 | MiniMax-M2.5 | MiniMax | $0.30 |
| 14 | Grok 3 mini | xAI | $0.30 |
| 15 | MiniMax-M2 | MiniMax | $0.30 |
| 16 | GPT-4.1 mini | OpenAI | $0.40 |
| 17 | Qwen3 72B | Alibaba / Qwen | $0.40 |
| 18 | MiniMax-M1 | MiniMax | $0.40 |
| 19 | Mistral Medium 3 | Mistral AI | $0.40 |
| 20 | Grok 4 mini | xAI | $0.50 |
| 21 | DeepSeek R1 (0528) | DeepSeek | $0.55 |
| 22 | ERNIE 4.5 | Baidu AI | $0.55 |
| 23 | Kimi K2 | Moonshot AI | $0.60 |
| 24 | Kimi K2.5 | Moonshot AI | $0.60 |
| 25 | o4-mini | OpenAI | $1.10 |
| 26 | DeepSeek-V3-0324 | DeepSeek | $1.19 |
| 27 | Gemini 2.5 Pro | Google DeepMind | $1.25 |
| 28 | Pixtral Large | Mistral AI | $2.00 |
| 29 | GPT-4.1 | OpenAI | $2.00 |
| 30 | Command R+ | Cohere | $2.50 |

Finding the best value LLM

Price alone doesn't tell the whole story. A model that costs twice as much per token but solves the task in half the calls can come out cheaper overall. When evaluating cost, consider:

  • Input vs output pricing — for chat and RAG, input is usually 80%+ of your tokens. For generation-heavy tasks (writing, summarization), output price matters more.
  • Context window — larger contexts let you process more in a single call, reducing round trips and total token usage.
  • Open-weight models — if you can self-host, models like Mistral and Llama reduce per-token cost to whatever your own compute costs at scale. Check the “open weights” column on the model database.
  • Quality vs cost — compare benchmark scores on the benchmark leaderboard to find the best performance per dollar.
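
The trade-offs above come down to one calculation: total tokens times price, times the number of calls it takes to finish the task. A minimal sketch of that arithmetic, using hypothetical prices and token counts (not figures from any specific provider page):

```python
def cost_per_task(input_price, output_price, input_tokens, output_tokens, calls=1):
    """Total cost in dollars for a task.

    Prices are in dollars per million tokens; token counts are per call;
    `calls` is how many round trips the model needs to finish the task.
    """
    per_call = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_call * calls

# Hypothetical comparison: a cheaper model that needs two attempts vs a
# pricier model that solves the task in one call. Prices are made up.
cheap = cost_per_task(0.15, 0.60, input_tokens=4_000, output_tokens=1_000, calls=2)
strong = cost_per_task(0.40, 1.60, input_tokens=4_000, output_tokens=1_000, calls=1)
print(f"cheap model: ${cheap:.6f}  stronger model: ${strong:.6f}")
```

Note how the 4,000-token prompt dominates the 1,000-token completion here, which is why input price is the headline number for chat and RAG workloads; flip the token ratio and the output price takes over.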

Also see: Best Coding LLM, Best Reasoning LLM, Compare any two models.