Best LLM for Coding in 2026

Models are ranked by a composite coding score: the average of their HumanEval, SWE-bench Verified, and LiveCodeBench results. All scores come from official model cards and technical reports.

Updated automatically as new models are released.

1. 93.9% avg
2. 91.2% avg
3. 91.0% avg
4. 89.1% avg
5. 88.4% avg
6. 88.1% avg
7. 85.7% avg
8. 85.4% avg
9. 85.2% avg
10. 85.2% avg
11. 84.9% avg
12. 84.9% avg
13. 84.7% avg
14. 84.0% avg
15. 82.0% avg
16. 81.3% avg
17. 81.0% avg
18. 80.9% avg
19. 80.7% avg
20. 80.5% avg
21. 80.5% avg
22. 80.5% avg
23. 80.0% avg
24. 79.3% avg
25. 79.0% avg
26. 78.9% avg
27. 78.8% avg
28. 78.8% avg
29. 78.4% avg
30. 78.3% avg
31. 78.2% avg
32. 78.0% avg
33. 77.8% avg
34. 77.6% avg
35. 77.4% avg
36. 77.2% avg
37. 77.2% avg
38. 77.1% avg
39. 77.0% avg
40. 76.9% avg
41. 74.8% avg
42. 74.7% avg
43. 74.4% avg
44. 74.1% avg
45. 73.8% avg
46. 73.4% avg
47. 73.4% avg
48. 72.5% avg
49. 72.0% avg
50. 71.9% avg
51. 70.6% avg
52. 69.9% avg
53. 69.8% avg
54. 69.8% avg
55. 69.2% avg
56. 68.2% avg
57. 66.4% avg
58. 65.8% avg
59. 64.7% avg
60. 61.6% avg
61. 60.7% avg
62. 60.5% avg
63. 56.1% avg
64. 44.0% avg
65. 0.7% avg

How we rank coding models

We average three industry-standard benchmarks:

  • HumanEval: Function-completion coding tasks in Python. Tests whether a model can write correct, working code from a function signature and docstring. Reported as Pass@1 accuracy.
  • SWE-bench Verified: Real GitHub issues from popular open-source repos. Tests autonomous software engineering: read the issue, write a fix, pass the test suite. Of the three benchmarks, it is the closest to real-world engineering work.
  • LiveCodeBench: Competitive programming problems from LeetCode, Codeforces, and AtCoder, collected after model training cutoffs to limit contamination. Harder than HumanEval.
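The Pass@1 metric mentioned for HumanEval can be illustrated with the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the tests. The sketch below is an illustration only: the function names `pass_at_k` and `benchmark_pass_at_1` are hypothetical, and it does not describe how any particular model card computed its reported score.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct,
    given that c out of n generated samples passed the tests."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k must include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Average pass@1 over problems; each item is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
```

With a single sample per problem (n = 1), pass@1 reduces to the plain fraction of problems solved, which is how the metric is most often read.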

All scores are from official model cards, technical reports, or the HuggingFace Open LLM Leaderboard. Rankings update automatically as new models are released.
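The averaging and ranking described in this section can be sketched as follows. This is a minimal sketch, assuming an unweighted mean over three scores that are all present; the page does not say how a missing benchmark score is handled. The `CodingScores` class, the `rank_models` function, and any model names or numbers passed to them are placeholders for illustration, not real leaderboard data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingScores:
    """Benchmark scores for one model, each as a percentage (0-100)."""
    model: str
    humaneval: float
    swe_bench_verified: float
    livecodebench: float

    @property
    def composite(self) -> float:
        # Assumption: a plain unweighted mean of the three benchmarks.
        return (self.humaneval + self.swe_bench_verified + self.livecodebench) / 3


def rank_models(entries: list[CodingScores]) -> list[tuple[int, str, float]]:
    """Return (rank, model, composite) tuples, best composite first."""
    ordered = sorted(entries, key=lambda e: e.composite, reverse=True)
    return [
        (rank, e.model, round(e.composite, 1))
        for rank, e in enumerate(ordered, start=1)
    ]
```

Because Python's sort is stable, models with equal composites keep their input order, which would be one way to account for the tied scores that still receive distinct ranks in the list above.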
