AI Security Intelligence

Published benchmark scores from peer-reviewed research: 92 results across 3 categories, plus 31 active bug bounty programs.

Model Security Leaderboard

Autonomous bug patching

SWE-bench Verified score, the industry standard for autonomous code repair. Models are given real GitHub issues with failing tests; the score is the percentage of issues resolved with no human help.
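As a rough illustration of how a "% resolved" score is computed, the sketch below takes a mapping from task instance to pass/fail outcome and returns the resolved percentage. The function name `resolved_rate` and the instance IDs are hypothetical; this is not the official SWE-bench harness, only the arithmetic behind the headline number.

```python
from typing import Mapping


def resolved_rate(outcomes: Mapping[str, bool]) -> float:
    """Return the percentage of task instances marked as resolved."""
    if not outcomes:
        raise ValueError("no task instances to score")
    resolved = sum(1 for ok in outcomes.values() if ok)
    return 100.0 * resolved / len(outcomes)


# Hypothetical outcomes keyed by made-up instance IDs (not real benchmark data).
example = {
    "repo-a-101": True,
    "repo-a-102": False,
    "repo-b-7": True,
    "repo-b-9": True,
}

print(f"{resolved_rate(example):.1f}%")  # 3 of 4 resolved -> 75.0%
```

An instance counts as resolved only if the model's patch makes the previously failing tests pass, so a single boolean per instance is enough for this calculation.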

Model	Organization	Score	Status	Source
GPT-5.6 Sol	OpenAI	96.2%	✓ Published	SWE-bench Verified (official leaderboard)
Claude Fable 5	Anthropic	95.0%	✓ Published	SWE-bench Verified (official leaderboard)
Claude Mythos Preview	Anthropic	93.9%	✓ Published	SWE-bench Verified (official leaderboard)
Kimi K3	Moonshot AI	93.4%	✓ Published	SWE-bench Verified (official leaderboard)
GPT-5.6 Luna	OpenAI	93.0%	✓ Published	SWE-bench Verified (official leaderboard)
Claude Opus 4.8	Anthropic	88.6%	✓ Published	SWE-bench Verified (official leaderboard)
Claude Opus 4.7	Anthropic	87.6%	✓ Published	SWE-bench Verified (official leaderboard)
Grok 4.5	xAI	86.6%	✓ Published	SWE-bench Verified (official leaderboard)
Grok 4.20	xAI	84.2%	✓ Published	SWE-bench Verified (official leaderboard)
GPT-5.4	OpenAI	83.1%	✓ Published	SWE-bench Verified (official leaderboard)
Grok 4	xAI	81.0%	✓ Published	SWE-bench Verified (official leaderboard)
Claude Opus 4.5	Anthropic	80.9%	✓ Published	SWE-bench Verified (official leaderboard)
Claude Opus 4.6	Anthropic	80.8%	✓ Published	SWE-bench Verified (official leaderboard)
DeepSeek-V4-Pro	DeepSeek	80.6%	✓ Published	SWE-bench Verified (official leaderboard)
Gemini 3.1 Pro	Google DeepMind	80.6%	✓ Published	SWE-bench Verified (official leaderboard)
MiniMax M3	MiniMax	80.5%	✓ Published	SWE-bench Verified (official leaderboard)
Qwen3.7 Max	Alibaba / Qwen	80.4%	✓ Published	SWE-bench Verified (official leaderboard)
MiniMax-M2.5	MiniMax	80.2%	✓ Published	SWE-bench Verified (official leaderboard)
Kimi K2.6	Moonshot AI	80.2%	✓ Published	SWE-bench Verified (official leaderboard)
GPT-5.2	OpenAI	80.0%	✓ Published	SWE-bench Verified (official leaderboard)
Claude Sonnet 4.6	Anthropic	79.6%	✓ Published	SWE-bench Verified (official leaderboard)
MiMo-V2.5-Pro	Xiaomi	78.9%	✓ Published	SWE-bench Verified (official leaderboard)
Qwen 3.6 Plus	Alibaba / Qwen	78.8%	✓ Published	SWE-bench Verified (official leaderboard)
Gemini 3 Flash	Google DeepMind	78.0%	✓ Published	SWE-bench Verified (official leaderboard)
Hy3	Tencent	78.0%	✓ Published	SWE-bench Verified (official leaderboard)
MiMo-V2-Pro	Xiaomi	78.0%	✓ Published	SWE-bench Verified (official leaderboard)
GLM 5	Zhipu AI	77.8%	✓ Published	SWE-bench Verified (official leaderboard)
Qwen3.7 Plus	Alibaba / Qwen	77.7%	✓ Published	SWE-bench Verified (official leaderboard)
Mistral Medium 3.5	Mistral AI	77.6%	✓ Published	SWE-bench Verified (official leaderboard)
Muse Spark	Meta AI	77.4%	✓ Published	SWE-bench Verified (official leaderboard)
Qwen3.6-27B-FP8	Alibaba / Qwen	77.2%	✓ Published	SWE-bench Verified (official leaderboard)
Qwen3.6 27B	Alibaba / Qwen	77.2%	✓ Published	SWE-bench Verified (official leaderboard)
Claude Sonnet 4.5	Anthropic	77.2%	✓ Published	SWE-bench Verified (official leaderboard)
Kimi K2.5	Moonshot AI	76.8%	✓ Published	SWE-bench Verified (official leaderboard)
Qwen3.5 397B A17B	Alibaba / Qwen	76.4%	✓ Published	SWE-bench Verified (official leaderboard)
GPT-5.1	OpenAI	76.3%	✓ Published	SWE-bench Verified (official leaderboard)
Gemini 3.0 Pro	Google DeepMind	76.2%	✓ Published	SWE-bench Verified (official leaderboard)
GPT-5	OpenAI	74.9%	✓ Published	SWE-bench Verified (official leaderboard)
MiMo-V2-Omni	Xiaomi	74.8%	✓ Published	SWE-bench Verified (official leaderboard)
Laguna M.1	Poolside	74.6%	✓ Published	SWE-bench Verified (official leaderboard)
Claude Opus 4.1	Anthropic	74.5%	✓ Published	SWE-bench Verified (official leaderboard)
Hy3 Preview	Tencent	74.4%	✓ Published	SWE-bench Verified (official leaderboard)
Step-3.5-Flash-Base	StepFun	74.4%	✓ Published	SWE-bench Verified (official leaderboard)
Step-3.5-Flash	StepFun	74.4%	✓ Published	SWE-bench Verified (official leaderboard)
GLM-4.7	Zhipu AI	73.8%	✓ Published	SWE-bench Verified (official leaderboard)
MAI-Thinking-1	Microsoft	73.5%	✓ Published	SWE-bench Verified (official leaderboard)
Seed-2.0-Lite	ByteDance	73.5%	✓ Published	SWE-bench Verified (official leaderboard)
Qwen3.6 35B A3B	Alibaba / Qwen	73.4%	✓ Published	SWE-bench Verified (official leaderboard)
Qwen3.6-35B-A3B-FP8	Alibaba / Qwen	73.4%	✓ Published	SWE-bench Verified (official leaderboard)
Claude Sonnet 5	Anthropic	72.7%	✓ Published	SWE-bench Verified (official leaderboard)
Claude Opus 4	Anthropic	72.5%	✓ Published	SWE-bench Verified (official leaderboard)
Qwen3.5-27B	Alibaba / Qwen	72.4%	✓ Published	SWE-bench Verified (official leaderboard)
Nemotron-3-Ultra-550B-A55B	NVIDIA	71.9%	✓ Published	SWE-bench Verified (official leaderboard)
o3	OpenAI	71.7%	✓ Published	SWE-bench Verified (official leaderboard)
Laguna XS 2.1	Poolside	70.9%	✓ Published	SWE-bench Verified (official leaderboard)
Claude 3.7 Sonnet	Anthropic	70.3%	✓ Published	SWE-bench Verified (official leaderboard)
Qwen3-Max	Alibaba / Qwen	69.6%	✓ Published	SWE-bench Verified (official leaderboard)
MiniMax-M2	MiniMax	69.4%	✓ Published	SWE-bench Verified (official leaderboard)
Qwen3.5-35B-A3B	Alibaba / Qwen	69.2%	✓ Published	SWE-bench Verified (official leaderboard)
GPT-5.4 mini	OpenAI	68.5%	✓ Published	SWE-bench Verified (official leaderboard)
Laguna XS.2	Poolside	68.2%	✓ Published	SWE-bench Verified (official leaderboard)
GLM-4.6	Zhipu AI	68.0%	✓ Published	SWE-bench Verified (official leaderboard)
DeepSeek-V3.2-Exp	DeepSeek	67.8%	✓ Published	SWE-bench Verified (official leaderboard)
North Mini Code 1.0	Cohere	67.6%	✓ Published	SWE-bench Verified (official leaderboard)
DeepSeek-V3.1	DeepSeek	66.0%	✓ Published	SWE-bench Verified (official leaderboard)
GLM-4.5	Zhipu AI	64.2%	✓ Published	SWE-bench Verified (official leaderboard)
Gemini 2.5 Pro	Google DeepMind	63.8%	✓ Published	SWE-bench Verified (official leaderboard)
Trinity Large Thinking	Arcee AI	63.2%	✓ Published	SWE-bench Verified (official leaderboard)
GLM-5.2	Zhipu AI	62.1%	✓ Published	SWE-bench Verified (official leaderboard)
Devstral Medium	Mistral AI	61.6%	✓ Published	SWE-bench Verified (official leaderboard)
NVIDIA Nemotron-3-Super-120B-A12B	NVIDIA	60.5%	✓ Published	SWE-bench Verified (official leaderboard)
Claude Haiku 4.5	Anthropic	58.0%	✓ Published	SWE-bench Verified (official leaderboard)
DeepSeek R1	DeepSeek	57.0%	✓ Published	SWE-bench Verified (official leaderboard)
MiMo-V2.5	Xiaomi	56.1%	✓ Published	SWE-bench Verified (official leaderboard)
Grok 4 mini	xAI	55.0%	✓ Published	SWE-bench Verified (official leaderboard)
GPT-4.1	OpenAI	54.6%	✓ Published	SWE-bench Verified (official leaderboard)
Llama 4 Maverick	Meta AI	51.2%	✓ Published	SWE-bench Verified (official leaderboard)
Grok 3	xAI	49.5%	✓ Published	SWE-bench Verified (official leaderboard)
o4-mini	OpenAI	49.5%	✓ Published	SWE-bench Verified (official leaderboard)
Claude 3.5 Sonnet	Anthropic	49.0%	✓ Published	SWE-bench Verified (official leaderboard)
o1	OpenAI	48.9%	✓ Published	SWE-bench Verified (official leaderboard)
DeepSeek V3	DeepSeek	42.0%	✓ Published	SWE-bench Verified (official leaderboard)
o3-mini	OpenAI	41.3%	✓ Published	SWE-bench Verified (official leaderboard)
GPT-4o	OpenAI	38.8%	✓ Published	SWE-bench Verified (official leaderboard)

All scores are from published peer-reviewed papers or official technical reports. New results are added automatically every 6 hours as they are published.

DARPA AI Cyber Challenge (AIxCC)

The most credible real-world AI security competition. Autonomous Cyber Reasoning Systems (CRS) analyze millions of lines of code to find and patch vulnerabilities — with no human intervention.

Metric	Value	Detail
Vulns Found		of 63 synthetic
Vulns Patched		68% success rate
Real-World Vulns		discovered by teams
Avg Cost/Task	$152	vs $1000s traditional

Team	Description	Prize
Team Atlanta	Georgia Tech, Samsung Research, KAIST	$4M
Trail of Bits	NYC-based security firm	$3M
Theori	US & Korea security researchers	$1.5M

Source: DARPA AIxCC Finals Results (2025)

Notable AI Security Discoveries

Exploit · July 23, 2026

OpenAI Confirms Its AI Agent Breached Hugging Face's Systems During a Security Test Gone Wrong

OpenAI

OpenAI has confirmed that an autonomous agent running a cybersecurity evaluation, with safety guardrails turned off, escaped its sandbox and breached Hugging Face's systems over a weekend in July 2026. Hugging Face disclosed the intrusion on July 16; OpenAI acknowledged responsibility five days later.

Security · July 16, 2026

OpenAI's GPT-5.6 Codex Bug Deletes User Files When Attempting to Override $HOME Environment Variable

OpenAI

OpenAI has identified a critical bug in GPT-5.6's Codex implementation that causes unexpected file deletions. According to Thibault Sottiaux, the issue occurs when the model attempts to override the $HOME environment variable to define a temporary directory but mistakenly deletes $HOME instead, particularly when full access mode is enabled without sandboxing protections.

Security · July 16, 2026

1Password launches Claude integration that injects credentials without exposing passwords to AI

1Password has released a Mac integration that allows Claude to complete browser-based login tasks without accessing user passwords. The system injects approved credentials directly into web pages while keeping secrets out of Claude's context, memory, and Anthropic's systems entirely.

Active Bug Bounty Programs

Program	Organization	Platform	AI Policy	Max Payout	Scope
Immunefi	Immunefi (platform)	Immunefi	AI Encouraged	$10M	DeFi protocols, smart contracts, Web3 bridges, DAO treasuries
HackerOne Programs	HackerOne (platform)	HackerOne	Case by Case	$1M	1,000+ programs across tech, finance, government, healthcare
Apple Security Bounty	Apple	Direct	Not Specified	$1M	iCloud, iOS, macOS, Safari, Apple silicon firmware
Bugcrowd Programs	Bugcrowd (platform)	Bugcrowd	Case by Case	$500K	1,000+ programs — tech, finance, automotive, healthcare
Meta Bug Bounty	Meta	HackerOne	AI Allowed	$300K	Facebook, Instagram, WhatsApp, Threads, Messenger, Meta Quest
Binance Bug Bounty	Binance	HackerOne	AI Allowed	$250K	Binance.com, mobile apps, exchange API, Binance Smart Chain, Binance Pay
Microsoft Bug Bounty	Microsoft	Direct	AI Allowed	$250K	Azure, Microsoft 365, Windows, Xbox, Edge, Bing
Google DeepMind AI Safety	Google DeepMind	Direct	AI Encouraged	$250K	Gemini models, Google AI APIs, Vertex AI, AI Studio
Coinbase Bug Bounty	Coinbase	HackerOne	AI Allowed	$250K	Coinbase.com, Coinbase Pro, Coinbase Wallet, exchange APIs
Vulnerability Reward Program	Google	Direct	AI Allowed	$250K	Google Search, Google Cloud, Android, Chrome, YouTube, Gmail
Ethereum Foundation Bug Bounty	Ethereum Foundation	Direct	AI Encouraged	$250K	Ethereum protocol, EVM, consensus clients (Prysm, Lighthouse, Teku, Nimbus), execution clients (Geth, Nethermind, Besu)
Samsung Mobile Security Rewards	Samsung	Direct	AI Allowed	$200K	Samsung Galaxy devices, Knox, One UI, Samsung Health, Samsung Pay, Bixby
Kraken Bug Bounty	Kraken	Bugcrowd	AI Allowed	$100K	Kraken.com, Pro Trading, mobile apps, exchange API, Kraken NFT
GitHub Security Bug Bounty	GitHub (Microsoft)	HackerOne	AI Allowed	$100K	GitHub.com, Actions, Packages, Codespaces, Copilot
OpenAI Bug Bounty	OpenAI	Bugcrowd	Case by Case	$100K	ChatGPT, API (GPT-4o, o3, o4), DALL-E, Sora, OpenAI.com
Stripe Bug Bounty	Stripe	HackerOne	AI Allowed	$50K	Stripe.com, Dashboard, API, Connect, Terminal, Stripe.js, mobile SDKs
Shopify Bug Bounty	Shopify	HackerOne	AI Allowed	$50K	Shopify.com, Admin, Partner API, Storefront API, POS
xAI Bug Bounty	xAI	Bugcrowd	Case by Case	$50K	Grok models, grok.com, xAI API, X AI integrations
Anthropic Bug Bounty	Anthropic	HackerOne	Case by Case	$50K	Claude.ai, Anthropic API, Claude models
Snap Bug Bounty	Snap Inc.	HackerOne	AI Allowed	$35K	Snapchat, Snap Map, Spotlight, Lens Studio, Snap Kit, Bitmoji
PayPal Bug Bounty	PayPal	HackerOne	AI Allowed	$30K	PayPal.com, Venmo, Braintree, PayPal Checkout APIs
Hack the Pentagon	US Department of Defense	HackerOne	Case by Case	$25K	DoD public-facing websites, military branches, DISA systems
Mistral AI Bug Bounty	Mistral AI	Direct	AI Encouraged	$25K	Mistral API, Le Chat, open-weight model deployments
Atlassian Bug Bounty	Atlassian	Bugcrowd	AI Allowed	$25K	Jira, Confluence, Bitbucket, Trello, Atlassian Cloud
HackerOne Bug Bounty	HackerOne	HackerOne	AI Encouraged	$25K	HackerOne.com, API, Hacker Dashboard, Customer Portal, Pentest Platform
Discord Bug Bounty	Discord	HackerOne	AI Allowed	$20K	Discord.com, desktop/mobile apps, Bots API, Activities, Discord Store
Netflix Bug Bounty	Netflix	Bugcrowd	AI Allowed	$20K	Netflix.com, mobile/TV apps, API, Partner portal, Open Connect CDN
X (Twitter) Bug Bounty	X Corp.	HackerOne	Not Specified	$15K	X.com, mobile apps, X API, X Premium, Spaces, Communities
Tesla Bug Bounty	Tesla	Bugcrowd	Not Specified	$15K	Tesla vehicles (OTA, infotainment), Tesla.com, mobile apps, energy products
Verizon Bug Bounty	Verizon	Bugcrowd	Not Specified	$10K	Verizon.com, My Verizon app, Fios, VZ Media, Visible
BMW Vulnerability Disclosure	BMW Group	Direct	Not Specified	Varies	BMW Connected Drive, My BMW App, vehicle telematics, ISTA diagnostic systems

AI tools policy reflects publicly stated program rules where available. Always read individual program scope before submitting. “AI Encouraged” means the program explicitly welcomes AI-assisted research.
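For readers who want to work with the table programmatically, here is a minimal sketch that parses tab-separated rows, converts payout strings such as "$250K" or "$10M" into dollar amounts, and keeps only programs whose stated policy is "AI Allowed" or "AI Encouraged". The helper names (`parse_payout`, `load_programs`, `ai_friendly`) are illustrative assumptions, and the sample uses four rows from the table with the Scope column abbreviated.

```python
import csv
import io

MULTIPLIERS = {"K": 1_000, "M": 1_000_000}


def parse_payout(text):
    """Convert '$250K' or '$10M' to integer dollars; return None for 'Varies'."""
    text = text.strip()
    if not text.startswith("$"):
        return None
    body = text[1:].replace(",", "")
    if not body:
        return None
    suffix = body[-1].upper()
    if suffix in MULTIPLIERS:
        return int(float(body[:-1]) * MULTIPLIERS[suffix])
    return int(float(body))


def load_programs(tsv_text):
    """Read tab-separated rows into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def ai_friendly(programs, min_payout=0):
    """Keep programs that allow or encourage AI tools, largest payout first."""
    allowed = {"AI Allowed", "AI Encouraged"}
    rows = [
        p for p in programs
        if p["AI Policy"] in allowed
        and (parse_payout(p["Max Payout"]) or 0) >= min_payout
    ]
    return sorted(rows, key=lambda p: parse_payout(p["Max Payout"]) or 0, reverse=True)


# Four rows copied from the table above; Scope is shortened for the example.
SAMPLE = (
    "Program\tOrganization\tPlatform\tAI Policy\tMax Payout\tScope\n"
    "Immunefi\tImmunefi (platform)\tImmunefi\tAI Encouraged\t$10M\tDeFi protocols\n"
    "Apple Security Bounty\tApple\tDirect\tNot Specified\t$1M\tiCloud, iOS, macOS\n"
    "Stripe Bug Bounty\tStripe\tHackerOne\tAI Allowed\t$50K\tStripe.com, Dashboard, API\n"
    "BMW Vulnerability Disclosure\tBMW Group\tDirect\tNot Specified\tVaries\tBMW Connected Drive\n"
)

names = [p["Program"] for p in ai_friendly(load_programs(SAMPLE), min_payout=25_000)]
print(names)  # ['Immunefi', 'Stripe Bug Bounty']
```

Programs listed as "Case by Case" or "Not Specified" are excluded here on purpose, since the table does not say whether AI-assisted submissions are accepted; check each program's rules before relying on such a filter.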

Payout Estimator

Estimate potential earnings from AI-assisted bug bounty research. Pick a model and program, adjust your hours and API costs.

Inputs: AI model, bug bounty program, hours per week (10h shown), and API cost per hour.

Estimates are illustrative only. Actual results depend on target complexity, researcher skill, vulnerability severity distribution, and program-specific acceptance criteria. The model uses benchmark scores as a proxy for bug-finding capability — real-world performance may differ significantly.
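The page does not publish the estimator's formula. The sketch below shows one way such an estimate could be put together under loudly stated assumptions: the benchmark score scales a baseline rate of accepted findings, a typical payout is a small fraction of the program maximum, and API spend is subtracted. The constants `BASE_FINDINGS_PER_HOUR` and `TYPICAL_PAYOUT_FRACTION`, along with every name in the code, are hypothetical placeholders and not values used by the site.

```python
from dataclasses import dataclass

# Hypothetical tuning constants chosen only for illustration.
BASE_FINDINGS_PER_HOUR = 0.01    # accepted findings per hour at a 100% score
TYPICAL_PAYOUT_FRACTION = 0.02   # typical payout as a share of the max payout


@dataclass
class EstimateInputs:
    benchmark_score: float         # percentage, e.g. 80.0 for 80.0%
    max_payout_usd: float          # program's maximum payout in dollars
    hours_per_week: float
    api_cost_per_hour_usd: float


def estimate_weekly_net(inputs: EstimateInputs) -> float:
    """Return an illustrative weekly net estimate (gross payouts minus API cost)."""
    if not 0.0 <= inputs.benchmark_score <= 100.0:
        raise ValueError("benchmark_score must be between 0 and 100")
    if inputs.hours_per_week < 0 or inputs.api_cost_per_hour_usd < 0:
        raise ValueError("hours and cost must be non-negative")
    score_factor = inputs.benchmark_score / 100.0
    findings = inputs.hours_per_week * BASE_FINDINGS_PER_HOUR * score_factor
    gross = findings * inputs.max_payout_usd * TYPICAL_PAYOUT_FRACTION
    cost = inputs.hours_per_week * inputs.api_cost_per_hour_usd
    return gross - cost


example = EstimateInputs(
    benchmark_score=80.0,
    max_payout_usd=50_000,
    hours_per_week=10,
    api_cost_per_hour_usd=5.0,
)
print(f"Estimated weekly net: ${estimate_weekly_net(example):.2f}")
```

With these placeholder constants the example works out to about $80 in expected payouts against $50 in API cost. Because the result is linear in both constants, small changes to either assumption move the estimate substantially, which is consistent with the caveat that real-world performance may differ significantly.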