LLM News

Every LLM release, update, and milestone.

product update

Taalas serves Llama 3.1 8B at 17,000 tokens/second with custom silicon

Taalas, a new Canadian hardware startup, announced its first product: a custom silicon implementation of Meta's Llama 3.1 8B model running at 17,000 tokens/second. The startup uses aggressive quantization combining 3-bit and 6-bit parameters. The system is accessible via chatjimmy.ai.

February 20, 2026 · 10:20 PM · 2 min read

taalas llama-3-1 inference

via simonwillison.net