LLM News

Every LLM release, update, and milestone.

research

Progressive Residual Warmup improves LLM pretraining stability and convergence speed

Researchers propose Progressive Residual Warmup (ProRes), a pretraining technique that staggers layer learning by gradually warming residual connections from 0 to 1, with deeper layers taking longer to activate. The method demonstrates faster convergence, stronger generalization, and improved downstream performance across multiple model scales and initialization schemes.
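The paper's exact schedule is not given here, but the core idea can be sketched: each block's residual branch is scaled by a gate that ramps from 0 to 1, with deeper layers assigned longer warmup windows so they activate later. The gate shape, `base_warmup` value, and linear ramp below are illustrative assumptions, not the authors' published schedule.

```python
def residual_gate(step: int, layer: int, base_warmup: int = 1000) -> float:
    """Gate for layer `layer`'s residual branch, ramping linearly 0 -> 1.
    Deeper layers get proportionally longer warmup windows (assumed schedule)."""
    warmup_steps = base_warmup * (1 + layer)  # deeper layer => slower warmup
    return min(1.0, step / warmup_steps)

class GatedBlock:
    """Wraps a transformer block so its residual update is scaled by the
    warmup gate: x <- x + gate(step, layer) * block(x)."""
    def __init__(self, block, layer: int):
        self.block = block
        self.layer = layer

    def __call__(self, x, step: int):
        g = residual_gate(step, self.layer)
        return x + g * self.block(x)
```

At step 0 every block is a pure identity map (gate 0), so early training only adjusts the embedding and output layers; blocks then come online shallow-first as their gates open.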

research

DynFormer rethinks Transformers for physics simulations, cutting PDE solver errors by up to 95%

Researchers propose DynFormer, a Transformer variant designed specifically for solving partial differential equations (PDEs) that models physical systems at multiple scales simultaneously. By replacing uniform attention with specialized modules for different physical scales, DynFormer achieves up to 95% error reduction compared to existing neural operator baselines while consuming significantly less GPU memory.
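The summary does not specify what DynFormer's scale-specialized modules look like, but one common way to mix physical scales is to attend over the field at several pooled resolutions and combine the results, so coarse heads capture global dynamics and fine heads capture local detail. The sketch below illustrates that generic multi-scale pattern only; the pooling strides, averaging, and identity projections are assumptions for brevity, not DynFormer's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x):
    # plain single-head self-attention with identity Q/K/V projections
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def multiscale_attention(x, scales=(1, 2, 4)):
    """Attend over the sequence at several resolutions and average the
    upsampled outputs. Coarse scales see long-range structure cheaply;
    fine scales keep local detail. Assumes each stride divides len(x)."""
    n, d = x.shape
    out = np.zeros_like(x)
    for s in scales:
        pooled = x.reshape(n // s, s, d).mean(axis=1)  # average-pool by stride s
        out += np.repeat(attention(pooled), s, axis=0)  # nearest upsample back
    return out / len(scales)
```

Because attention cost is quadratic in sequence length, the stride-4 pass costs roughly 1/16th of the full-resolution pass, which is one way such designs reduce memory relative to uniform attention over a fine grid.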