LLM News

Every LLM release, update, and milestone.

research

Progressive Residual Warmup improves LLM pretraining stability and convergence speed

Researchers propose Progressive Residual Warmup (ProRes), a pretraining technique that staggers layer learning by gradually warming residual connections from 0 to 1, with deeper layers taking longer to activate. The method demonstrates faster convergence, stronger generalization, and improved downstream performance across multiple model scales and initialization schemes.
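The paper's exact schedule is not given here, but the core idea can be sketched: each block's residual branch is scaled by a gate that ramps from 0 to 1, with deeper layers assigned longer warmup windows so they activate later. The gate shape, `base_warmup` value, and linear ramp below are illustrative assumptions, not the authors' published schedule.

```python
def residual_gate(step: int, layer: int, base_warmup: int = 1000) -> float:
    """Gate for layer `layer`'s residual branch, ramping linearly 0 -> 1.
    Deeper layers get proportionally longer warmup windows (assumed schedule)."""
    warmup_steps = base_warmup * (1 + layer)  # deeper layer => slower warmup
    return min(1.0, step / warmup_steps)

class GatedBlock:
    """Wraps a transformer block so its residual update is scaled by the
    warmup gate: x <- x + gate(step, layer) * block(x)."""
    def __init__(self, block, layer: int):
        self.block = block
        self.layer = layer

    def __call__(self, x, step: int):
        g = residual_gate(step, self.layer)
        return x + g * self.block(x)
```

At step 0 every block is a pure identity map (gate 0), so early training only adjusts the embedding and output layers; blocks then come online shallow-first as their gates open.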

research

DynFormer rethinks Transformers for physics simulations, cutting PDE solver errors by up to 95%

Researchers propose DynFormer, a Transformer variant designed specifically for solving partial differential equations (PDEs) that models physical systems at multiple scales simultaneously. By replacing uniform attention with specialized modules for different physical scales, DynFormer achieves up to 95% error reduction compared to existing neural operator baselines while consuming significantly less GPU memory.
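The summary does not specify what DynFormer's scale-specialized modules look like, but one common way to mix physical scales is to attend over the field at several pooled resolutions and combine the results, so coarse heads capture global dynamics and fine heads capture local detail. The sketch below illustrates that generic multi-scale pattern only; the pooling strides, averaging, and identity projections are assumptions for brevity, not DynFormer's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x):
    # plain single-head self-attention with identity Q/K/V projections
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def multiscale_attention(x, scales=(1, 2, 4)):
    """Attend over the sequence at several resolutions and average the
    upsampled outputs. Coarse scales see long-range structure cheaply;
    fine scales keep local detail. Assumes each stride divides len(x)."""
    n, d = x.shape
    out = np.zeros_like(x)
    for s in scales:
        pooled = x.reshape(n // s, s, d).mean(axis=1)  # average-pool by stride s
        out += np.repeat(attention(pooled), s, axis=0)  # nearest upsample back
    return out / len(scales)
```

Because attention cost is quadratic in sequence length, the stride-4 pass costs roughly 1/16th of the full-resolution pass, which is one way such designs reduce memory relative to uniform attention over a fine grid.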