LLM News

Every LLM release, update, and milestone.

Filtered by: fine-tuning
research

Reinforcement fine-tuning preserves model knowledge better than supervised fine-tuning, study finds

A new study on Qwen2.5-VL reveals that reinforcement fine-tuning (RFT) significantly outperforms supervised fine-tuning (SFT) at preserving a model's existing knowledge during post-training adaptation. While SFT enables faster task learning, it causes catastrophic forgetting; RFT learns more slowly but maintains prior knowledge by reinforcing samples that are naturally aligned with the base model's probability landscape.

research

Spectral Surgery: Training-Free Method Improves LoRA Adapters Without Retraining

Researchers propose Spectral Surgery, a training-free refinement method that improves Low-Rank Adaptation (LoRA) adapters by decomposing trained weights via SVD and selectively reweighting singular values based on gradient-estimated component sensitivity. The approach achieves consistent gains across Llama-3.1-8B and Qwen3-8B—up to +4.4 points on CommonsenseQA and +2.4 pass@1 on HumanEval—by adjusting only ~1,000 scalar coefficients.

research

WAFFLE fine-tuning improves multimodal models for web development by 9 percentage points

Researchers introduce WAFFLE, a fine-tuning methodology that enhances multimodal models' ability to convert UI designs into HTML code. The approach uses structure-aware attention mechanisms and contrastive learning to bridge the gap between visual UI designs and text-based HTML, achieving up to 9 percentage point improvements on benchmark tasks.
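The contrastive-alignment objective the summary mentions can be illustrated with a toy InfoNCE-style loss between paired UI-design and HTML embeddings. All embeddings, dimensions, and the temperature below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
ui = rng.standard_normal((4, 8))                 # 4 UI screenshots, 8-dim embeddings
html = ui + 0.05 * rng.standard_normal((4, 8))   # their matching HTML embeddings

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

ui, html = normalize(ui), normalize(html)
logits = ui @ html.T / 0.1   # cosine similarities scaled by a temperature

# Cross-entropy where each UI's matching HTML (the diagonal) is the positive.
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))

assert loss >= 0
```

Pulling matched UI/HTML pairs together while pushing mismatched pairs apart is what lets a multimodal model treat a rendered design and its markup as two views of the same page.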

research

DiaBlo: Diagonal Block Fine-Tuning Matches Full Model Performance With Lower Cost

Researchers introduce DiaBlo, a parameter-efficient fine-tuning method that updates only diagonal blocks of model weight matrices instead of full parameters. The approach matches full-model fine-tuning performance across reasoning, code generation, and safety tasks while maintaining comparable memory usage and training speed to LoRA.
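The idea of updating only diagonal blocks can be sketched with a boolean mask over a weight matrix; the dimensions, block size, and learning rate below are illustrative, and a real implementation would store only the blocks rather than masking a full gradient.

```python
import numpy as np

d, block = 12, 4          # weight dimension and block size (illustrative)
W = np.zeros((d, d))      # pretrained weights stand-in
grad = np.ones((d, d))    # gradient stand-in

# Mask selecting only the diagonal blocks of W.
mask = np.zeros_like(W, dtype=bool)
for i in range(0, d, block):
    mask[i:i + block, i:i + block] = True

lr = 0.1
W -= lr * grad * mask     # only diagonal-block entries are updated

# Off-diagonal blocks stay frozen.
assert np.all(W[~mask] == 0)
# Trainable parameters: (d/block) blocks of block*block entries each.
assert mask.sum() == (d // block) * block * block
```

With d/block blocks the trainable fraction is block/d of the full matrix, which is where the LoRA-like memory and speed profile comes from.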

research

Research shows many-shot in-context learning closes gap with dedicated fine-tuning

Researchers propose Many-Shot In-Context Fine-tuning (ManyICL), a method that enables moderately sized LLMs like Mistral 7B and Llama-3 8B to match dedicated fine-tuning performance while handling multiple downstream tasks with a single model. The approach treats in-context examples as training targets rather than prompts, significantly reducing the performance gap with task-specific models.
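The "examples as targets" idea can be sketched as prompt packing where loss would be applied to every answer span, not just the final query's. The formatting, helper names, and toy examples below are assumptions for illustration, not the paper's API.

```python
# Pack many labeled examples into one context; record the answer spans,
# i.e. the positions where a ManyICL-style objective would apply loss.
examples = [("2+2", "4"), ("3+5", "8"), ("7-2", "5")]

def build_context(pairs):
    return "\n".join(f"Q: {q}\nA: {a}" for q, a in pairs)

def target_spans(pairs):
    """Return the packed context and the (start, end) span of each answer."""
    ctx = build_context(pairs)
    spans, cursor = [], 0
    for q, a in pairs:
        start = ctx.index(f"A: {a}", cursor) + len("A: ")
        spans.append((start, start + len(a)))
        cursor = start + len(a)
    return ctx, spans

ctx, spans = target_spans(examples)
assert all(ctx[s:e] == a for (s, e), (_, a) in zip(spans, examples))
```

Plain in-context learning would only predict the last answer; supervising every span in the packed context is what lets one model absorb many tasks without per-task weights.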

research

Researchers identify divergence term selection as key to preventing LLM performance collapse in RL fine-tuning

A new paper identifies a fundamental flaw in standard reinforcement learning fine-tuning for large language models: the divergence term used to regularize the policy directly drives the degradation of multi-attempt performance (Pass@k) even as single-attempt accuracy improves. The researchers propose Diversity-Preserving Hybrid RL (DPH-RL), which uses mass-covering f-divergences to maintain broad solution coverage and prevent catastrophic forgetting.
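Why the divergence term matters can be seen in a toy comparison of forward KL (mass-covering) versus reverse KL (mode-seeking) when a fine-tuned policy collapses onto one solution. The distributions below are made-up numbers for illustration, not results from the paper.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])     # base model over three candidate solutions
q = np.array([0.98, 0.01, 0.01])  # fine-tuned policy collapsed onto one mode

# Forward KL(p || q) is large wherever q abandons mass that p had:
# a mass-covering penalty that resists mode collapse.
forward_kl = np.sum(p * np.log(p / q))

# Reverse KL(q || p) only averages over q's own support, so it barely
# notices the dropped modes: a mode-seeking penalty.
reverse_kl = np.sum(q * np.log(q / p))

assert forward_kl > reverse_kl
```

A collapsed policy can keep Pass@1 high while Pass@k falls, since repeated samples all land on the same mode; a mass-covering penalty keeps probability spread across alternative solutions.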