LLM News

Every LLM release, update, and milestone.

research

FlyThinker: Researchers propose parallel reasoning during generation for personalized responses

Researchers introduce FlyThinker, a framework that runs reasoning and generation concurrently rather than sequentially, addressing limitations of existing "think-then-generate" approaches in long-form personalized text generation. The method uses a separate reasoning model that generates token-level guidance in parallel with the main generation model, enabling more adaptive reasoning without sacrificing computational efficiency.
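The paper's exact fusion mechanism isn't specified in this summary, but the core idea of a reasoning model emitting token-level guidance alongside the generator can be sketched roughly as follows. All names, the logit-bias fusion, and the `alpha` weight are illustrative assumptions, with toy random models standing in for the real LLMs:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # toy vocabulary size

def generator_logits(prefix):
    """Toy stand-in for the main generation model's next-token logits."""
    return rng.normal(size=VOCAB)

def reasoner_guidance(prefix):
    """Toy stand-in for the reasoning model: per-token guidance produced
    concurrently with generation (simulated sequentially here)."""
    return rng.normal(scale=0.5, size=VOCAB)

def decode_step(prefix, alpha=1.0):
    # In the parallel scheme, guidance for step t is computed while the
    # generator works on step t, then fused before the token is chosen,
    # so reasoning adds no extra sequential decoding passes.
    fused = generator_logits(prefix) + alpha * reasoner_guidance(prefix)
    probs = np.exp(fused - fused.max())
    probs /= probs.sum()
    return int(probs.argmax())

tokens = []
for _ in range(5):
    tokens.append(decode_step(tokens))
print(tokens)
```

Because the two models run concurrently, the guidance adds latency roughly equal to the slower of the two forward passes rather than a full think-then-generate phase.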

research

Research shows many-shot in-context learning closes gap with dedicated fine-tuning

Researchers propose Many-Shot In-Context Fine-tuning (ManyICL), a method that enables moderately-sized LLMs like Mistral 7B and Llama-3 8B to match dedicated fine-tuning performance while handling multiple downstream tasks with a single model. The approach treats in-context examples as training targets rather than prompts, significantly reducing the performance gap with task-specific models.
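The key distinction, per the summary, is that in-context examples become supervised targets instead of inert prompt context. A minimal sketch of how that changes the loss mask over a many-shot sequence (character-level "tokens" and the arithmetic examples are invented for illustration):

```python
# Build a many-shot sequence plus a loss mask in which every example's
# answer tokens (not just a final query's) are training targets.
examples = [("2+2=", "4"), ("3+3=", "6"), ("5+1=", "6")]

tokens, loss_mask = [], []
for prompt, answer in examples:
    for ch in prompt:       # prompt tokens: conditioning context, no loss
        tokens.append(ch)
        loss_mask.append(0)
    for ch in answer:       # answer tokens: supervised targets
        tokens.append(ch)
        loss_mask.append(1)

print(sum(loss_mask), len(tokens))  # 3 supervised positions out of 15
```

Under vanilla in-context learning the mask would be zero everywhere except the final answer; supervising every shot is what lets one model absorb many tasks' examples as training signal.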

research

Researchers propose VCPO to stabilize asynchronous RL training for LLMs, cutting training time 2.5x

A new technique called Variance Controlled Policy Optimization (VCPO) addresses a fundamental problem in asynchronous reinforcement learning for LLMs: high variance in policy-gradient estimates caused by stale rollouts. The method scales the learning rate according to the effective sample size of each batch and applies a minimum-variance baseline, cutting long-context training time by 2.5x while matching the performance of synchronous training.
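The summary names two ingredients: learning-rate scaling by effective sample size and a minimum-variance baseline. The paper's exact formulas aren't given here, so the sketch below uses the standard importance-sampling definitions of both quantities (ESS from importance weights; a baseline weighted by squared importance weights); the function name and inputs are assumptions:

```python
import numpy as np

def vcpo_step_stats(weights, rewards, base_lr):
    """Illustrative VCPO-style statistics for one batch of stale rollouts.

    weights: importance weights pi_current / pi_behavior per rollout
    rewards: scalar returns per rollout
    """
    w = np.asarray(weights, dtype=float)
    r = np.asarray(rewards, dtype=float)

    # Effective sample size: stale, off-policy rollouts with skewed
    # importance weights shrink the ESS below the batch size.
    ess = w.sum() ** 2 / (w ** 2).sum()

    # Scale the learning rate by the effective fraction of the batch,
    # damping updates when rollouts are very stale.
    lr = base_lr * ess / len(w)

    # Minimum-variance baseline for importance-weighted gradients:
    # rewards averaged with squared-importance-weight weighting.
    baseline = (w ** 2 * r).sum() / (w ** 2).sum()

    advantages = r - baseline
    return lr, baseline, advantages

lr, b, adv = vcpo_step_stats([1.0, 0.5, 2.0], [1.0, 0.0, 2.0], base_lr=1e-4)
print(lr, b)
```

With uniform weights the ESS equals the batch size and the update reduces to the synchronous case; as staleness skews the weights, the step size shrinks automatically.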