Self-confidence signals enable unsupervised reward training for text-to-image models
Researchers introduce SOLACE, a post-training framework that replaces external reward models with an internal self-confidence signal derived from how accurately a text-to-image model recovers injected noise. The method enables fully unsupervised optimization and shows measurable improvements in compositional generation, text rendering, and text-image alignment.
Researchers have proposed SOLACE (Adaptive Rewarding by self-Confidence), a method that extracts reward signals directly from the model's internal confidence during denoising. Rather than relying on external supervision—the standard approach in RLHF-style post-training—SOLACE evaluates how accurately a model can recover injected noise under self-denoising probes, converting this signal into scalar rewards for optimization.
How It Works
The core mechanism is straightforward: SOLACE injects noise into generated images and measures the model's ability to recover the original content. Accurate recoveries indicate high model confidence in the generation and become positive training signals. This self-supervised approach requires no human annotators, external reward models, or curated preference datasets—eliminating bottlenecks common in conventional post-training pipelines.
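The probe described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's actual implementation: the noise schedule, the `denoiser` interface (assumed here to be a standard epsilon-prediction network), and the reward shaping (negative recovery error) are all assumptions.

```python
import numpy as np

def alpha_bar(t, T=1000):
    # Simple cosine noise schedule (an illustrative assumption,
    # not necessarily the schedule SOLACE uses).
    return np.cos((t / T) * np.pi / 2) ** 2

def self_confidence_reward(denoiser, images, t, rng=None, T=1000):
    """Per-image scalar reward from a self-denoising probe (sketch).

    `denoiser(noisy, t)` is a hypothetical epsilon-prediction model:
    given a noised image and a timestep, it predicts the injected noise.
    The better the prediction, the higher (less negative) the reward.
    """
    rng = rng if rng is not None else np.random.default_rng()
    noise = rng.standard_normal(images.shape)
    ab = alpha_bar(t, T)
    # Inject noise at timestep t, as in standard diffusion forward process.
    noisy = np.sqrt(ab) * images + np.sqrt(1 - ab) * noise
    pred = denoiser(noisy, t)
    # Reward = negative mean-squared recovery error per image.
    mse = ((pred - noise) ** 2).mean(axis=(1, 2, 3))
    return -mse
```

A training loop would then treat these scalars like any RLHF-style reward, reinforcing generations on which the model's own noise recovery is most accurate.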
Empirical Results
According to the research, SOLACE delivers consistent gains across three key dimensions:
- Compositional generation: Improved ability to handle complex multi-object scenes
- Text rendering: Better text legibility within generated images
- Text-image alignment: Stronger correspondence between prompts and visual output
The framework also exhibits a complementary effect when combined with external rewards, reducing reward hacking—the phenomenon where models exploit reward signals in unintended ways.
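One simple way such a combination could look is a weighted blend of the internal and external per-sample rewards, normalized so neither scale dominates. The weighting, the z-normalization, and the function name are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def combined_reward(internal_r, external_r, w_internal=0.5):
    """Blend self-confidence and external rewards (illustrative sketch).

    Each reward vector is z-normalized within the batch so the blend
    reflects relative ranking rather than raw scale. Keeping a nonzero
    internal weight is one plausible way to temper reward hacking: the
    model cannot maximize the external score alone without also staying
    self-consistent under the denoising probe.
    """
    def z(x):
        x = np.asarray(x, dtype=float)
        return (x - x.mean()) / (x.std() + 1e-8)
    return w_internal * z(internal_r) + (1 - w_internal) * z(external_r)
```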
Broader Implications
Text-to-image generation is foundational infrastructure for content creation, design automation, and synthetic data augmentation. Current state-of-the-art systems typically undergo post-training with external reward models trained on human preference data. SOLACE's unsupervised approach addresses a persistent challenge: the cost and bottleneck of obtaining large-scale high-quality human feedback.
The method's reliance on internal confidence metrics rather than external supervision also suggests potential advantages for privacy and computational efficiency in fine-tuning workflows.
The paper was posted as a replacement (revised version) on arXiv, indicating the authors refined the work following the initial submission.
What This Means
SOLACE demonstrates that text-to-image models contain exploitable internal signals for quality improvement without external annotation infrastructure. If the approach generalizes robustly, it could lower the barrier to post-training for smaller teams and researchers. Its compatibility with external rewards also suggests it works as a complementary technique rather than a replacement, opening the door to hybrid post-training strategies. Expect follow-up work testing scalability to larger models and datasets.