
Self-Confidence Signals Enable Unsupervised Reward Training for Text-to-Image Models

A new post-training framework eliminates the need for external reward models, annotators, or additional datasets by leveraging intrinsic self-confidence signals to improve text-to-image generation quality.

Researchers have proposed SOLACE (Adaptive Rewarding by self-Confidence), a method that extracts reward signals directly from the model's internal confidence during denoising. Rather than relying on external supervision—the standard approach in RLHF-style post-training—SOLACE evaluates how accurately a model can recover injected noise under self-denoising probes, converting this signal into scalar rewards for optimization.

How It Works

The core mechanism is straightforward: SOLACE injects noise into generated images and measures how accurately the model can recover the injected noise. Accurate recoveries indicate high internal confidence in the generation and become positive training signals. This self-supervised approach requires no human annotators, external reward models, or curated preference datasets, eliminating bottlenecks common in conventional post-training pipelines.
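As a rough illustration, the probe can be sketched as follows. This is a minimal sketch under assumptions, not the paper's implementation: `predict_noise`, the noising schedule, and the parameter names (`t`, `n_probes`) are hypothetical, and the reward is taken as the negative mean-squared error between injected and recovered noise.

```python
import numpy as np

def self_confidence_reward(image, predict_noise, t=0.5, n_probes=4, rng=None):
    """Hypothetical sketch of a self-confidence probe.

    Injects Gaussian noise into a generated image, asks the model's
    noise predictor to recover it, and converts the recovery error
    into a scalar reward (higher = more confident).
    `predict_noise(noisy_image, t)` is an assumed callable returning
    the model's estimate of the injected noise.
    """
    rng = rng or np.random.default_rng(0)
    rewards = []
    for _ in range(n_probes):
        eps = rng.standard_normal(image.shape)
        # Forward-diffusion-style noising at level t (assumed schedule).
        noisy = np.sqrt(1.0 - t) * image + np.sqrt(t) * eps
        eps_hat = predict_noise(noisy, t)
        # Reward = negative MSE between injected and recovered noise.
        rewards.append(-np.mean((eps_hat - eps) ** 2))
    return float(np.mean(rewards))

# Toy check: with a zero image, an "oracle" predictor that inverts the
# noising recovers eps exactly and scores higher than one guessing zeros.
img = np.zeros((8, 8, 3))
r_good = self_confidence_reward(img, lambda x, t: x / np.sqrt(t))
r_bad = self_confidence_reward(img, lambda x, t: np.zeros_like(x))
```

In this toy setup the oracle predictor earns a reward of 0 (perfect recovery) while the zero-guessing predictor earns roughly -1, matching the intuition that confident, recoverable generations receive higher rewards.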

Empirical Results

According to the research, SOLACE delivers consistent gains across three key dimensions:

  • Compositional generation: Improved ability to handle complex multi-object scenes
  • Text rendering: Better text legibility within generated images
  • Text-image alignment: Stronger correspondence between prompts and visual output

The framework also exhibits a complementary effect when combined with external rewards, reducing reward hacking—the phenomenon where models exploit reward signals in unintended ways.
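One plausible way to realize such a combination (the paper's exact rule is not specified here) is a weighted blend of the two signals, standardized per batch so that neither scale dominates. The mixing weight `alpha` is an assumed hyperparameter, not a value from the paper.

```python
import numpy as np

def blended_rewards(r_self, r_ext, alpha=0.5):
    """Hypothetical blend of internal self-confidence rewards with
    external reward-model scores for a batch of samples."""
    def standardize(x):
        # Zero-mean, unit-variance within the batch (a common
        # policy-gradient baseline trick).
        x = np.asarray(x, dtype=float)
        s = x.std()
        return (x - x.mean()) / s if s > 0 else x - x.mean()
    return alpha * standardize(r_self) + (1.0 - alpha) * standardize(r_ext)
```

Because the internal signal contributes independently of the external reward model, samples that score well externally but poorly on self-confidence are pulled back toward the mean, which is one intuition for why the combination can dampen reward hacking.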

Broader Implications

Text-to-image generation is foundational infrastructure for content creation, design automation, and synthetic data augmentation. Current state-of-the-art systems typically undergo post-training with external reward models trained on human preference data. SOLACE's unsupervised approach addresses a persistent challenge: the cost and bottleneck of obtaining large-scale high-quality human feedback.

The method's reliance on internal confidence metrics rather than external supervision also suggests potential advantages for privacy and computational efficiency in fine-tuning workflows.

The work was posted as a replacement (revised version) on arXiv, indicating the authors refined it after the initial submission.

What This Means

SOLACE demonstrates that text-to-image models contain internal signals that can be exploited to improve quality without external annotation infrastructure. If the approach generalizes robustly, it could lower the barrier to post-training for smaller teams and independent researchers. Its compatibility with external rewards also positions it as a complementary technique rather than a replacement, opening the door to hybrid post-training strategies. Expect follow-up work testing scalability to larger models and datasets.