CoDAR framework shows continuous diffusion language models can match discrete approaches
A new paper identifies token rounding as the primary bottleneck limiting continuous diffusion language models (DLMs) and proposes CoDAR, a two-stage framework that combines continuous embedding-space diffusion with a contextual autoregressive decoder. Experiments on LM1B and OpenWebText show that CoDAR is competitive with discrete diffusion approaches while offering a tunable trade-off between fluency and diversity.
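
For readers unfamiliar with the term, token rounding is the step that maps the continuous vectors produced by diffusion back to discrete tokens. The sketch below shows one simple form of it, a per-position nearest-neighbor lookup in an embedding table. It is an illustration only, not CoDAR's implementation: the function names (`nearest_token`, `round_sequence`) and the toy 2-d embedding values are assumptions made for this example.

```python
import math

# Toy vocabulary of 4 tokens with 2-d embeddings (made-up values, for illustration only).
EMBEDDINGS = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def nearest_token(vec, embedding_table):
    """Return the index of the embedding row closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = -1, math.inf
    for idx, row in enumerate(embedding_table):
        dist = sum((a - b) ** 2 for a, b in zip(vec, row))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def round_sequence(vectors, embedding_table):
    """Round every position independently; no position sees its neighbors."""
    return [nearest_token(v, embedding_table) for v in vectors]


# Slightly perturbed embeddings, standing in for denoised diffusion outputs.
denoised = [[0.1, -0.05], [0.9, 0.1], [0.4, 0.6], [1.05, 0.95]]
token_ids = round_sequence(denoised, EMBEDDINGS)
```

Because each position is rounded in isolation, this kind of lookup cannot use neighboring tokens to resolve an ambiguous vector. A contextual decoder, such as the autoregressive one the paper proposes, conditions on context instead.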