RubiCap (2B/3B/7B)

Apple (🇺🇸 United States)
active

Version History

1.0 (major)

Apple developed RubiCap, a rubric-guided reinforcement learning framework for dense image captioning that achieves state-of-the-art results with 2B-7B parameter models, outperforming competitors up to 72B parameters.

Coverage

research, Apple

Apple's RubiCap model generates better image captions with 3-7B parameters than 72B competitors

Apple researchers developed RubiCap, a framework for training dense image captioning models that achieve state-of-the-art results at the 2B, 3B, and 7B parameter scales. The 7B model outperforms models of up to 72B parameters on multiple benchmarks, including CapArena and CaptionQA, while the 3B variant matches larger 32B models, suggesting that strong dense captioning does not require massive scale.
