dense-captioning

1 article tagged with dense-captioning

May 20, 2026
model release

NemoStation releases Marlin-2B: 2-billion parameter video VLM achieves dense captioning performance between Tarsier-34B

NemoStation has released Marlin-2B, a 2-billion parameter video vision-language model that produces structured scene and event captions with second-precise timestamps. The model tops the CaReBench dense captioning leaderboard and sits between Tarsier-34B and Gemini-1.5-Pro on DREAM-1K, while matching Gemini-2.0-Flash on temporal grounding benchmarks.