Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders

ICLR, 2026

Bo Fang, YuXin Song, Haoyuan Sun, Qiangqiang Wu, Wenhao Wu, Antoni B. Chan
