
ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding
CVPR 2026
We propose ReMoRa, a video MLLM that processes videos by operating directly on their compressed representations, using sparse RGB keyframes for appearance and a refined motion representation for temporal dynamics.
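As a rough illustration of what operating on a compressed representation can look like, the sketch below assumes a standard block-based codec in which intra-coded frames (I-frames) carry full appearance and the predicted frames between them (P/B-frames) carry motion information. It splits a sequence of frame-type labels into sparse keyframe indices and the motion-bearing spans between them. The names `GopSplit` and `split_by_frame_type` and the frame-type input are hypothetical and chosen for illustration only; this is not the paper's pipeline, and the refined motion representation itself is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GopSplit:
    # Indices of frames that would be decoded to RGB for appearance.
    keyframe_indices: List[int]
    # Inclusive (start, end) index ranges of non-keyframes, which would
    # supply motion information (assumption: P/B-frames in a codec stream).
    motion_spans: List[Tuple[int, int]]


def split_by_frame_type(frame_types: List[str]) -> GopSplit:
    """Separate I-frames from the runs of non-I frames between them."""
    keyframes = [i for i, t in enumerate(frame_types) if t == "I"]
    spans: List[Tuple[int, int]] = []
    run_start = None
    for i, t in enumerate(frame_types):
        if t != "I":
            # Open a new motion span at the first non-keyframe of a run.
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # An I-frame closes the span that was open before it.
            spans.append((run_start, i - 1))
            run_start = None
    if run_start is not None:
        # Close a span that runs to the end of the sequence.
        spans.append((run_start, len(frame_types) - 1))
    return GopSplit(keyframes, spans)


if __name__ == "__main__":
    # Hypothetical frame-type sequence for a short clip.
    split = split_by_frame_type(["I", "P", "P", "B", "I", "P", "P"])
    print(split.keyframe_indices)
    print(split.motion_spans)
```

Under these assumptions, only the keyframe indices would need full RGB decoding, while each span would be summarized from its motion information, which is the general reason a compressed-domain approach can be cheaper for long videos than decoding every frame.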




