🏢 Beihang University
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
·4108 words·20 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Beihang University
VideoEspresso: A new dataset and Hybrid LVLMs framework boost fine-grained video reasoning!