🏢 Queen Mary University of London

V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning

14 March 2025·222 words·2 mins

AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models

V-STaR: A new benchmark that evaluates Video-LLMs on video spatio-temporal reasoning, revealing gaps in current models’ understanding.