🏢 Queen Mary University of London
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
222 words · 2 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
V-STaR is a new benchmark that evaluates Video-LLMs on video spatio-temporal reasoning and reveals gaps in current models' understanding.