🏢 DAMO Academy, Alibaba Group
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
3571 words · 17 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
VideoRefer Suite boosts video LLM understanding by introducing a large-scale, high-quality object-level video instruction dataset, a versatile spatial-temporal object encoder model, and a comprehensiv…