🏢 DAMO Academy, Alibaba Group

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
3571 words · 17 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models
VideoRefer Suite improves the spatial-temporal object understanding of Video LLMs by introducing a large-scale, high-quality object-level video instruction dataset, a versatile spatial-temporal object encoder model, and a comprehensiv…