
🏢 Integrated Vision and Language Lab, KAIST

SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis

25 November 2024·3021 words·15 mins

AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models

SALOVA, a novel video-LLM framework, enhances long-form video comprehension through targeted retrieval. It introduces SceneWalk, a high-quality dataset of densely-captioned long videos, and integrates…