Skip to main content

🏢 Integrated Vision and Language Lab, KAIST

SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis
·3021 words·15 mins· loading · loading
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Integrated Vision and Language Lab, KAIST
SALOVA, a novel video-LLM framework, enhances long-form video comprehension through targeted retrieval. It introduces SceneWalk, a high-quality dataset of densely-captioned long videos, and integrates…