🏢 Czech Institute of Informatics, Robotics and Cybernetics
Large-scale Pre-training for Grounded Video Caption Generation
2703 words · 13 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
GROVE: Pre-training on large-scale data for grounded video caption generation.