🏢 Czech Institute of Informatics, Robotics and Cybernetics

Large-scale Pre-training for Grounded Video Caption Generation
2703 words · 13 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models
GROVE: Pre-training on large-scale data for grounded video caption generation.