
🏢 Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University

CoMP: Continual Multimodal Pre-training for Vision Foundation Models

24 March 2025·3612 words·17 mins

AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models

CoMP continually pre-trains vision foundation models to improve vision-language alignment and to handle inputs of arbitrary size.