🏢 Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University

CoMP: Continual Multimodal Pre-training for Vision Foundation Models
3612 words · 17 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models
CoMP: continual pre-training of vision foundation models for better vision-language alignment and support for inputs of arbitrary size.