🏢 Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
3612 words · 17 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
CoMP: continually pre-training vision foundation models for better vision-language alignment and support for inputs of arbitrary size.