MoVA: Adapting Mixture of Vision Experts to Multimodal Context
2418 words · 12 min read
Multimodal Learning
Vision-Language Models
🏢 CUHK MMLab
MoVA, a novel multimodal large language model (MLLM), enhances multimodal understanding by adaptively routing and fusing task-specific vision experts, improving generalization across diverse image content.
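To make the routing-and-fusion idea concrete, here is a minimal conceptual sketch (not MoVA's actual implementation): a router predicts per-expert weights from a context embedding, and the expert features are fused by a weighted sum. The class name `AdaptiveExpertFusion`, the linear stand-in experts, and all dimensions are illustrative assumptions.

```python
# Hypothetical sketch of context-adaptive expert routing and fusion (not MoVA's real code).
import torch
import torch.nn as nn


class AdaptiveExpertFusion(nn.Module):
    """Routes a multimodal context vector to vision experts and fuses their features."""

    def __init__(self, num_experts: int, feat_dim: int, ctx_dim: int):
        super().__init__()
        # Linear layers stand in for pretrained task-specific vision experts.
        self.experts = nn.ModuleList(nn.Linear(feat_dim, feat_dim) for _ in range(num_experts))
        # Router predicts a relevance weight for each expert from the context embedding.
        self.router = nn.Linear(ctx_dim, num_experts)

    def forward(self, image_feats: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # image_feats: (batch, feat_dim); context: (batch, ctx_dim)
        weights = torch.softmax(self.router(context), dim=-1)                      # (batch, E)
        expert_outs = torch.stack([e(image_feats) for e in self.experts], dim=1)   # (batch, E, feat_dim)
        # Fuse expert outputs by their routed relevance weights.
        return (weights.unsqueeze(-1) * expert_outs).sum(dim=1)                    # (batch, feat_dim)


if __name__ == "__main__":
    fusion = AdaptiveExpertFusion(num_experts=4, feat_dim=256, ctx_dim=128)
    fused = fusion(torch.randn(2, 256), torch.randn(2, 128))
    print(fused.shape)  # torch.Size([2, 256])
```

Soft weighting is used here only to keep the sketch differentiable and compact; the paper's own routing strategy may differ.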