
MoVA: Adapting Mixture of Vision Experts to Multimodal Context
2418 words · 12 mins
Multimodal Learning · Vision-Language Models · 🏢 CUHK MMLab
MoVA, a novel MLLM, enhances multimodal understanding by adaptively routing and fusing task-specific vision experts for improved generalization across diverse image content.
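
To make the "adaptively routing and fusing task-specific vision experts" idea concrete, below is a minimal, illustrative sketch of a context-conditioned gate that weights and fuses features from several pre-extracted vision experts before projecting them into the language model's embedding space. All names, shapes, and the gating design are assumptions for illustration; this is not MoVA's actual implementation.

```python
# Illustrative sketch only: soft routing-and-fusion over pre-extracted
# vision-expert features, conditioned on a pooled multimodal context.
import torch
import torch.nn as nn


class VisionExpertFusion(nn.Module):
    """Scores each vision expert against the context and fuses their
    features with the resulting soft weights (hypothetical module)."""

    def __init__(self, num_experts: int, feat_dim: int, ctx_dim: int):
        super().__init__()
        # One gating score per expert, conditioned on the pooled context.
        self.gate = nn.Linear(ctx_dim, num_experts)
        # Project fused expert features into the LLM embedding space.
        self.proj = nn.Linear(feat_dim, ctx_dim)

    def forward(self, expert_feats: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # expert_feats: (batch, num_experts, num_tokens, feat_dim)
        # context:      (batch, ctx_dim), e.g. a pooled instruction embedding
        weights = torch.softmax(self.gate(context), dim=-1)             # (batch, num_experts)
        fused = (weights[:, :, None, None] * expert_feats).sum(dim=1)   # (batch, num_tokens, feat_dim)
        return self.proj(fused)                                         # (batch, num_tokens, ctx_dim)


if __name__ == "__main__":
    batch, experts, tokens, feat_dim, ctx_dim = 2, 4, 16, 256, 512
    module = VisionExpertFusion(experts, feat_dim, ctx_dim)
    feats = torch.randn(batch, experts, tokens, feat_dim)
    ctx = torch.randn(batch, ctx_dim)
    print(module(feats, ctx).shape)  # torch.Size([2, 16, 512])
```

The sketch uses a single soft gate for simplicity; the paper's coarse-to-fine routing over task-specific experts is more involved, but the core intuition, weighting expert features according to the multimodal context before fusion, is the same.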