🏢 CUHK

VisionZip: Longer is Better but Not Necessary in Vision Language Models
·7032 words·34 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models · 🏢 CUHK
VisionZip makes vision-language models more efficient by selecting a small set of informative visual tokens, retaining near-state-of-the-art performance at a much lower computational cost.