🏢 College of Computer Science and Technology, Zhejiang University
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
4036 words · 19 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
New multimodal textbook dataset boosts Vision-Language Model (VLM) performance!