🏢 College of Computer Science and Technology, Zhejiang University

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
4036 words · 19 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models
New multimodal textbook dataset boosts Vision-Language Model (VLM) performance!