🏢 University of Pennsylvania
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
·4251 words·20 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 University of Pennsylvania
CoSyn: Code-guided synth data for scaling text-rich image understanding, achieving SOTA via targeted multimodal data generation!