🏢 University of Pennsylvania

Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
4251 words · 20 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models
CoSyn: code-guided synthetic data for scaling text-rich image understanding, reaching state-of-the-art results through targeted multimodal data generation.
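Roughly, "code-guided" means that a language model writes code (for example, plotting or HTML code) that renders a text-rich image. That code then doubles as an exact textual description of the image, which instruction data can be generated from. The sketch below only illustrates that idea and is not CoSyn's implementation. A fixed template stands in for the LLM that would write the rendering code, and questions are derived programmatically rather than by a model. The names `render_bar_chart_svg` and `questions_from_source` are hypothetical.

```python
# Minimal sketch of code-guided synthetic data (hypothetical; NOT CoSyn's pipeline).
# Assumption: a fixed SVG template replaces the LLM-written rendering code, and
# QA pairs are derived directly from the source data instead of by an LLM.
from typing import Dict, List, Tuple


def render_bar_chart_svg(data: Dict[str, int], bar_width: int = 40,
                         gap: int = 10, height: int = 200, top: int = 20) -> str:
    """Render a text-rich bar chart as SVG markup (the 'code' behind the image)."""
    max_val = max(data.values())
    width = len(data) * (bar_width + gap)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height + top + 40}">']
    for i, (label, value) in enumerate(data.items()):
        bar_h = int(value / max_val * height)
        x = i * (bar_width + gap)
        y = top + height - bar_h  # top margin keeps value labels on the canvas
        parts.append(f'<rect x="{x}" y="{y}" width="{bar_width}" height="{bar_h}" fill="steelblue"/>')
        parts.append(f'<text x="{x}" y="{top + height + 15}">{label}</text>')  # axis label
        parts.append(f'<text x="{x}" y="{y - 5}">{value}</text>')  # value printed above bar
    parts.append("</svg>")
    return "\n".join(parts)


def questions_from_source(data: Dict[str, int]) -> List[Tuple[str, str]]:
    """Derive QA pairs from the same source that produced the image."""
    qa = [(f"What is the value for {label}?", str(value)) for label, value in data.items()]
    top_label = max(data, key=lambda k: data[k])
    qa.append(("Which category has the highest value?", top_label))
    qa.append(("What is the total across all categories?", str(sum(data.values()))))
    return qa


if __name__ == "__main__":
    sales = {"Q1": 120, "Q2": 95, "Q3": 150}
    svg = render_bar_chart_svg(sales)
    for question, answer in questions_from_source(sales):
        print(question, "->", answer)
```

Because the answers are computed from the same source that rendered the image, they are correct by construction. That property is what makes generating text-rich training data through code attractive, compared with annotating rendered images after the fact.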