Visual Question Answering
SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images
2647 words · 13 mins
AI Generated
🤗 Daily Papers
Computer Vision
Visual Question Answering
🏢 Kyoto University
SBS Figures creates a massive, high-quality figure QA dataset via a novel stage-by-stage synthesis pipeline, enabling efficient pre-training of visual language models.
VisualLens: Personalization through Visual History
2160 words · 11 mins
AI Generated
🤗 Daily Papers
Computer Vision
Visual Question Answering
🏢 Meta
VisualLens leverages a user's visual history for personalized recommendations, improving on the state of the art by 5-10% and exceeding GPT-4’s performance.
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
3392 words · 16 mins
AI Generated
🤗 Daily Papers
Computer Vision
Visual Question Answering
🏢 University of California, Berkeley
DynaMath, a novel benchmark, reveals that state-of-the-art VLMs struggle with variations of simple math problems, showcasing their reasoning fragility. It offers 501 high-quality seed questions, dyna…