Visual Question Answering

When Less is Enough: Adaptive Token Reduction for Efficient Image Representation

20 March 2025·2005 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 AIRI

Efficient image representation via adaptive token reduction.

SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images

23 December 2024·2647 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Kyoto University

SBS Figures creates a massive, high-quality figure QA dataset via a novel stage-by-stage synthesis pipeline, enabling efficient pre-training of visual language models.

VisualLens: Personalization through Visual History

25 November 2024·2160 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Meta

VisualLens leverages a user's visual history for personalized recommendations, improving on the state of the art by 5-10% and exceeding GPT-4’s performance.

DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models

29 October 2024·3392 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 University of California, Berkeley

DynaMath, a novel benchmark, reveals that state-of-the-art VLMs struggle with variations of simple math problems, showcasing their reasoning fragility. It offers 501 high-quality seed questions, dyna…