Visual Question Answering
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
·2153 words·11 mins·
Computer Vision
Visual Question Answering
🏢 Google DeepMind
LLM-powered data curation boosts web-scale visual entity recognition!
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
·3011 words·15 mins·
Computer Vision
Visual Question Answering
🏢 UC San Diego
VLG-CBM enhances concept bottleneck models with vision-language guidance for faithful interpretability and improved accuracy.
Visual Prompt Tuning in Null Space for Continual Learning
·2254 words·11 mins·
AI Generated
Computer Vision
Visual Question Answering
🏢 School of Computer Science, Northwestern Polytechnical University
This paper presents NSP², a novel method for visual prompt tuning in continual learning that leverages orthogonal projection to prevent catastrophic forgetting by tuning prompts orthogonal to previous…
Parallel Backpropagation for Shared-Feature Visualization
·1538 words·8 mins·
Visual Question Answering
🏢 Hertie Institute, University Clinics Tübingen
Researchers visualize the shared visual features that drive body-selective neurons' responses to non-body objects, revealing object parts that resemble macaque body parts and thereby explaining the neurons' preferences.
Neural Concept Binder
·3025 words·15 mins·
Computer Vision
Visual Question Answering
🏢 Computer Science Department, TU Darmstadt
The Neural Concept Binder (NCB) framework learns expressive, inspectable, and revisable visual concepts unsupervised, integrating both continuous and discrete representations for seamless use in neura…
Learning to Edit Visual Programs with Self-Supervision
·2121 words·10 mins·
Computer Vision
Visual Question Answering
🏢 Brown University
AI learns to edit visual programs more accurately using a self-supervised method that combines one-shot program generation with iterative local edits, significantly boosting performance, especially wi…
EMVP: Embracing Visual Foundation Model for Visual Place Recognition with Centroid-Free Probing
·3286 words·16 mins·
AI Generated
Computer Vision
Visual Question Answering
🏢 State Key Lab of CAD&CG, Zhejiang University
EMVP: A novel PEFT pipeline raises Visual Place Recognition accuracy to 97.6% using Centroid-Free Probing and Dynamic Power Normalization, while saving 64.3% of parameters.
Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects
·9001 words·43 mins·
AI Generated
Computer Vision
Visual Question Answering
🏢 Brown University
Vision transformers surprisingly struggle with visual relations; this study reveals ViTs use distinct perceptual and relational processing stages to solve same/different tasks, highlighting a previous…