🏢 SKLSDE Lab, Beihang University

Voila-A: Aligning Vision-Language Models with User's Gaze Attention
2566 words · 13 mins
Multimodal Learning · Vision-Language Models
Voila-A enhances vision-language models by aligning their attention with the user's gaze, improving both their effectiveness in real-world applications and their interpretability.