🏢 SKLSDE Lab, Beihang University
Voila-A: Aligning Vision-Language Models with User's Gaze Attention
2566 words · 13 mins
Multimodal Learning
Vision-Language Models
Voila-A aligns the attention of vision-language models with the user's gaze, improving their effectiveness and interpretability in real-world applications.