🏢 School of Computer Science and Engineering, Tianjin University of Technology
Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models
2675 words · 13 mins
Multimodal Learning
Vision-Language Models
Text-Guided Attention for Zero-Shot Robustness (TGA-ZSR) significantly improves vision-language model robustness against adversarial attacks by aligning and constraining text-guided attention, achievi…