🏢 School of Computer Science and Engineering, Tianjin University of Technology

Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models
2675 words · 13 mins
Multimodal Learning · Vision-Language Models
Text-Guided Attention for Zero-Shot Robustness (TGA-ZSR) significantly improves vision-language model robustness against adversarial attacks by aligning and constraining text-guided attention, achievi…