🏢 Samsung AI Cambridge

Discriminative Fine-tuning of LVLMs
4145 words · 20 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models
VladVA is a novel training framework that converts generative Large Vision-Language Models (LVLMs) into strong discriminative models, achieving state-of-the-art performance on image-text retrieval and compositionality benchmarks.