🏢 Shanghai Innovation Institute · Huawei Noah's Ark Lab

Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
·7212 words·34 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models
Inst-IT boosts multimodal instance understanding by using explicit visual prompts for instruction tuning, achieving significant improvements on various benchmarks.