🏢 State Key Laboratory for Novel Software Technology, Nanjing University

Bridge the Modality and Capability Gaps in Vision-Language Model Selection

26 September 2024·3390 words·16 mins

AI Generated · Natural Language Processing · Vision-Language Models · 🏢 State Key Laboratory for Novel Software Technology, Nanjing University

SWAB bridges modality and capability gaps in Vision-Language Model selection using optimal transport, enabling accurate prediction of VLM performance without images.

AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

26 September 2024·2546 words·12 mins

Multimodal Learning · Vision-Language Models · 🏢 State Key Laboratory for Novel Software Technology, Nanjing University

AWT is a novel framework that boosts vision-language models’ zero-shot capabilities by augmenting inputs, weighting them dynamically, and leveraging optimal transport to enhance semantic correlations.

AP-Adapter: Improving Generalization of Automatic Prompts on Unseen Text-to-Image Diffusion Models

26 September 2024·2738 words·13 mins

Natural Language Processing · Text Generation · 🏢 State Key Laboratory for Novel Software Technology, Nanjing University

AP-Adapter improves the generalization of automatic prompts to unseen text-to-image diffusion models through a two-stage prompt optimization method that leverages large language models and inter-model differences.

A Prompt-Based Knowledge Graph Foundation Model for Universal In-Context Reasoning

26 September 2024·2451 words·12 mins

Natural Language Processing · Question Answering · 🏢 State Key Laboratory for Novel Software Technology, Nanjing University

KG-ICL, a novel prompt-based knowledge graph foundation model, achieves universal in-context reasoning by leveraging in-context learning and a unified tokenizer, outperforming various baselines on 43 …