🏢 UW-Madison

Interfacing Foundation Models' Embeddings
2676 words · 13 mins
AI Generated · Multimodal Learning · Vision-Language Models · 🏢 UW-Madison
FIND, a lightweight transformer interface, seamlessly aligns foundation models’ embeddings for unified image and dataset-level understanding, enabling generalizable, interleaved performance on segment…