↓Skip to main content

🏢 UW-Madison

Interfacing Foundation Models' Embeddings

26 September 2024·2676 words·13 mins· loading · loading

AI Generated Multimodal Learning Vision-Language Models 🏢 UW-Madison

FIND, a lightweight transformer interface, seamlessly aligns foundation models’ embeddings for unified image and dataset-level understanding, enabling generalizable, interleaved performance on segment…