🏢 University of Edinburgh

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
·3107 words·15 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Multimodal Generation
VMB generates music from videos, images, and text, using description and retrieval bridges to improve quality and controllability.