Multimodal Generation
Unity by Diversity: Improved Representation Learning for Multimodal VAEs
·3037 words·15 mins·
Multimodal Learning
Multimodal Generation
🏢 ETH Zurich
MMVM VAE enhances multimodal data analysis by using a soft constraint to guide each modality’s latent representation toward a shared aggregate, improving latent representation learning and missing dat…
MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence
·2462 words·12 mins·
Multimodal Learning
Multimodal Generation
🏢 Zhejiang University
MoMu-Diffusion is a framework that learns long-term motion-music synchronization and generates realistic, beat-matched sequences, which the authors report outperform existing methods.
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
·2931 words·14 mins·
Multimodal Learning
Multimodal Generation
🏢 Beijing University of Posts and Telecommunications
Lumina-Next improves image generation with a new architecture and sampling techniques, making it faster, more efficient, and capable of higher resolutions.
Images that Sound: Composing Images and Sounds on a Single Canvas
·2562 words·13 mins·
Multimodal Learning
Multimodal Generation
🏢 University of Michigan
Researchers create ‘images that sound’—visual spectrograms looking like natural images and sounding like natural audio—by cleverly composing pre-trained image and audio diffusion models in a shared la…