🏢 Chinese University of Hong Kong, Shenzhen
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
·2407 words·12 mins·
AI Generated
🤗 Daily Papers
Speech and Audio
Text-to-Speech
🏢 Chinese University of Hong Kong, Shenzhen
Emilia-Pipe and its resulting datasets, Emilia and Emilia-Large, offer the largest open-source, multilingual speech corpus, enabling more natural and spontaneous AI speech generation.
On the Compositional Generalization of Multimodal LLMs for Medical Imaging
·5637 words·27 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Chinese University of Hong Kong, Shenzhen
Multimodal LLMs for medical imaging now generalize better via compositional generalization, leveraging relationships between image features (modality, anatomy, task) to understand unseen images and im…
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
·3165 words·15 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Chinese University of Hong Kong, Shenzhen
MM-Detect, a novel framework that detects data contamination in multimodal LLMs, improves benchmark reliability by identifying training-set leakage and making performance evaluations more trustworthy.