🏢 Chinese University of Hong Kong, Shenzhen

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
·2407 words·12 mins
AI Generated · 🤗 Daily Papers · Speech and Audio · Text-to-Speech · 🏢 Chinese University of Hong Kong, Shenzhen
Emilia-Pipe and its resulting datasets, Emilia and Emilia-Large, offer the largest open-source, multilingual speech corpus, enabling more natural and spontaneous AI speech generation.
On the Compositional Generalization of Multimodal LLMs for Medical Imaging
·5637 words·27 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models · 🏢 Chinese University of Hong Kong, Shenzhen
Multimodal LLMs for medical imaging now generalize better via compositional generalization, leveraging relationships between image features (modality, anatomy, task) to understand unseen images and im…
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
·3165 words·15 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models · 🏢 Chinese University of Hong Kong, Shenzhen
MM-Detect, a novel framework, detects contamination in multimodal LLMs, improving benchmark reliability and performance evaluation by identifying training-set leakage.