
🏢 Chinese University of Hong Kong, Shenzhen

Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging

28 March 2025·2702 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Chinese University of Hong Kong, Shenzhen

Hi3DGen: High-fidelity 3D geometry generation from images via normal bridging.

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation

27 January 2025·2407 words·12 mins

AI Generated 🤗 Daily Papers Speech and Audio Text-to-Speech 🏢 Chinese University of Hong Kong, Shenzhen

Emilia-Pipe and its resulting datasets, Emilia and Emilia-Large, offer the largest open-source, multilingual speech corpus, enabling more natural and spontaneous AI speech generation.

On the Compositional Generalization of Multimodal LLMs for Medical Imaging

28 December 2024·5637 words·27 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Chinese University of Hong Kong, Shenzhen

Multimodal LLMs for medical imaging now generalize better via compositional generalization, leveraging relationships between image features (modality, anatomy, task) to understand unseen images and im…

Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination

6 November 2024·3165 words·15 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Chinese University of Hong Kong, Shenzhen

MM-Detect, a novel framework, detects contamination in multimodal LLMs, enhancing benchmark reliability by identifying training-set leakage and improving performance evaluations.