🏢 Huazhong University of Science & Technology

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

13 March 2025 · 2562 words · 13 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Huazhong University of Science & Technology

GroundingSuite: A new benchmark that measures complex multi-granular pixel grounding to overcome current dataset limitations and push forward vision-language understanding.

OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models

11 March 2025 · 2951 words · 14 mins

AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Understanding 🏢 Huazhong University of Science & Technology

OmniMamba: Efficient multimodal understanding and generation via SSMs, trained on 2M image-text pairs.

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

18 February 2025 · 2823 words · 14 mins

AI Generated 🤗 Daily Papers AI Applications Autonomous Vehicles 🏢 Huazhong University of Science & Technology

RAD: 3DGS-based reinforcement learning for end-to-end autonomous driving, achieving a 3x lower collision rate than imitation-learning-based methods.