🏢 Huazhong University of Science & Technology

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
·2562 words·13 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models · 🏢 Huazhong University of Science & Technology
GroundingSuite: A new benchmark that measures complex multi-granular pixel grounding to overcome current dataset limitations and push forward vision-language understanding.
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
·2951 words·14 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Multimodal Understanding · 🏢 Huazhong University of Science & Technology
OmniMamba: Efficient multimodal understanding and generation via state space models (SSMs), trained on 2M image-text pairs.
RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
·2823 words·14 mins
AI Generated · 🤗 Daily Papers · AI Applications · Autonomous Vehicles · 🏢 Huazhong University of Science & Technology
RAD: 3DGS-based RL advances autonomous driving, achieving a 3x lower collision rate!