
🏢 SenseTime Research

MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
·221 words·2 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 SenseTime Research
MaskGWM: improves the generalization of driving world models through video mask reconstruction.
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
·3277 words·16 mins
AI Generated 🤗 Daily Papers Multimodal Learning Human-AI Interaction 🏢 SenseTime Research
SOLAMI: enabling immersive, natural interactions with 3D characters via a unified social vision-language-action model and a novel synthetic multimodal dataset.
WHAC: World-grounded Humans and Cameras
·3487 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 SenseTime Research
WHAC: Grounding humans and cameras together!