🏢 SenseTime Research
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
221 words · 2 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 SenseTime Research
MaskGWM: improving the generalization of driving world models through video mask reconstruction.
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
3277 words · 16 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 SenseTime Research
SOLAMI: enabling immersive, natural interactions with 3D characters via a unified social vision-language-action model and a novel synthetic multimodal dataset.
WHAC: World-grounded Humans and Cameras
3487 words · 17 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 SenseTime Research
WHAC: Grounding humans and cameras together!