
🏢 SenseTime Research

MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction

17 February 2025·221 words·2 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 SenseTime Research

MaskGWM: improving the generalization of driving world models through video mask reconstruction.

SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters

29 November 2024·3277 words·16 mins

AI Generated 🤗 Daily Papers Multimodal Learning Human-AI Interaction 🏢 SenseTime Research

SOLAMI: enabling immersive, natural interactions with 3D characters via a unified social vision-language-action model and a novel synthetic multimodal dataset.

WHAC: World-grounded Humans and Cameras

19 March 2024·3487 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 SenseTime Research

WHAC: Grounding humans and cameras together!