🏢 DAMO Academy, Alibaba Group
SHMT: Self-supervised Hierarchical Makeup Transfer via Latent Diffusion Models
·3049 words·15 mins·
AI Generated
Computer Vision
Image Generation
🏢 DAMO Academy, Alibaba Group
SHMT: Self-supervised Hierarchical Makeup Transfer uses latent diffusion models to realistically and precisely apply diverse makeup styles to faces, even without paired training data, achieving high f…
Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
·2384 words·12 mins·
Computer Vision
3D Vision
🏢 DAMO Academy, Alibaba Group
Animate3D animates any 3D model using multi-view video diffusion, achieving superior spatiotemporal consistency and straightforward mesh animation.
Aligning Audio-Visual Joint Representations with an Agentic Workflow
·1961 words·10 mins·
Multimodal Learning
Audio-Visual Learning
🏢 DAMO Academy, Alibaba Group
AVAgent uses an LLM-driven workflow to intelligently align audio and visual data, resulting in improved AV joint representations and state-of-the-art performance on various downstream tasks.