🏢 University of Science and Technology of China
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
·2694 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 University of Science and Technology of China
DepthMaster tames diffusion models for faster, more accurate monocular depth estimation by aligning generative features with high-quality semantic features and adaptively balancing low and high-freque…
Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation
·2542 words·12 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Science and Technology of China
Molar: A novel multimodal LLM framework boosts sequential recommendation accuracy by cleverly aligning collaborative filtering with rich item representations from text and non-text data.
One Shot, One Talk: Whole-body Talking Avatar from a Single Image
·2297 words·11 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 University of Science and Technology of China
One-shot image to realistic, animatable talking avatar! Novel pipeline uses diffusion models and a hybrid 3DGS-mesh representation, achieving seamless generalization and precise control.