Skip to main content

🏢 University of Science and Technology of China

DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
·2694 words·13 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Science and Technology of China
DepthMaster tames diffusion models for faster, more accurate monocular depth estimation by aligning generative features with high-quality semantic features and adaptively balancing low and high-freque…
Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation
·2542 words·12 mins· loading · loading
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Science and Technology of China
Molar: A novel multimodal LLM framework boosts sequential recommendation accuracy by cleverly aligning collaborative filtering with rich item representations from text and non-text data.
One Shot, One Talk: Whole-body Talking Avatar from a Single Image
·2297 words·11 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Science and Technology of China
One-shot image to realistic, animatable talking avatar! Novel pipeline uses diffusion models and a hybrid 3DGS-mesh representation, achieving seamless generalization and precise control.