Skip to main content

🏢 University of Science and Technology of China

Equivariant Image Modeling
·3413 words·17 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Science and Technology of China
Aligning image generation subtasks: Equivariant modeling boosts efficiency and generalization by leveraging natural visual signal invariance.
Tokenize Image as a Set
·3037 words·15 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Science and Technology of China
TokenSet: Tokenizing images as unordered sets for dynamic capacity allocation and robust generation, breaking from fixed-position latent codes.
Towards Unified Latent Space for 3D Molecular Latent Diffusion Modeling
·2283 words·11 mins· loading · loading
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 University of Science and Technology of China
UAE-3D: A unified latent space approach for efficient & high-quality 3D molecular generation, outperforming existing methods in accuracy and speed.
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers
·2754 words·13 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Science and Technology of China
RelaCtrl: Relevance-guided control boosts diffusion transformer efficiency, cutting parameters by intelligently allocating resources.
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
·2694 words·13 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Science and Technology of China
DepthMaster tames diffusion models for faster, more accurate monocular depth estimation by aligning generative features with high-quality semantic features and adaptively balancing low and high-freque…
Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation
·2542 words·12 mins· loading · loading
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Science and Technology of China
Molar: A novel multimodal LLM framework boosts sequential recommendation accuracy by cleverly aligning collaborative filtering with rich item representations from text and non-text data.
One Shot, One Talk: Whole-body Talking Avatar from a Single Image
·2297 words·11 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Science and Technology of China
One-shot image to realistic, animatable talking avatar! Novel pipeline uses diffusion models and a hybrid 3DGS-mesh representation, achieving seamless generalization and precise control.