3D Vision

TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

10 February 2025·2951 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Texas at Austin

TripoSG: High-fidelity 3D shapes synthesized via large-scale rectified flow models, pushing image-to-3D generation to new heights.

AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting

7 February 2025·4072 words·20 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 National Yang Ming Chiao Tung University

AuraFusion360: High-quality 360° scene inpainting achieved via novel augmented unseen region alignment and a new benchmark dataset.

MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm

4 February 2025·4621 words·22 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Singapore University of Technology and Design

MotionLab: One framework to rule them all! Unifying human motion generation & editing via a novel Motion-Condition-Motion paradigm, boosting efficiency and generalization.

Relightable Full-Body Gaussian Codec Avatars

24 January 2025·3832 words·18 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 ETH Zurich

Relightable Full-Body Gaussian Codec Avatars: Realistic, animatable full-body avatars are now possible using learned radiance transfer and efficient 3D Gaussian splatting.

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

21 January 2025·3101 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent AI Lab

Hunyuan3D 2.0: A groundbreaking open-source system generating high-resolution, textured 3D assets using scalable diffusion models, exceeding state-of-the-art performance.

GSTAR: Gaussian Surface Tracking and Reconstruction

17 January 2025·2047 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 ETH Zurich

GSTAR: A novel method achieving photorealistic rendering, accurate reconstruction, and reliable 3D tracking of dynamic scenes with changing topology, even handling surfaces appearing, disappearing, or…

GaussianAvatar-Editor: Photorealistic Animatable Gaussian Head Avatar Editor

17 January 2025·2208 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology

GaussianAvatar-Editor enables photorealistic, text-driven editing of animatable 3D heads, solving motion occlusion and ensuring temporal consistency.

CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation

16 January 2025·3330 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Graphics AI Lab, NC Research

CaPa: Carve-n-Paint Synthesis generates hyper-realistic 4K textured meshes in under 30 seconds, setting a new standard for efficient 3D asset creation.

CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities

15 January 2025·3972 words·19 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent AI Lab

CityDreamer4D generates realistic, unbounded 4D city models by cleverly separating dynamic objects (like vehicles) from static elements (buildings, roads), using multiple neural fields for enhanced re…

SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

8 January 2025·2783 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Stability AI

SPAR3D: Fast, accurate single-image 3D reconstruction via a novel two-stage approach using point clouds for high-fidelity mesh generation.

MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting

7 January 2025·3325 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Electronics and Telecommunications Research Institute

MoDec-GS: a novel framework achieving 70% model size reduction in dynamic 3D Gaussian splatting while improving visual quality by cleverly decomposing complex motions and optimizing temporal intervals…

Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback

7 January 2025·3489 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Fudan University

Dolphin: AI automates scientific research, from idea generation to experimental validation.

Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation

7 January 2025·5463 words·26 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Cambridge

Chirpy3D: Generating creative, high-quality 3D birds with intricate details by learning a continuous part latent space from 2D images.

DepthMaster: Taming Diffusion Models for Monocular Depth Estimation

5 January 2025·2694 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Science and Technology of China

DepthMaster tames diffusion models for faster, more accurate monocular depth estimation by aligning generative features with high-quality semantic features and adaptively balancing low and high-freque…

PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models

24 December 2024·3061 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Meta AI

PartGen generates compositional 3D objects with meaningful parts from text, images, or unstructured 3D data using multi-view diffusion models, enabling flexible 3D part editing.

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

24 December 2024·3014 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University

Orient Anything: Learning robust object orientation estimation directly from rendered 3D models, achieving state-of-the-art accuracy on real images.

DepthLab: From Partial to Complete

24 December 2024·2516 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 HKU

DepthLab, a novel image-conditioned depth inpainting model, enhances downstream 3D tasks by effectively completing partial depth information, showing superior performance and generalization.

DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation

19 December 2024·2004 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent PCG

DI-PCG uses a lightweight diffusion transformer to efficiently and accurately estimate parameters of procedural generators from images, enabling high-fidelity 3D asset creation.

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

18 December 2024·4162 words·20 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University

Prompting unlocks 4K metric depth from low-cost LiDAR.

Wonderland: Navigating 3D Scenes from a Single Image

16 December 2024·3153 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Toronto

Generate wide-scope 3D scenes from single images in a snap!