
3D Vision

Wonderland: Navigating 3D Scenes from a Single Image
·3153 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Toronto
Generate wide-scope 3D scenes from single images in a snap!
StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors
·2185 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Nanjing University
Create realistic 3D heads with specific hairstyles from text, no 3D hair data needed!
Sequence Matters: Harnessing Video Models in 3D Super-Resolution
·4603 words·22 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Sungkyunkwan University
Leveraging video models, researchers achieve state-of-the-art 3D super-resolution by generating ‘video-like’ sequences from unordered images, avoiding artifacts and heavy computational demands.
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
·3969 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Peking University
MOVIS enhances 3D scene generation by improving cross-view consistency in multi-object novel view synthesis.
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
·3912 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Chinese University of Hong Kong
IDArb: A diffusion model for decomposing images into intrinsic components like albedo, normal, and material properties, handling varying views and lighting.
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
·3380 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology
Training-free method adds physical properties to 3D models using vision-language models.
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
·3868 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Chinese University of Hong Kong
Neural LightRig uses multi-light diffusion to accurately estimate object normals and materials from a single image, outperforming existing methods.
FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction
·4390 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent AI Lab
FreeSplatter: a novel feed-forward framework that reconstructs high-quality 3D scenes from uncalibrated sparse-view images and estimates camera poses in seconds.
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
·2260 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
MIDI: a novel multi-instance diffusion model that generates compositional 3D scenes from single images by simultaneously creating multiple 3D instances with accurate spatial relationships and high generali…
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
·4118 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University
ScoreLiDAR: Distilling diffusion models for 5x faster, higher-quality 3D LiDAR scene completion!
2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction
·2645 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
2DGS-Room: Seed-guided 2D Gaussian splatting with geometric constraints achieves state-of-the-art high-fidelity indoor scene reconstruction.
Structured 3D Latents for Scalable and Versatile 3D Generation
·4249 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
Unified 3D latent representation (SLAT) enables versatile high-quality 3D asset generation, significantly outperforming existing methods.
One Shot, One Talk: Whole-body Talking Avatar from a Single Image
·2297 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Science and Technology of China
One-shot image to realistic, animatable talking avatar! Novel pipeline uses diffusion models and a hybrid 3DGS-mesh representation, achieving seamless generalization and precise control.
AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos
·2678 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
AlphaTablets: A novel 3D plane representation enabling accurate, consistent, and flexible 3D planar reconstruction from monocular videos, achieving state-of-the-art results.
Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters
·4458 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent PCG
Make-It-Animatable: Instantly create animation-ready 3D characters, regardless of pose or shape, using a novel data-driven framework.
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
·3896 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Google DeepMind
CAT4D: Create realistic 4D scenes from single-view videos using a novel multi-view video diffusion model.
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation
·4827 words·23 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 DFKI
MARVEL-40M+ & MARVEL-FX3D: 40M+ high-quality 3D annotations & a fast two-stage text-to-3D pipeline enabling high-fidelity 3D model generation within 15 seconds.
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
·3638 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Twelve Labs
SplatFlow: A novel multi-view rectified flow model enabling direct 3D Gaussian splatting generation & training-free editing for diverse 3D tasks.
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
·2778 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Nanyang Technological University
SAR3D: Blazing-fast autoregressive 3D object generation and understanding using a multi-scale VQVAE, achieving sub-second generation and detailed multimodal comprehension.
Learning 3D Representations from Procedural 3D Programs
·4094 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Virginia
Self-supervised learning of 3D representations from procedurally generated synthetic shapes achieves comparable performance to models trained on real-world datasets, highlighting the potential of synt…