
3D Vision

Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging
·2702 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Chinese University of Hong Kong, Shenzhen
Hi3DGen: High-fidelity 3D geometry generation from images via normal bridging.
X²-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
·2612 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 the Chinese University of Hong Kong
X2-Gaussian enables continuous-time 4D CT reconstruction via dynamic radiative Gaussian splatting and self-supervised respiratory motion learning.
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
·2163 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
SparseFlex: Achieves high-resolution, arbitrary-topology 3D shape modeling via a sparse isosurface representation and sectional voxel training.
Reconstructing Humans with a Biomechanically Accurate Skeleton
·2828 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Texas at Austin
HSMR: Reconstructing 3D humans with a biomechanically accurate skeleton model from a single image, enhancing pose realism.
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency
·2359 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Huazhong University of Science and Technology
Free4D: Tuning-free 4D scene generation with spatial-temporal consistency.
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
·4642 words·22 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 UCLA
Feature4X: Bridges any monocular video to 4D agentic AI with versatile Gaussian feature fields.
DINeMo: Learning Neural Mesh Models with no 3D Annotations
·1595 words·8 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Johns Hopkins University
DINeMo: Learns 3D models with no 3D annotations, leveraging pseudo-correspondence from visual foundation models for enhanced pose estimation.
FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images
·3848 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Australian National University
FRESA: fast feedforward 3D personalized avatar creation from few images.
Aether: Geometric-Aware Unified World Modeling
·2472 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Shanghai AI Laboratory
AETHER: a unified framework enabling geometry-aware reasoning in world models, achieving zero-shot generalization from synthetic to real-world data.
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
·3002 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Alibaba Group
TaoAvatar: Lifelike talking avatars in AR, using 3D Gaussian Splatting for real-time rendering and high fidelity.
Optimized Minimal 3D Gaussian Splatting
·3465 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Sungkyunkwan University
OMG: optimized minimal 3D Gaussian splatting, enabling fast and efficient rendering with minimal storage.
Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image
·2762 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Oxford
Motion blur, usually a problem, is now a solution! This paper estimates camera motion from motion-blurred images, acting like an IMU.
FFaceNeRF: Few-shot Face Editing in Neural Radiance Fields
·2851 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 KAIST, Visual Media Lab
FFaceNeRF: Enables few-shot face editing in NeRFs via geometry adapter & latent mixing, enhancing control & quality with limited training data.
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
·2987 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University
Zero-1-to-A: Animatable avatars from a single image using video diffusion, robust to spatial & temporal inconsistencies!
VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
·1204 words·6 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 EverEx
VideoRFSplat: Direct text-to-3D Gaussian Splatting with flexible pose and multi-view joint modeling, bypassing SDS refinement!
Unleashing Vecset Diffusion Model for Fast Shape Generation
·3881 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 MMLab, CUHK
FlashVDM enables fast 3D shape generation by accelerating both VAE decoding and diffusion sampling.
Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens
·3099 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 DP Technology
Uni-3DAR: Autoregressive framework unifies 3D generation/understanding, compressing spatial tokens for faster, versatile AI.
Sonata: Self-Supervised Learning of Reliable Point Representations
·2429 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Hong Kong
Sonata: Reliable 3D point cloud self-supervised learning through self-distillation, achieving SOTA with less data.
NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes
·4268 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Simon Fraser University
NuiScene: Enables efficient & unbounded outdoor scene generation by encoding scene chunks as uniform vector sets and outpainting.
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
·3624 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Copenhagen
GFS-VL: Enhancing few-shot 3D segmentation by synergizing vision-language models with few-shot learning for robust real-world application.