3D Vision

A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding
·1812 words·9 mins
Computer Vision 3D Vision 🏢 Zhejiang University
Depth-range-free MVS network using pose embedding achieves robust and accurate 3D reconstruction.
A General Protocol to Probe Large Vision Models for 3D Physical Understanding
·4012 words·19 mins
AI Generated Computer Vision 3D Vision 🏢 University of Oxford
Researchers developed a lightweight protocol to probe large vision models’ 3D physical understanding by training classifiers on model features for various scene properties (geometry, material, lightin…
A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration
·2500 words·12 mins
Computer Vision 3D Vision 🏢 Zhejiang University
CAST: a novel consistency-aware spot-guided Transformer achieves state-of-the-art accuracy and efficiency in point cloud registration.
4Diffusion: Multi-view Video Diffusion Model for 4D Generation
·2302 words·11 mins
Computer Vision 3D Vision 🏢 Beihang University
4Diffusion generates high-quality, temporally consistent 4D content from monocular videos using a unified multi-view diffusion model and novel loss functions.
4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization
·1909 words·9 mins
Computer Vision 3D Vision 🏢 Seoul National University
Uncertainty-aware 4D Gaussian Splatting enhances dynamic scene reconstruction from monocular videos by selectively applying regularization to uncertain regions, improving both novel view synthesis and…
3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors
·2090 words·10 mins
3D Vision 🏢 Clemson University
3DGS-Enhancer boosts unbounded 3D Gaussian splatting, generating high-fidelity novel views even with sparse input data using view-consistent 2D diffusion priors.
3DET-Mamba: Causal Sequence Modelling for End-to-End 3D Object Detection
·1690 words·8 mins
Computer Vision 3D Vision 🏢 Fudan University
3DET-Mamba: A novel end-to-end 3D object detector leveraging the Mamba state space model for efficient and accurate object detection in complex indoor scenes, outperforming previous 3DETR models.
3D Gaussian Splatting as Markov Chain Monte Carlo
·1616 words·8 mins
3D Vision 🏢 University of British Columbia
Researchers rethink 3D Gaussian Splatting as MCMC sampling, improving rendering quality and Gaussian control via a novel relocation strategy.
3D Gaussian Rendering Can Be Sparser: Efficient Rendering via Learned Fragment Pruning
·1720 words·9 mins
Computer Vision 3D Vision 🏢 Georgia Institute of Technology
Learned fragment pruning accelerates 3D Gaussian splatting rendering by selectively removing fragments, achieving up to 1.71x speedup on edge GPUs and 0.16 PSNR improvement.
3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration
·1762 words·9 mins
Computer Vision 3D Vision 🏢 Northwestern Polytechnical University
3DFMNet: A novel two-stage network for multi-instance point cloud registration, achieving state-of-the-art accuracy by focusing on object centers first and then performing pairwise registration.
3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction
·2707 words·13 mins
Computer Vision 3D Vision 🏢 Pohang University of Science and Technology
A novel SO(3)-equivariant network directly predicts Wigner-D harmonics for 3D pose estimation, achieving state-of-the-art accuracy and efficiency.
$SE(3)$ Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation
·2436 words·12 mins
Computer Vision 3D Vision 🏢 Toyota Research Institute
SE(3)-equivariant ray embeddings in Perceiver IO achieve state-of-the-art implicit multi-view depth estimation, surpassing methods that rely on data augmentation for approximate equivariance.
$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation
·2529 words·12 mins
AI Generated Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology
Di²Pose, a novel discrete diffusion model, tackles occluded 3D human pose estimation by employing a two-stage process: pose quantization and discrete diffusion, achieving state-of-the-art results.