3D Vision

A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding
·1812 words·9 mins
Computer Vision 3D Vision 🏢 Zhejiang University
Depth-range-free MVS network using pose embedding achieves robust and accurate 3D reconstruction.
A General Protocol to Probe Large Vision Models for 3D Physical Understanding
·4012 words·19 mins
AI Generated Computer Vision 3D Vision 🏢 University of Oxford
Researchers developed a lightweight protocol to probe large vision models’ 3D physical understanding by training classifiers on model features for various scene properties (geometry, material, lightin…
A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration
·2500 words·12 mins
Computer Vision 3D Vision 🏢 Zhejiang University
CAST: a novel consistency-aware spot-guided Transformer achieves state-of-the-art accuracy and efficiency in point cloud registration.
4Diffusion: Multi-view Video Diffusion Model for 4D Generation
·2302 words·11 mins
Computer Vision 3D Vision 🏢 Beihang University
4Diffusion generates high-quality, temporally consistent 4D content from monocular videos using a unified multi-view diffusion model and novel loss functions.
4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization
·1909 words·9 mins
Computer Vision 3D Vision 🏢 Seoul National University
Uncertainty-aware 4D Gaussian Splatting enhances dynamic scene reconstruction from monocular videos by selectively applying regularization to uncertain regions, improving both novel view synthesis and…
3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors
·2090 words·10 mins
3D Vision 🏢 Clemson University
3DGS-Enhancer boosts unbounded 3D Gaussian splatting, generating high-fidelity novel views even with sparse input data using view-consistent 2D diffusion priors.
3DET-Mamba: Causal Sequence Modelling for End-to-End 3D Object Detection
·1690 words·8 mins
Computer Vision 3D Vision 🏢 Fudan University
3DET-Mamba: A novel end-to-end 3D object detector leveraging the Mamba state space model for efficient and accurate object detection in complex indoor scenes, outperforming previous 3DETR models.
3D Gaussian Splatting as Markov Chain Monte Carlo
·1616 words·8 mins
3D Vision 🏢 University of British Columbia
Researchers rethink 3D Gaussian Splatting as MCMC sampling, improving rendering quality and Gaussian control via a novel relocation strategy.
3D Gaussian Rendering Can Be Sparser: Efficient Rendering via Learned Fragment Pruning
·1720 words·9 mins
Computer Vision 3D Vision 🏢 Georgia Institute of Technology
Learned fragment pruning accelerates 3D Gaussian splatting rendering by selectively removing fragments, achieving up to 1.71x speedup on edge GPUs and 0.16 PSNR improvement.
3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration
·1762 words·9 mins
Computer Vision 3D Vision 🏢 Northwestern Polytechnical University
3DFMNet: A novel two-stage network for multi-instance point cloud registration, achieving state-of-the-art accuracy by focusing on object centers first and then performing pairwise registration.
3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction
·2707 words·13 mins
Computer Vision 3D Vision 🏢 Pohang University of Science and Technology
A novel SO(3)-equivariant network directly predicts Wigner-D harmonics for 3D pose estimation, achieving state-of-the-art accuracy and efficiency.
$SE(3)$ Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation
·2436 words·12 mins
Computer Vision 3D Vision 🏢 Toyota Research Institute
SE(3)-equivariant ray embeddings in Perceiver IO achieve state-of-the-art implicit multi-view depth estimation, surpassing methods that rely on data augmentation for approximate equivariance.
$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation
·2529 words·12 mins
AI Generated Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology
Di²Pose, a novel discrete diffusion model, tackles occluded 3D human pose estimation by employing a two-stage process: pose quantization and discrete diffusion, achieving state-of-the-art results.