
3D Vision

DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain
·2252 words·11 mins
Computer Vision 3D Vision 🏢 Nanjing University of Science and Technology
DCDepth achieves state-of-the-art monocular depth estimation by progressively predicting depth in the frequency domain via DCT, capturing local correlations and global context effectively.
DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos
·2153 words·11 mins
Computer Vision 3D Vision 🏢 Virginia Tech
DC-Gaussian generates high-fidelity novel views from dashcam videos by addressing common windshield obstructions (reflections, occlusions) using adaptive image decomposition, illumina…
CryoSPIN: Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference
·1731 words·9 mins
Computer Vision 3D Vision 🏢 University of Toronto
CryoSPIN improves ab-initio cryo-EM reconstruction with semi-amortized pose inference, achieving faster and more accurate 3D structure determination.
CRAYM: Neural Field Optimization via Camera RAY Matching
·2649 words·13 mins
Computer Vision 3D Vision 🏢 Shenzhen University
CRAYM optimizes neural fields by matching camera rays rather than pixels, improving both novel view synthesis and reconstructed geometry.
Continuous Heatmap Regression for Pose Estimation via Implicit Neural Representation
·2522 words·12 mins
AI Generated Computer Vision 3D Vision 🏢 Nanjing University of Science and Technology
NerPE: continuous heatmap regression via implicit neural representation resolves the accuracy-limiting quantization errors in human pose estimation, achieving sub-pixel precision.
ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model
·1913 words·9 mins
Computer Vision 3D Vision 🏢 Nanyang Technological University
ContextGS compresses 3D scenes with an anchor-level autoregressive model, achieving a 15x size reduction in 3D Gaussian Splatting while improving rendering quality.
Context and Geometry Aware Voxel Transformer for Semantic Scene Completion
·2245 words·11 mins
3D Vision 🏢 Zhejiang University
CGFormer: a novel voxel transformer boosting semantic scene completion accuracy by using context-aware queries and 3D deformable attention, outperforming existing methods on SemanticKITTI and SSCBench…
ContactField: Implicit Field Representation for Multi-Person Interaction Geometry
·3542 words·17 mins
AI Generated Computer Vision 3D Vision 🏢 Electronics and Telecommunications Research Institute
ContactField: a novel implicit field representation accurately reconstructs multi-person interaction geometry in 3D, simultaneously capturing occupancy, instance IDs, and contact fields, surpassing existing methods.
CoFie: Learning Compact Neural Surface Representations with Coordinate Fields
·2625 words·13 mins
AI Generated Computer Vision 3D Vision 🏢 University of Texas at Austin
CoFie: A novel local geometry-aware neural surface representation dramatically improves accuracy and efficiency in 3D shape modeling by using coordinate fields to compress local shape information.
CAT3D: Create Anything in 3D with Multi-View Diffusion Models
·1770 words·9 mins
3D Vision 🏢 Google DeepMind
CAT3D: Generate high-quality 3D scenes from as little as one image using a novel multi-view diffusion model, outperforming existing methods in speed and quality.
Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis
·2801 words·14 mins
Computer Vision 3D Vision 🏢 Tsinghua University
Binocular-guided 3D Gaussian splatting with self-supervision generates high-quality novel views from sparse inputs without external priors, significantly outperforming state-of-the-art methods.
BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation
·3663 words·18 mins
Computer Vision 3D Vision 🏢 ETH Zurich
BetterDepth: A plug-and-play diffusion refiner boosts zero-shot monocular depth estimation by adding fine details while preserving accurate geometry.
Assembly Fuzzy Representation on Hypergraph for Open-Set 3D Object Retrieval
·2034 words·10 mins
Computer Vision 3D Vision 🏢 Tsinghua University
Hypergraph-Based Assembly Fuzzy Representation (HAFR) excels at open-set 3D object retrieval by using part-level shapes and fuzzy representations to overcome challenges posed by unseen object categori…
Articulate your NeRF: Unsupervised articulated object modeling via conditional view synthesis
·3848 words·19 mins
AI Generated Computer Vision 3D Vision 🏢 University of Edinburgh
Unsupervised Articulated Object Modeling using Conditional View Synthesis learns pose and part segmentation from only two object observations, achieving significantly better performance than previous …
Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
·2384 words·12 mins
Computer Vision 3D Vision 🏢 DAMO Academy, Alibaba Group
Animate3D animates any 3D model using multi-view video diffusion, achieving superior spatiotemporal consistency and straightforward mesh animation.
AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos
·2409 words·12 mins
AI Generated Computer Vision 3D Vision 🏢 Tsinghua University
AlphaTablets revolutionizes 3D planar reconstruction from monocular videos with its novel rectangle-based representation featuring continuous surfaces and precise boundaries, achieving state-of-the-ar…
Activating Self-Attention for Multi-Scene Absolute Pose Regression
·2130 words·10 mins
Computer Vision 3D Vision 🏢 SungKyunKwan University
Boosting Multi-Scene Pose Regression: Novel methods activate transformer self-attention, significantly improving camera pose estimation accuracy and efficiency.
A Unified Framework for 3D Scene Understanding
·2347 words·12 mins
Computer Vision 3D Vision 🏢 Huazhong University of Science and Technology
UniSeg3D: One model to rule them all! This unified framework masters six 3D segmentation tasks (panoptic, semantic, instance, interactive, referring, and open-vocabulary) simultaneously, outperforming…
A Simple yet Universal Framework for Depth Completion
·2167 words·11 mins
Computer Vision 3D Vision 🏢 AI Graduate School GIST
The UniDC framework achieves universal depth completion across various sensors and scenes using minimal labeled data, leveraging a foundation model and hyperbolic embedding for enhanced generalization.
A robust inlier identification algorithm for point cloud registration via l_0-minimization
·2507 words·12 mins
Computer Vision 3D Vision 🏢 Huazhong University of Science and Technology
This paper introduces a novel, robust inlier identification algorithm for point cloud registration that leverages l_0-minimization.