3D Vision

DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

26 September 2024·2252 words·11 mins

Computer Vision 3D Vision 🏢 Nanjing University of Science and Technology

DCDepth achieves state-of-the-art monocular depth estimation by progressively predicting depth in the frequency domain via DCT, capturing local correlations and global context effectively.

DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos

26 September 2024·2153 words·11 mins

Computer Vision 3D Vision 🏢 Virginia Tech

DC-Gaussian: A novel method generates high-fidelity novel views from dashcam videos by addressing common windshield obstructions (reflections, occlusions) using adaptive image decomposition, illumina…

CryoSPIN: Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference

26 September 2024·1731 words·9 mins

Computer Vision 3D Vision 🏢 University of Toronto

CryoSPIN revolutionizes ab-initio cryo-EM reconstruction with semi-amortized pose inference, achieving faster and more accurate 3D structure determination.

CRAYM: Neural Field Optimization via Camera RAY Matching

26 September 2024·2649 words·13 mins

Computer Vision 3D Vision 🏢 Shenzhen University

CRAYM: Neural field optimization via camera RAY matching enhances 3D reconstruction by using camera rays, not pixels, improving both novel view synthesis and geometry.

Continuous Heatmap Regression for Pose Estimation via Implicit Neural Representation

26 September 2024·2522 words·12 mins

AI Generated Computer Vision 3D Vision 🏢 Nanjing University of Science and Technology

NerPE: continuous heatmap regression via implicit neural representation resolves the accuracy-limiting quantization errors in human pose estimation, achieving sub-pixel precision.

ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model

26 September 2024·1913 words·9 mins

Computer Vision 3D Vision 🏢 Nanyang Technological University

ContextGS: Revolutionizing 3D scene compression with an anchor-level autoregressive model, achieving 15x size reduction in 3D Gaussian Splatting while boosting rendering quality.

Context and Geometry Aware Voxel Transformer for Semantic Scene Completion

26 September 2024·2245 words·11 mins

3D Vision 🏢 Zhejiang University

CGFormer: a novel voxel transformer boosting semantic scene completion accuracy by using context-aware queries and 3D deformable attention, outperforming existing methods on SemanticKITTI and SSCBench…

ContactField: Implicit Field Representation for Multi-Person Interaction Geometry

26 September 2024·3542 words·17 mins

AI Generated Computer Vision 3D Vision 🏢 Electronics and Telecommunications Research Institute

Novel implicit field representation accurately reconstructs multi-person interaction geometry in 3D, simultaneously capturing occupancy, instance IDs, and contact fields, surpassing existing methods.

CoFie: Learning Compact Neural Surface Representations with Coordinate Fields

26 September 2024·2625 words·13 mins

AI Generated Computer Vision 3D Vision 🏢 University of Texas at Austin

CoFie: A novel local geometry-aware neural surface representation dramatically improves accuracy and efficiency in 3D shape modeling by using coordinate fields to compress local shape information.

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

26 September 2024·1770 words·9 mins

3D Vision 🏢 Google DeepMind

CAT3D: Generate high-quality 3D scenes from as few as one image using a novel multi-view diffusion model, outperforming existing methods in speed and quality.

Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis

26 September 2024·2801 words·14 mins

Computer Vision 3D Vision 🏢 Tsinghua University

Binocular-guided 3D Gaussian splatting with self-supervision generates high-quality novel views from sparse inputs without external priors, significantly outperforming state-of-the-art methods.

BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation

26 September 2024·3663 words·18 mins

Computer Vision 3D Vision 🏢 ETH Zurich

BetterDepth: A plug-and-play diffusion refiner boosts zero-shot monocular depth estimation by adding fine details while preserving accurate geometry.

Assembly Fuzzy Representation on Hypergraph for Open-Set 3D Object Retrieval

26 September 2024·2034 words·10 mins

Computer Vision 3D Vision 🏢 Tsinghua University

Hypergraph-Based Assembly Fuzzy Representation (HAFR) excels at open-set 3D object retrieval by using part-level shapes and fuzzy representations to overcome challenges posed by unseen object categori…

Articulate your NeRF: Unsupervised articulated object modeling via conditional view synthesis

26 September 2024·3848 words·19 mins

AI Generated Computer Vision 3D Vision 🏢 University of Edinburgh

Unsupervised Articulated Object Modeling using Conditional View Synthesis learns pose and part segmentation from only two object observations, achieving significantly better performance than previous …

Animate3D: Animating Any 3D Model with Multi-view Video Diffusion

26 September 2024·2384 words·12 mins

Computer Vision 3D Vision 🏢 DAMO Academy, Alibaba Group

Animate3D animates any 3D model using multi-view video diffusion, achieving superior spatiotemporal consistency and straightforward mesh animation.

AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos

26 September 2024·2409 words·12 mins

AI Generated Computer Vision 3D Vision 🏢 Tsinghua University

AlphaTablets revolutionizes 3D planar reconstruction from monocular videos with its novel rectangle-based representation featuring continuous surfaces and precise boundaries, achieving state-of-the-ar…

Activating Self-Attention for Multi-Scene Absolute Pose Regression

26 September 2024·2130 words·10 mins

Computer Vision 3D Vision 🏢 SungKyunKwan University

Boosting Multi-Scene Pose Regression: Novel methods activate transformer self-attention, significantly improving camera pose estimation accuracy and efficiency.

A Unified Framework for 3D Scene Understanding

26 September 2024·2347 words·12 mins

Computer Vision 3D Vision 🏢 Huazhong University of Science and Technology

UniSeg3D: One model to rule them all! This unified framework masters six 3D segmentation tasks (panoptic, semantic, instance, interactive, referring, and open-vocabulary) simultaneously, outperforming…

A Simple yet Universal Framework for Depth Completion

26 September 2024·2167 words·11 mins

Computer Vision 3D Vision 🏢 AI Graduate School GIST

UniDC framework achieves universal depth completion across various sensors and scenes using minimal labeled data, leveraging a foundation model and hyperbolic embedding for enhanced generalization.

A robust inlier identification algorithm for point cloud registration via l_0-minimization

26 September 2024·2507 words·12 mins

Computer Vision 3D Vision 🏢 Huazhong University of Science and Technology

This paper introduces a novel, robust inlier identification algorithm for point cloud registration that leverages l_0-minimization.