3D Vision

Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens

20 March 2025·3099 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 DP Technology

Uni-3DAR: An autoregressive framework that unifies 3D generation and understanding over compressed spatial tokens, for faster and more versatile modeling.

Sonata: Self-Supervised Learning of Reliable Point Representations

20 March 2025·2429 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Hong Kong

Sonata: Reliable 3D point cloud self-supervised learning through self-distillation, achieving SOTA with less data.

NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes

20 March 2025·4268 words·21 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Simon Fraser University

NuiScene: Enables efficient & unbounded outdoor scene generation by encoding scene chunks as uniform vector sets and outpainting.

Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

20 March 2025·3624 words·18 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Copenhagen

GFS-VL: Enhancing few-shot 3D segmentation by synergizing vision-language models with few-shot learning for robust real-world application.

1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering

20 March 2025·3897 words·19 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 National University of Singapore

4DGS-1K: Achieves 1000+ FPS for dynamic scene rendering via a compact, memory-efficient framework, with a 41x storage reduction and a 9x speedup.

DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

19 March 2025·2721 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University

DeepMesh: RL-guided auto-regressive creation of artist-quality 3D meshes, enhanced by tokenization & DPO for human-aligned aesthetics.

Cube: A Roblox View of 3D Intelligence

19 March 2025·2896 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Roblox

Roblox presents Cube, a 3D intelligence model using shape tokenization for text-to-shape, shape-to-text, and text-to-scene generation.

WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes

17 March 2025·1935 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Peking University

WideRange4D: A new benchmark & reconstruction method for high-quality 4D scenes with wide-range movements, pushing the boundaries of 4D reconstruction.

MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs

17 March 2025·5602 words·27 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Apple

MM-Spatial enhances multimodal LLMs with 3D spatial reasoning via a novel dataset and benchmark, improving performance on spatial understanding tasks.

Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation

17 March 2025·2576 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Shanghai Artificial Intelligence Laboratory

Infinite Mobility: Procedural generation of high-fidelity articulated objects for scalable embodied AI training.

VGGT: Visual Geometry Grounded Transformer

14 March 2025·3346 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Oxford

VGGT: a fast, end-to-end transformer that infers complete 3D scene attributes from multiple views, outperforming optimization-based methods.

LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds

13 March 2025·2424 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Alibaba Group

LHM: Animatable 3D avatars from a single image in seconds.

ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness

13 March 2025·2550 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Westlake University

ETCH: Equivariantly fitting bodies to clothed humans through tightness for better pose and shape accuracy.

MaRI: Material Retrieval Integration across Domains

11 March 2025·2119 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Electronic Science and Technology of China

MaRI: Accurately retrieves materials from images by bridging the gap between visual representations and material properties across diverse domains.

PE3R: Perception-Efficient 3D Reconstruction

10 March 2025·2061 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 National University of Singapore

PE3R: Achieves fast and accurate 3D scene reconstruction from 2D images through enhanced perception and efficiency.

Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation

3 March 2025·2689 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 HKUST(GZ)

Kiss3DGen generates 3D assets by repurposing 2D diffusion models, enabling efficient 3D editing and enhancement.

Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

3 March 2025·2982 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 NVIDIA

Difix3D+ improves 3D reconstructions by reducing artifacts via single-step diffusion models, enhancing novel-view synthesis quality and consistency.

Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling

27 February 2025·3037 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 National University of Singapore

EDGS: Achieves faster, high-quality dynamic scene rendering through sparse time-variant attribute modeling and intelligent static area filtering.

Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting

26 February 2025·3674 words·18 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University

ArtGS: Efficiently builds state-of-the-art interactable replicas of complex articulated objects via Gaussian Splatting.

MagicArticulate: Make Your 3D Models Articulation-Ready

17 February 2025·4321 words·21 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Nanyang Technological University

MagicArticulate automates 3D model animation preparation by generating skeletons and skinning weights, overcoming prior manual methods’ limitations, and introducing Articulation-XL, a large-scale benc…