3D Vision

Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens
·3099 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 DP Technology
Uni-3DAR: Autoregressive framework unifies 3D generation/understanding, compressing spatial tokens for faster, versatile AI.
Sonata: Self-Supervised Learning of Reliable Point Representations
·2429 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Hong Kong
Sonata: Reliable 3D point cloud self-supervised learning through self-distillation, achieving SOTA with less data.
NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes
·4268 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Simon Fraser University
NuiScene: Enables efficient & unbounded outdoor scene generation by encoding scene chunks as uniform vector sets and outpainting.
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
·3624 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Copenhagen
GFS-VL: Enhancing few-shot 3D segmentation by synergizing vision-language models with few-shot learning for robust real-world application.
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
·3897 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 National University of Singapore
4DGS-1K: Achieves 1000+ FPS for dynamic scene rendering via a compact, memory-efficient framework, offering a 41x storage reduction and a 9x speedup.
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
·2721 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
DeepMesh: RL-guided auto-regressive creation of artist-quality 3D meshes, enhanced by tokenization & DPO for human-aligned aesthetics.
Cube: A Roblox View of 3D Intelligence
·2896 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Roblox
Roblox presents Cube, a 3D intelligence model using shape tokenization for text-to-shape, shape-to-text, and text-to-scene generation.
WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes
·1935 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Peking University
WideRange4D: A new benchmark & reconstruction method for high-quality 4D scenes with wide-range movements, pushing the boundaries of 4D reconstruction.
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
·5602 words·27 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Apple
MM-Spatial enhances multimodal LLMs with 3D spatial reasoning via a novel dataset and benchmark, improving performance on spatial understanding tasks.
Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation
·2576 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Shanghai Artificial Intelligence Laboratory
Infinite Mobility: Procedural generation of high-fidelity articulated objects for scalable embodied AI training.
VGGT: Visual Geometry Grounded Transformer
·3346 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Oxford
VGGT: a fast, end-to-end transformer that infers complete 3D scene attributes from multiple views, outperforming optimization-based methods.
LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
·2424 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Alibaba Group
LHM: Animatable 3D avatars from a single image in seconds.
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness
·2550 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Westlake University
ETCH: Equivariantly fitting bodies to clothed humans through tightness for better pose and shape accuracy.
MaRI: Material Retrieval Integration across Domains
·2119 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Electronic Science and Technology of China
MaRI: Accurately retrieves textures from images by bridging the gap between visual representations and material properties across diverse domains.
PE3R: Perception-Efficient 3D Reconstruction
·2061 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 National University of Singapore
PE3R: Achieves fast and accurate 3D scene reconstruction from 2D images through enhanced perception and efficiency.
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
·2689 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 HKUST(GZ)
Kiss3DGen generates 3D assets by repurposing 2D diffusion models, enabling efficient 3D editing and enhancement.
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models
·2982 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 NVIDIA
DIFIX3D+ improves 3D reconstructions by reducing artifacts via single-step diffusion models, enhancing novel-view synthesis quality and consistency.
Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling
·3037 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 National University of Singapore
EDGS: Achieves faster, high-quality dynamic scene rendering through sparse time-variant attribute modeling and filtering of static areas.
Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
·3674 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
ArtGS: Achieves state-of-the-art, efficient interactable replicas of complex articulated objects via Gaussian Splatting.
MagicArticulate: Make Your 3D Models Articulation-Ready
·4321 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Nanyang Technological University
MagicArticulate automates 3D model animation preparation by generating skeletons and skinning weights, overcoming prior manual methods’ limitations, and introducing Articulation-XL, a large-scale benc…