Computer Vision

A Closer Look at the CLS Token for Cross-Domain Few-Shot Learning

26 September 2024·3699 words·18 mins

AI Generated Computer Vision Few-Shot Learning 🏢 Huazhong University of Science and Technology

Leaving the CLS token of a Vision Transformer randomly initialized during cross-domain few-shot learning consistently improves performance; a novel method leveraging this phenomenon achieves state-of-…

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

26 September 2024·1721 words·9 mins

Computer Vision Video Understanding 🏢 Snap Inc.

4Real generates photorealistic 4D scenes from text prompts using video diffusion models, surpassing object-centric approaches in realism and efficiency.

4Diffusion: Multi-view Video Diffusion Model for 4D Generation

26 September 2024·2302 words·11 mins

Computer Vision 3D Vision 🏢 Beihang University

4Diffusion generates high-quality, temporally consistent 4D content from monocular videos using a unified multi-view diffusion model and novel loss functions.

4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization

26 September 2024·1909 words·9 mins

Computer Vision 3D Vision 🏢 Seoul National University

Uncertainty-aware 4D Gaussian Splatting enhances dynamic scene reconstruction from monocular videos by selectively applying regularization to uncertain regions, improving both novel view synthesis and…

3DET-Mamba: Causal Sequence Modelling for End-to-End 3D Object Detection

26 September 2024·1690 words·8 mins

Computer Vision 3D Vision 🏢 Fudan University

3DET-Mamba: A novel end-to-end 3D object detector leveraging the Mamba state space model for efficient and accurate object detection in complex indoor scenes, outperforming previous 3DETR models.

3D Gaussian Rendering Can Be Sparser: Efficient Rendering via Learned Fragment Pruning

26 September 2024·1720 words·9 mins

Computer Vision 3D Vision 🏢 Georgia Institute of Technology

Learned fragment pruning accelerates 3D Gaussian splatting rendering by selectively removing fragments, achieving up to 1.71x speedup on edge GPUs and 0.16 PSNR improvement.

3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration

26 September 2024·1762 words·9 mins

Computer Vision 3D Vision 🏢 Northwestern Polytechnical University

3DFMNet: A novel two-stage network for multi-instance point cloud registration, achieving state-of-the-art accuracy by focusing on object centers first and then performing pairwise registration.

3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction

26 September 2024·2707 words·13 mins

Computer Vision 3D Vision 🏢 Pohang University of Science and Technology

A novel SO(3)-equivariant network directly predicts Wigner-D harmonics for 3D pose estimation, achieving state-of-the-art accuracy and efficiency.

2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution

26 September 2024·2009 words·10 mins

AI Generated Computer Vision Image Generation 🏢 Shanghai Jiao Tong University

2DQuant achieves highly efficient and accurate low-bit image super-resolution by using a dual-stage post-training quantization method that minimizes accuracy loss in transformer-based models, surpassi…

$SE(3)$ Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation

26 September 2024·2436 words·12 mins

Computer Vision 3D Vision 🏢 Toyota Research Institute

SE(3)-equivariant ray embeddings in Perceiver IO achieve state-of-the-art implicit multi-view depth estimation, surpassing methods that rely on data augmentation for approximate equivariance.

$\text{ID}^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition

26 September 2024·1939 words·10 mins

Computer Vision Face Recognition 🏢 Tencent Youtu Lab

ID³: A novel diffusion model generates diverse, identity-preserving synthetic face datasets for accurate and privacy-preserving face recognition, exceeding current state-of-the-art methods.

$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation

26 September 2024·2529 words·12 mins

AI Generated Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology

Di²Pose, a novel discrete diffusion model, tackles occluded 3D human pose estimation by employing a two-stage process: pose quantization and discrete diffusion, achieving state-of-the-art results.