
Computer Vision

A Closer Look at the CLS Token for Cross-Domain Few-Shot Learning
3699 words · 18 mins
AI Generated Computer Vision Few-Shot Learning 🏢 Huazhong University of Science and Technology
Leaving the CLS token of a Vision Transformer randomly initialized during cross-domain few-shot learning consistently improves performance; a novel method leveraging this phenomenon achieves state-of-…
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
1721 words · 9 mins
Computer Vision Video Understanding 🏢 Snap Inc.
4Real: Photorealistic 4D scene generation from text prompts using video diffusion models, surpassing object-centric approaches in realism and efficiency.
4Diffusion: Multi-view Video Diffusion Model for 4D Generation
2302 words · 11 mins
Computer Vision 3D Vision 🏢 Beihang University
4Diffusion generates high-quality, temporally consistent 4D content from monocular videos using a unified multi-view diffusion model and novel loss functions.
4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization
1909 words · 9 mins
Computer Vision 3D Vision 🏢 Seoul National University
Uncertainty-aware 4D Gaussian Splatting enhances dynamic scene reconstruction from monocular videos by selectively applying regularization to uncertain regions, improving both novel view synthesis and…
3DET-Mamba: Causal Sequence Modelling for End-to-End 3D Object Detection
1690 words · 8 mins
Computer Vision 3D Vision 🏢 Fudan University
3DET-Mamba: A novel end-to-end 3D object detector leveraging the Mamba state space model for efficient and accurate object detection in complex indoor scenes, outperforming previous 3DETR models.
3D Gaussian Rendering Can Be Sparser: Efficient Rendering via Learned Fragment Pruning
1720 words · 9 mins
Computer Vision 3D Vision 🏢 Georgia Institute of Technology
Learned fragment pruning accelerates 3D Gaussian splatting rendering by selectively removing fragments, achieving up to a 1.71× speedup on edge GPUs and a 0.16 dB PSNR improvement.
3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration
1762 words · 9 mins
Computer Vision 3D Vision 🏢 Northwestern Polytechnical University
3DFMNet: A novel two-stage network for multi-instance point cloud registration, achieving state-of-the-art accuracy by focusing on object centers first and then performing pairwise registration.
3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction
2707 words · 13 mins
Computer Vision 3D Vision 🏢 Pohang University of Science and Technology
A novel SO(3)-equivariant network that directly predicts Wigner-D harmonics achieves state-of-the-art accuracy and efficiency in 3D pose estimation.
2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution
2009 words · 10 mins
AI Generated Computer Vision Image Generation 🏢 Shanghai Jiao Tong University
2DQuant achieves highly efficient and accurate low-bit image super-resolution by using a dual-stage post-training quantization method that minimizes accuracy loss in transformer-based models, surpassi…
$SE(3)$ Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation
2436 words · 12 mins
Computer Vision 3D Vision 🏢 Toyota Research Institute
SE(3)-equivariant ray embeddings in Perceiver IO achieve state-of-the-art implicit multi-view depth estimation, surpassing methods that rely on data augmentation for approximate equivariance.
$\text{ID}^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition
1939 words · 10 mins
Computer Vision Face Recognition 🏢 Tencent Youtu Lab
ID³: A novel diffusion model generates diverse, identity-preserving synthetic face datasets for accurate and privacy-preserving face recognition, exceeding current state-of-the-art methods.
$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation
2529 words · 12 mins
AI Generated Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology
Di²Pose, a novel discrete diffusion model, tackles occluded 3D human pose estimation by employing a two-stage process: pose quantization and discrete diffusion, achieving state-of-the-art results.