Computer Vision

One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection

26 September 2024·2196 words·11 mins

Computer Vision 3D Vision 🏢 Tsinghua University

OneDet3D: A universal 3D object detector trained jointly on diverse indoor/outdoor datasets, achieving one-for-all performance across domains and categories.

On the Surprising Effectiveness of Attention Transfer for Vision Transformers

26 September 2024·2971 words·14 mins

Computer Vision Image Classification 🏢 Carnegie Mellon University

Vision Transformers achieve surprisingly high accuracy by transferring only pre-training attention maps, challenging the conventional belief that feature learning is crucial.

On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection

26 September 2024·2133 words·11 mins

Computer Vision Video Understanding 🏢 Shanghai Jiao Tong University

MM-Det, a novel algorithm, uses multimodal learning and spatiotemporal attention to detect diffusion-generated videos, achieving state-of-the-art performance on the new DVF dataset.

On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models

26 September 2024·3235 words·16 mins

Computer Vision Image Generation 🏢 FAIR at Meta

Researchers achieve state-of-the-art image generation by disentangling semantic and control metadata in diffusion models and optimizing pre-training across resolutions.

ODGS: 3D Scene Reconstruction from Omnidirectional Images with 3D Gaussian Splattings

26 September 2024·1959 words·10 mins

Computer Vision 3D Vision 🏢 Dept. of ECE & ASRI

ODGS: Lightning-fast 3D scene reconstruction from single omnidirectional images using 3D Gaussian splatting, achieving 100x speedup over NeRF-based methods.

ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models

26 September 2024·4424 words·21 mins

Computer Vision Object Detection 🏢 Tsinghua University

ODGEN: Boosting object detection accuracy by generating high-quality synthetic images using diffusion models conditioned on bounding boxes and text descriptions.

OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries

26 September 2024·2593 words·13 mins

AI Generated Computer Vision 3D Vision 🏢 ShanghaiTech University

OctreeOcc uses octree queries for efficient and multi-granularity 3D occupancy prediction, surpassing state-of-the-art methods with reduced computational costs.

OccFusion: Rendering Occluded Humans with Generative Diffusion Priors

26 September 2024·2014 words·10 mins

Computer Vision 3D Vision 🏢 Stanford University

OccFusion: High-fidelity human rendering from videos, even with occlusions, using 3D Gaussian splatting and 2D diffusion priors.

Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli

26 September 2024·2181 words·11 mins

Computer Vision Image Segmentation 🏢 University of Tübingen

Neuroscience-inspired motion energy processing enables human-like zero-shot generalization in figure-ground segmentation, outperforming deep learning models on random dot stimuli.

NVRC: Neural Video Representation Compression

26 September 2024·1996 words·10 mins

Computer Vision Video Understanding 🏢 Visual Information Lab, University of Bristol, UK

NVRC: A novel end-to-end neural video codec achieves 23% coding gain over VVC VTM by optimizing representation compression.

Not Just Object, But State: Compositional Incremental Learning without Forgetting

26 September 2024·2423 words·12 mins

Computer Vision Image Classification 🏢 Dalian University of Technology

CompILer: A novel prompt-based incremental learner mastering state-object compositions without forgetting, achieving state-of-the-art performance.

Normal-GS: 3D Gaussian Splatting with Normal-Involved Rendering

26 September 2024·2264 words·11 mins

Computer Vision 3D Vision 🏢 Monash University

Normal-GS improves 3D Gaussian Splatting by integrating normal vectors into the rendering pipeline, achieving near state-of-the-art visual quality with accurate surface normals in real time.

No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations

26 September 2024·3800 words·18 mins

Computer Vision Image Classification 🏢 QUVA Lab, University of Amsterdam

Self-supervised gradients improve the representations of frozen deep models without any training.

NeuroGauss4D-PCI: 4D Neural Fields and Gaussian Deformation Fields for Point Cloud Interpolation

26 September 2024·2258 words·11 mins

Computer Vision 3D Vision 🏢 PhiGent Robotics

NeuroGauss4D-PCI masters complex point cloud interpolation using 4D neural fields and Gaussian deformation fields, achieving superior accuracy in dynamic scenes.

NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction

26 September 2024·2947 words·14 mins

Computer Vision 3D Vision 🏢 Shanghai Jiao Tong University

NeuRodin: A two-stage neural framework achieves high-fidelity 3D surface reconstruction from posed RGB images by innovatively addressing limitations in SDF-based methods, resulting in superior reconst…

Neural Signed Distance Function Inference through Splatting 3D Gaussians Pulled on Zero-Level Set

26 September 2024·2791 words·14 mins

Computer Vision 3D Vision 🏢 Tsinghua University

Neural SDF inference is revolutionized by dynamically aligning 3D Gaussians to a neural SDF’s zero-level set, enabling accurate, smooth 3D surface reconstruction.

Neural Residual Diffusion Models for Deep Scalable Vision Generation

26 September 2024·1912 words·9 mins

AI Generated Computer Vision Image Generation 🏢 Tsinghua University

Neural-RDM: A novel framework for deep, scalable vision generation using residual diffusion models, achieving state-of-the-art results on image and video benchmarks.

Neural Pose Representation Learning for Generating and Transferring Non-Rigid Object Poses

26 September 2024·3744 words·18 mins

AI Generated Computer Vision 3D Vision 🏢 KAIST

Learn disentangled 3D object poses and transfer them between different object identities using a novel neural pose representation, boosting 3D shape generation!

Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation

26 September 2024·2789 words·14 mins

Computer Vision 3D Vision 🏢 University of Tübingen

Neural Localizer Fields (NLF) revolutionizes 3D human pose and shape estimation by learning a continuous field of point localizer functions, enabling flexible training on diverse data and on-the-fly p…

Neural Isometries: Taming Transformations for Equivariant ML

26 September 2024·2578 words·13 mins

Computer Vision 3D Vision 🏢 PlayStation

Neural Isometries learns a latent space where geometric relationships in the observation space are represented as isometries in the latent space, enabling efficient handling of complex symmetries and …