Computer Vision

One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection
·2196 words·11 mins
Computer Vision 3D Vision 🏒 Tsinghua University
OneDet3D: A universal 3D object detector trained jointly on diverse indoor/outdoor datasets, achieving one-for-all performance across domains and categories.
On the Surprising Effectiveness of Attention Transfer for Vision Transformers
·2971 words·14 mins
Computer Vision Image Classification 🏒 Carnegie Mellon University
Vision Transformers achieve surprisingly high accuracy by transferring only pre-training attention maps, challenging the conventional belief that feature learning is crucial.
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
·2133 words·11 mins
Computer Vision Video Understanding 🏒 Shanghai Jiao Tong University
MM-Det, a novel algorithm, uses multimodal learning and spatiotemporal attention to detect diffusion-generated videos, achieving state-of-the-art performance on the new DVF dataset.
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
·3235 words·16 mins
Computer Vision Image Generation 🏒 FAIR at Meta
Researchers achieve state-of-the-art image generation by disentangling semantic and control metadata in diffusion models and optimizing pre-training across resolutions.
ODGS: 3D Scene Reconstruction from Omnidirectional Images with 3D Gaussian Splattings
·1959 words·10 mins
Computer Vision 3D Vision 🏒 Dept. of ECE & ASRI
ODGS: Lightning-fast 3D scene reconstruction from single omnidirectional images using 3D Gaussian splatting, achieving 100x speedup over NeRF-based methods.
ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models
·4424 words·21 mins
Computer Vision Object Detection 🏒 Tsinghua University
ODGEN: Boosting object detection accuracy by generating high-quality synthetic images using diffusion models conditioned on bounding boxes and text descriptions.
OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries
·2593 words·13 mins
AI Generated Computer Vision 3D Vision 🏒 ShanghaiTech University
OctreeOcc uses octree queries for efficient and multi-granularity 3D occupancy prediction, surpassing state-of-the-art methods with reduced computational costs.
OccFusion: Rendering Occluded Humans with Generative Diffusion Priors
·2014 words·10 mins
Computer Vision 3D Vision 🏒 Stanford University
OccFusion: High-fidelity human rendering from videos, even with occlusions, using 3D Gaussian splatting and 2D diffusion priors.
Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli
·2181 words·11 mins
Computer Vision Image Segmentation 🏒 University of Tübingen
Neuroscience-inspired motion energy processing enables human-like zero-shot generalization in figure-ground segmentation, outperforming deep learning models on random dot stimuli.
NVRC: Neural Video Representation Compression
·1996 words·10 mins
Computer Vision Video Understanding 🏒 Visual Information Lab, University of Bristol, UK
NVRC: A novel end-to-end neural video codec achieves 23% coding gain over VVC VTM by optimizing representation compression.
Not Just Object, But State: Compositional Incremental Learning without Forgetting
·2423 words·12 mins
Computer Vision Image Classification 🏒 Dalian University of Technology
CompILer: A novel prompt-based incremental learner mastering state-object compositions without forgetting, achieving state-of-the-art performance.
Normal-GS: 3D Gaussian Splatting with Normal-Involved Rendering
·2264 words·11 mins
Computer Vision 3D Vision 🏒 Monash University
Normal-GS improves 3D Gaussian Splatting by integrating normal vectors into the rendering pipeline, achieving near state-of-the-art visual quality with accurate surface normals in real time.
No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations
·3800 words·18 mins
Computer Vision Image Classification 🏒 QUVA Lab, University of Amsterdam
Gradients from self-supervised objectives improve the representations of frozen deep models without any additional training.
NeuroGauss4D-PCI: 4D Neural Fields and Gaussian Deformation Fields for Point Cloud Interpolation
·2258 words·11 mins
Computer Vision 3D Vision 🏒 PhiGent Robotics
NeuroGauss4D-PCI masters complex point cloud interpolation using 4D neural fields and Gaussian deformation fields, achieving superior accuracy in dynamic scenes.
NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction
·2947 words·14 mins
Computer Vision 3D Vision 🏒 Shanghai Jiao Tong University
NeuRodin: A two-stage neural framework achieves high-fidelity 3D surface reconstruction from posed RGB images by innovatively addressing limitations in SDF-based methods, resulting in superior reconst…
Neural Signed Distance Function Inference through Splatting 3D Gaussians Pulled on Zero-Level Set
·2791 words·14 mins
Computer Vision 3D Vision 🏒 Tsinghua University
Dynamically aligning 3D Gaussians to a neural SDF’s zero-level set improves neural SDF inference, enabling accurate, smooth 3D surface reconstruction.
Neural Residual Diffusion Models for Deep Scalable Vision Generation
·1912 words·9 mins
AI Generated Computer Vision Image Generation 🏒 Tsinghua University
Neural-RDM: A novel framework for deep, scalable vision generation using residual diffusion models, achieving state-of-the-art results on image and video benchmarks.
Neural Pose Representation Learning for Generating and Transferring Non-Rigid Object Poses
·3744 words·18 mins
AI Generated Computer Vision 3D Vision 🏒 KAIST
Learn disentangled 3D object poses and transfer them between different object identities using a novel neural pose representation, boosting 3D shape generation!
Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation
·2789 words·14 mins
Computer Vision 3D Vision 🏒 University of Tübingen
Neural Localizer Fields (NLF) estimate 3D human pose and shape by learning a continuous field of point localizer functions, enabling flexible training on diverse data and on-the-fly p…
Neural Isometries: Taming Transformations for Equivariant ML
·2578 words·13 mins
Computer Vision 3D Vision 🏒 PlayStation
Neural Isometries learns a latent space where geometric relationships in the observation space are represented as isometries in the latent space, enabling efficient handling of complex symmetries and …