Computer Vision

DEX: Data Channel Extension for Efficient CNN Inference on Tiny AI Accelerators
·2183 words·11 mins
Computer Vision Image Classification 🏒 Nokia Bell Labs
DEX boosts CNN accuracy on tiny AI accelerators by 3.5%p, utilizing unused memory and processors to extend input channels without increasing latency.
DeTrack: In-model Latent Denoising Learning for Visual Object Tracking
·2169 words·11 mins
Computer Vision Object Detection 🏒 School of Computer Science, Fudan University
DeTrack revolutionizes visual object tracking with an in-model latent denoising learning process, achieving real-time speed and state-of-the-art accuracy.
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation
·2664 words·13 mins
Computer Vision 3D Vision 🏒 Stanford University
Depth Anywhere enhances 360-degree monocular depth estimation by using perspective models to label unlabeled 360-degree data, significantly improving accuracy.
Depth Anything V2
·3310 words·16 mins
Computer Vision 3D Vision 🏒 TikTok
Depth Anything V2 drastically improves monocular depth estimation by using synthetic training data, scaling up the teacher model, and employing pseudo-labeled real images. It outperforms previous met…
DEPrune: Depth-wise Separable Convolution Pruning for Maximizing GPU Parallelism
·2917 words·14 mins
AI Generated Computer Vision Image Classification 🏒 Samsung Electronics
DEPrune: A novel GPU-optimized pruning method for depthwise separable convolutions, achieving up to 3.74x speedup on EfficientNet-B0 with no accuracy loss!
Demystify Mamba in Vision: A Linear Attention Perspective
·2184 words·11 mins
Computer Vision Image Classification 🏒 Tsinghua University
Mamba in vision, demystified: researchers reveal its link to linear attention and improve efficiency and accuracy through design enhancements.
DeltaDEQ: Exploiting Heterogeneous Convergence for Accelerating Deep Equilibrium Iterations
·2888 words·14 mins
AI Generated Computer Vision Video Understanding 🏒 ETH Zurich
DeltaDEQ accelerates deep equilibrium model inference by 73-84% via a novel ‘heterogeneous convergence’ exploitation technique, maintaining accuracy.
DEL: Discrete Element Learner for Learning 3D Particle Dynamics with Neural Rendering
·3655 words·18 mins
Computer Vision 3D Vision 🏒 Hong Kong University of Science and Technology
DEL: Learns 3D particle dynamics from 2D images via physics-informed neural rendering, exceeding existing methods’ accuracy and robustness.
Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models
·3247 words·16 mins
Computer Vision Image Generation 🏒 Michigan State University
AdvUnlearn enhances diffusion model robustness against adversarial attacks during concept erasure by integrating adversarial training, improving the trade-off between robustness and model utility.
Decoupling Semantic Similarity from Spatial Alignment for Neural Networks
·2318 words·11 mins
Computer Vision Representation Learning 🏒 Google DeepMind
Researchers developed semantic RSMs, a novel approach to measure semantic similarity in neural networks, improving image retrieval and aligning network representations with predicted class probabiliti…
Decoupled Kullback-Leibler Divergence Loss
·2254 words·11 mins
Computer Vision Image Classification 🏒 The Chinese University of Hong Kong
Improved Kullback-Leibler (IKL) divergence loss achieves state-of-the-art adversarial robustness and competitive knowledge distillation performance by addressing KL loss’s limitations.
Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP
·2855 words·14 mins
Computer Vision Vision-Language Models 🏒 University of Maryland, College Park
This paper presents a general framework for interpreting Vision Transformer (ViT) components, mapping their contributions to CLIP space for textual interpretation, and introduces a scoring function fo…
DeBaRA: Denoising-Based 3D Room Arrangement Generation
·2721 words·13 mins
Computer Vision 3D Vision 🏒 Dassault Systèmes
DeBaRA: a novel denoising-based model generates realistic & controllable 3D room layouts, surpassing existing methods.
Dealing with Synthetic Data Contamination in Online Continual Learning
·2977 words·14 mins
Computer Vision Image Generation 🏒 University of Tokyo
AI-generated images contaminate online continual learning datasets, hindering performance. A new method, ESRM, leverages entropy and real/synthetic similarity maximization to select high-quality data…
DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor
·2459 words·12 mins
Computer Vision Image Generation 🏒 School of Computer Science and Technology, Tongji University, China
Deep Degradation Response (DDR) uses image deep feature changes under degradation to create a flexible image descriptor, excelling in blind image quality assessment and unsupervised image restoration.
DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain
·2252 words·11 mins
Computer Vision 3D Vision 🏒 Nanjing University of Science and Technology
DCDepth achieves state-of-the-art monocular depth estimation by progressively predicting depth in the frequency domain via DCT, capturing local correlations and global context effectively.
DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos
·2153 words·11 mins
Computer Vision 3D Vision 🏒 Virginia Tech
DC-Gaussian: A novel method generates high-fidelity novel views from dashcam videos by addressing common windshield obstructions (reflections, occlusions) using adaptive image decomposition, illumina…
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
·4461 words·21 mins
AI Generated Computer Vision Image Generation 🏒 Carnegie Mellon University
Unlearning synthesized images efficiently reveals influential training data for text-to-image models, improving data attribution accuracy and facilitating better model understanding.
DarkSAM: Fooling Segment Anything Model to Segment Nothing
·3441 words·17 mins
AI Generated Computer Vision Image Segmentation 🏒 Huazhong University of Science and Technology
DarkSAM, a novel prompt-free attack, renders the Segment Anything Model incapable of segmenting objects across diverse images, highlighting its vulnerability to universal adversarial perturbations.
DA-Ada: Learning Domain-Aware Adapter for Domain Adaptive Object Detection
·2460 words·12 mins
Computer Vision Object Detection 🏒 Intelligent Software Research Center, Institute of Software, CAS, Beijing, China
DA-Ada enhances domain adaptive object detection by using a novel domain-aware adapter that leverages both domain-invariant and domain-specific knowledge for improved accuracy and generalization acros…