Computer Vision

DEX: Data Channel Extension for Efficient CNN Inference on Tiny AI Accelerators

26 September 2024·2183 words·11 mins

Computer Vision Image Classification 🏢 Nokia Bell Labs

DEX boosts CNN accuracy on tiny AI accelerators by 3.5%p, utilizing unused memory and processors to extend input channels without increasing latency.

DeTrack: In-model Latent Denoising Learning for Visual Object Tracking

26 September 2024·2169 words·11 mins

Computer Vision Object Detection 🏢 School of Computer Science, Fudan University

DeTrack revolutionizes visual object tracking with an in-model latent denoising learning process, achieving real-time speed and state-of-the-art accuracy.

Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation

26 September 2024·2664 words·13 mins

Computer Vision 3D Vision 🏢 Stanford University

Depth Anywhere enhances 360-degree monocular depth estimation by cleverly using perspective models to label unlabeled 360-degree data, significantly improving accuracy.

Depth Anything V2

26 September 2024·3310 words·16 mins

Computer Vision 3D Vision 🏢 TikTok

Depth Anything V2 drastically improves monocular depth estimation by using synthetic training data, scaling up the teacher model, and employing pseudo-labeled real images. It outperforms previous met…

DEPrune: Depth-wise Separable Convolution Pruning for Maximizing GPU Parallelism

26 September 2024·2917 words·14 mins

AI Generated Computer Vision Image Classification 🏢 Samsung Electronics

DEPrune: A novel GPU-optimized pruning method for depthwise separable convolutions, achieving up to 3.74x speedup on EfficientNet-B0 with no accuracy loss!

Demystify Mamba in Vision: A Linear Attention Perspective

26 September 2024·2184 words·11 mins

Computer Vision Image Classification 🏢 Tsinghua University

Mamba in vision demystified: researchers reveal its surprising link to linear attention, improving efficiency and accuracy through design enhancements.

DeltaDEQ: Exploiting Heterogeneous Convergence for Accelerating Deep Equilibrium Iterations

26 September 2024·2888 words·14 mins

AI Generated Computer Vision Video Understanding 🏢 ETH Zurich

DeltaDEQ accelerates deep equilibrium model inference by 73-84% via a novel ‘heterogeneous convergence’ exploitation technique, maintaining accuracy.

DEL: Discrete Element Learner for Learning 3D Particle Dynamics with Neural Rendering

26 September 2024·3655 words·18 mins

Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology

DEL: Learns 3D particle dynamics from 2D images via physics-informed neural rendering, exceeding existing methods’ accuracy and robustness.

Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models

26 September 2024·3247 words·16 mins

Computer Vision Image Generation 🏢 Michigan State University

AdvUnlearn enhances diffusion model robustness against adversarial attacks during concept erasure by integrating adversarial training, improving the trade-off between robustness and model utility.

Decoupling Semantic Similarity from Spatial Alignment for Neural Networks

26 September 2024·2318 words·11 mins

Computer Vision Representation Learning 🏢 Google DeepMind

Researchers developed semantic RSMs, a novel approach to measure semantic similarity in neural networks, improving image retrieval and aligning network representations with predicted class probabiliti…

Decoupled Kullback-Leibler Divergence Loss

26 September 2024·2254 words·11 mins

Computer Vision Image Classification 🏢 The Chinese University of Hong Kong

Improved Kullback-Leibler (IKL) divergence loss achieves state-of-the-art adversarial robustness and competitive knowledge distillation performance by addressing KL loss’s limitations.

Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP

26 September 2024·2855 words·14 mins

Computer Vision Vision-Language Models 🏢 University of Maryland, College Park

This paper presents a general framework for interpreting Vision Transformer (ViT) components, mapping their contributions to CLIP space for textual interpretation, and introduces a scoring function fo…

DeBaRA: Denoising-Based 3D Room Arrangement Generation

26 September 2024·2721 words·13 mins

Computer Vision 3D Vision 🏢 Dassault Systèmes

DeBaRA: a novel denoising-based model that generates realistic and controllable 3D room layouts, surpassing existing methods.

Dealing with Synthetic Data Contamination in Online Continual Learning

26 September 2024·2977 words·14 mins

Computer Vision Image Generation 🏢 University of Tokyo

AI-generated images contaminate online continual learning datasets, hindering performance. A new method, ESRM, leverages entropy and real/synthetic similarity maximization to select high-quality data…

DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor

26 September 2024·2459 words·12 mins

Computer Vision Image Generation 🏢 School of Computer Science and Technology, Tongji University, China

Deep Degradation Response (DDR) uses the changes in an image's deep features under degradation to create a flexible image descriptor, excelling in blind image quality assessment and unsupervised image restoration.

DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

26 September 2024·2252 words·11 mins

Computer Vision 3D Vision 🏢 Nanjing University of Science and Technology

DCDepth achieves state-of-the-art monocular depth estimation by progressively predicting depth in the frequency domain via DCT, capturing local correlations and global context effectively.

DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos

26 September 2024·2153 words·11 mins

Computer Vision 3D Vision 🏢 Virginia Tech

DC-Gaussian: A novel method generates high-fidelity novel views from dashcam videos by addressing common windshield obstructions (reflections, occlusions) using adaptive image decomposition, illumina…

Data Attribution for Text-to-Image Models by Unlearning Synthesized Images

26 September 2024·4461 words·21 mins

AI Generated Computer Vision Image Generation 🏢 Carnegie Mellon University

Unlearning synthesized images efficiently reveals influential training data for text-to-image models, improving data attribution accuracy and facilitating better model understanding.

DarkSAM: Fooling Segment Anything Model to Segment Nothing

26 September 2024·3441 words·17 mins

AI Generated Computer Vision Image Segmentation 🏢 Huazhong University of Science and Technology

DarkSAM, a novel prompt-free attack, renders the Segment Anything Model incapable of segmenting objects across diverse images, highlighting its vulnerability to universal adversarial perturbations.

DA-Ada: Learning Domain-Aware Adapter for Domain Adaptive Object Detection

26 September 2024·2460 words·12 mins

Computer Vision Object Detection 🏢 Intelligent Software Research Center, Institute of Software, CAS, Beijing, China

DA-Ada enhances domain adaptive object detection by using a novel domain-aware adapter that leverages both domain-invariant and domain-specific knowledge for improved accuracy and generalization acros…