Computer Vision

E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation

26 September 2024·3081 words·15 mins

Computer Vision Image Segmentation 🏢 University of Twente

E2ENet: A novel 3D medical image segmentation model boasts high accuracy and efficiency by dynamically fusing multi-scale features and using restricted depth-shift 3D convolutions, significantly outp…

E-Motion: Future Motion Simulation via Event Sequence Diffusion

26 September 2024·4535 words·22 mins

AI Generated Computer Vision Video Understanding 🏢 Xidian University

E-Motion: Predicting future motion by combining event camera sequences with video diffusion models.

Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

26 September 2024·3048 words·15 mins

Computer Vision Image Classification 🏢 National University of Singapore

Dynamic Tuning (DyT) significantly boosts Vision Transformer (ViT) adaptation by dynamically skipping less important tokens during inference, achieving superior performance with 71% fewer FLOPs than e…

Dual-frame Fluid Motion Estimation with Test-time Optimization and Zero-divergence Loss

26 September 2024·2477 words·12 mins

Computer Vision 3D Vision 🏢 University of Chinese Academy of Sciences

Self-supervised dual-frame fluid motion estimation achieves superior accuracy with 99% less training data, using a novel zero-divergence loss and dynamic velocimetry enhancement.

Dual-Diffusion for Binocular 3D Human Pose Estimation

26 September 2024·3829 words·18 mins

AI Generated Computer Vision 3D Vision 🏢 Shanghai Jiao Tong University

Dual-Diffusion boosts binocular 3D human pose estimation accuracy by simultaneously denoising 2D and 3D pose uncertainties using a diffusion model.

Dual Encoder GAN Inversion for High-Fidelity 3D Head Reconstruction from Single Images

26 September 2024·3653 words·18 mins

Computer Vision 3D Vision 🏢 Bilkent University

Dual encoder GAN inversion achieves high-fidelity 3D head reconstruction from single images by cleverly combining outputs from encoders specialized for visible and invisible regions, surpassing existi…

DRIP: Unleashing Diffusion Priors for Joint Foreground and Alpha Prediction in Image Matting

26 September 2024·1974 words·10 mins

Computer Vision Image Segmentation 🏢 Zhejiang University

DRIP: A novel image matting method using pre-trained latent diffusion models achieves state-of-the-art performance by jointly predicting foreground and alpha values, significantly improving accuracy a…

DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models

26 September 2024·5101 words·24 mins

AI Generated Computer Vision Image Generation 🏢 Australian National University

DreamSteerer enhances source image-conditioned editability in personalized diffusion models via a novel Editability Driven Score Distillation objective and mode shifting regularization, achieving sign…

DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos

26 September 2024·2231 words·11 mins

Computer Vision Video Understanding 🏢 Carnegie Mellon University

DreamScene4D generates realistic 3D dynamic multi-object scenes from monocular videos via novel view synthesis, addressing limitations of existing methods with a novel decompose-recompose approach.

DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation

26 September 2024·2631 words·13 mins

AI Generated Computer Vision 3D Vision 🏢 Zhejiang University

DreamMesh4D: Generating high-fidelity dynamic 3D meshes from monocular video using a novel Gaussian-mesh hybrid representation and adaptive hybrid skinning.

DRACO: A Denoising-Reconstruction Autoencoder for Cryo-EM

26 September 2024·1954 words·10 mins

Computer Vision Image Generation 🏢 School of Information Science and Technology, ShanghaiTech University

DRACO, a denoising-reconstruction autoencoder for cryo-EM, leverages a large-scale dataset and hybrid training to improve image denoising and downstream task performance.

Doubly Hierarchical Geometric Representations for Strand-based Human Hairstyle Generation

26 September 2024·2527 words·12 mins

Computer Vision Image Generation 🏢 Carnegie Mellon University

Doubly hierarchical geometric representations enable realistic human hairstyle generation by separating low and high-frequency details in hair strands, resulting in high-quality, detailed virtual hair…

DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning

26 September 2024·1917 words·9 mins

Computer Vision Image Generation 🏢 Shanghai Jiao Tong University

DomainGallery: Few-shot domain-driven image generation via attribute-centric finetuning, solving key issues of previous works by introducing attribute erasure, disentanglement, regularization, and enh…

Domain Adaptation for Large-Vocabulary Object Detectors

26 September 2024·4715 words·23 mins

AI Generated Computer Vision Object Detection 🏢 State Key Laboratory of Integrated Services Networks, Xidian University

KGD: A novel knowledge graph distillation technique that improves cross-domain object classification in large-vocabulary object detectors, achieving state-of-the-art performance.

DOGS: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus

26 September 2024·3216 words·16 mins

AI Generated Computer Vision 3D Vision 🏢 National University of Singapore

DOGS: Distributed-Oriented Gaussian Splatting accelerates large-scale 3D reconstruction by distributing the training of 3D Gaussian Splatting models across multiple machines, achieving 6x faster train…

Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?

26 September 2024·1983 words·10 mins

Computer Vision Action Recognition 🏢 Tongji University

Zero-shot online action detection gets a boost! OV-OAD leverages vision-language models and text supervision to achieve impressive performance on various benchmarks without relying on manual annotati…

DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering

26 September 2024·2765 words·13 mins

Computer Vision 3D Vision 🏢 University of Science and Technology of China

DN-4DGS: A denoised deformable network with temporal-spatial aggregation for real-time dynamic scene rendering, achieving state-of-the-art quality.

DMesh: A Differentiable Mesh Representation

26 September 2024·3349 words·16 mins

Computer Vision 3D Vision 🏢 University of Maryland

DMesh: A novel differentiable mesh representation enabling efficient gradient-based optimization for diverse 3D shape applications.

DiTFastAttn: Attention Compression for Diffusion Transformer Models

26 September 2024·2788 words·14 mins

Computer Vision Image Generation 🏢 Tsinghua University

DiTFastAttn: A post-training compression method drastically speeds up diffusion transformer models by cleverly reducing redundancy in attention calculations, leading to up to a 1.8x speedup at high re…

Distribution-Aware Data Expansion with Diffusion Models

26 September 2024·3351 words·16 mins

AI Generated Computer Vision Image Classification 🏢 Tsinghua University

DistDiff, a training-free data expansion framework, leverages distribution-aware diffusion models to generate high-fidelity, diverse samples that enhance downstream model performance.