Computer Vision

E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation
·3081 words·15 mins
Computer Vision Image Segmentation 🏢 University of Twente
E2ENet: A novel 3D medical image segmentation model boasts high accuracy and efficiency by dynamically fusing multi-scale features and using restricted depth-shift 3D convolutions, significantly outp…
E-Motion: Future Motion Simulation via Event Sequence Diffusion
·4535 words·22 mins
AI Generated Computer Vision Video Understanding 🏢 Xidian University
E-Motion: Predicting future motion with unprecedented accuracy using event cameras and video diffusion models.
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation
·3048 words·15 mins
Computer Vision Image Classification 🏢 National University of Singapore
Dynamic Tuning (DyT) significantly boosts Vision Transformer (ViT) adaptation by dynamically skipping less important tokens during inference, achieving superior performance with 71% fewer FLOPs than e…
Dual-frame Fluid Motion Estimation with Test-time Optimization and Zero-divergence Loss
·2477 words·12 mins
Computer Vision 3D Vision 🏢 University of Chinese Academy of Sciences
Self-supervised dual-frame fluid motion estimation achieves superior accuracy with 99% less training data, using a novel zero-divergence loss and dynamic velocimetry enhancement.
Dual-Diffusion for Binocular 3D Human Pose Estimation
·3829 words·18 mins
AI Generated Computer Vision 3D Vision 🏢 Shanghai Jiao Tong University
Dual-Diffusion boosts binocular 3D human pose estimation accuracy by simultaneously denoising 2D and 3D pose uncertainties using a diffusion model.
Dual Encoder GAN Inversion for High-Fidelity 3D Head Reconstruction from Single Images
·3653 words·18 mins
Computer Vision 3D Vision 🏢 Bilkent University
Dual encoder GAN inversion achieves high-fidelity 3D head reconstruction from single images by cleverly combining outputs from encoders specialized for visible and invisible regions, surpassing existi…
DRIP: Unleashing Diffusion Priors for Joint Foreground and Alpha Prediction in Image Matting
·1974 words·10 mins
Computer Vision Image Segmentation 🏢 Zhejiang University
DRIP: A novel image matting method using pre-trained latent diffusion models achieves state-of-the-art performance by jointly predicting foreground and alpha values, significantly improving accuracy a…
DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models
·5101 words·24 mins
AI Generated Computer Vision Image Generation 🏢 Australian National University
DreamSteerer enhances source image-conditioned editability in personalized diffusion models via a novel Editability Driven Score Distillation objective and mode shifting regularization, achieving sign…
DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
·2231 words·11 mins
Computer Vision Video Understanding 🏢 Carnegie Mellon University
DreamScene4D generates realistic dynamic 3D multi-object scenes from monocular videos, supporting novel view synthesis and addressing limitations of existing methods with a decompose-recompose approach.
DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation
·2631 words·13 mins
AI Generated Computer Vision 3D Vision 🏢 Zhejiang University
DreamMesh4D: Generating high-fidelity dynamic 3D meshes from monocular video using a novel Gaussian-mesh hybrid representation and adaptive hybrid skinning.
DRACO: A Denoising-Reconstruction Autoencoder for Cryo-EM
·1954 words·10 mins
Computer Vision Image Generation 🏢 School of Information Science and Technology, ShanghaiTech University
DRACO, a denoising-reconstruction autoencoder, revolutionizes cryo-EM by leveraging a large-scale dataset and hybrid training for superior image denoising and downstream task performance.
Doubly Hierarchical Geometric Representations for Strand-based Human Hairstyle Generation
·2527 words·12 mins
Computer Vision Image Generation 🏢 Carnegie Mellon University
Doubly hierarchical geometric representations enable realistic human hairstyle generation by separating low and high-frequency details in hair strands, resulting in high-quality, detailed virtual hair…
DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning
·1917 words·9 mins
Computer Vision Image Generation 🏢 Shanghai Jiao Tong University
DomainGallery: Few-shot domain-driven image generation via attribute-centric finetuning, solving key issues of previous works by introducing attribute erasure, disentanglement, regularization, and enh…
Domain Adaptation for Large-Vocabulary Object Detectors
·4715 words·23 mins
AI Generated Computer Vision Object Detection 🏢 State Key Laboratory of Integrated Services Networks, Xidian University
KGD: a novel knowledge graph distillation technique empowers large-vocabulary object detectors with superior cross-domain object classification, achieving state-of-the-art performance.
DOGS: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus
·3216 words·16 mins
AI Generated Computer Vision 3D Vision 🏢 National University of Singapore
DOGS: Distributed-Oriented Gaussian Splatting accelerates large-scale 3D reconstruction by distributing the training of 3D Gaussian Splatting models across multiple machines, achieving 6x faster train…
Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?
·1983 words·10 mins
Computer Vision Action Recognition 🏢 Tongji University
Zero-shot online action detection gets a boost! OV-OAD leverages vision-language models and text supervision to achieve impressive performance on various benchmarks without relying on manual annotati…
DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering
·2765 words·13 mins
Computer Vision 3D Vision 🏢 University of Science and Technology of China
DN-4DGS: A denoised deformable network with temporal-spatial aggregation delivers real-time dynamic scene rendering at state-of-the-art quality.
DMesh: A Differentiable Mesh Representation
·3349 words·16 mins
Computer Vision 3D Vision 🏢 University of Maryland
DMesh: A novel differentiable mesh representation enabling efficient gradient-based optimization for diverse 3D shape applications.
DiTFastAttn: Attention Compression for Diffusion Transformer Models
·2788 words·14 mins
Computer Vision Image Generation 🏢 Tsinghua University
DiTFastAttn: A post-training compression method drastically speeds up diffusion transformer models by cleverly reducing redundancy in attention calculations, leading to up to a 1.8x speedup at high re…
Distribution-Aware Data Expansion with Diffusion Models
·3351 words·16 mins
AI Generated Computer Vision Image Classification 🏢 Tsinghua University
DistDiff, a training-free data expansion framework, leverages distribution-aware diffusion models to generate high-fidelity, diverse samples that enhance downstream model performance.