Computer Vision

CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos

26 September 2024·2627 words·13 mins

Computer Vision Scene Understanding 🏢 University of Arkansas

CYCLO: A novel cyclic graph transformer excels at multi-object relationship modeling in aerial videos.

CV-VAE: A Compatible Video VAE for Latent Generative Video Models

26 September 2024·3396 words·16 mins

AI Generated Computer Vision Video Understanding 🏢 Tencent AI Lab

CV-VAE: A compatible video VAE enabling efficient, high-quality latent video generation by bridging the gap between image and video latent spaces.

Curriculum Fine-tuning of Vision Foundation Model for Medical Image Classification Under Label Noise

26 September 2024·1703 words·8 mins

Computer Vision Image Classification 🏢 Gwangju Institute of Science and Technology

CUFIT: a novel curriculum fine-tuning paradigm significantly improves medical image classification accuracy despite noisy labels by leveraging pre-trained Vision Foundation Models.

Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

26 September 2024·2880 words·14 mins

Computer Vision Image Generation 🏢 UC Los Angeles

Ctrl-X: Zero-shot text-to-image generation with training-free structure & appearance control!

CryoSPIN: Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference

26 September 2024·1731 words·9 mins

Computer Vision 3D Vision 🏢 University of Toronto

CryoSPIN revolutionizes ab-initio cryo-EM reconstruction with semi-amortized pose inference, achieving faster and more accurate 3D structure determination.

CryoGEM: Physics-Informed Generative Cryo-Electron Microscopy

26 September 2024·2131 words·11 mins

Computer Vision Image Generation 🏢 ShanghaiTech University

CryoGEM: Physics-informed generative model creates realistic synthetic cryo-EM datasets, boosting particle picking and pose estimation accuracy for higher-resolution protein structure determination.

Cross-video Identity Correlating for Person Re-identification Pre-training

26 September 2024·1957 words·10 mins

Computer Vision Person Re-Identification 🏢 String

Cross-video Identity-cOrrelating pre-training (CION) revolutionizes person re-identification by leveraging identity correlation across videos for superior model pre-training, achieving state-of-the-ar…

Cross-Scale Self-Supervised Blind Image Deblurring via Implicit Neural Representation

26 September 2024·3186 words·15 mins

Computer Vision Image Generation 🏢 National University of Singapore

Self-supervised blind image deblurring (BID) breakthrough! A novel cross-scale consistency loss and progressive training scheme using implicit neural representations achieves superior performance wit…

Cross-Modality Perturbation Synergy Attack for Person Re-identification

26 September 2024·1933 words·10 mins

Computer Vision Face Recognition 🏢 Xiamen University

Cross-Modality Perturbation Synergy (CMPS) attack: A novel universal perturbation method for cross-modality person re-identification, effectively misleading ReID models by leveraging gradients from di…

CRAYM: Neural Field Optimization via Camera RAY Matching

26 September 2024·2649 words·13 mins

Computer Vision 3D Vision 🏢 Shenzhen University

CRAYM: Neural field optimization via camera RAY matching enhances 3D reconstruction by using camera rays, not pixels, improving both novel view synthesis and geometry.

COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing

26 September 2024·2236 words·11 mins

Computer Vision Video Understanding 🏢 Tsinghua University

COVE: Consistent high-quality video editing achieved by leveraging diffusion feature correspondence for temporal consistency.

CountGD: Multi-Modal Open-World Counting

26 September 2024·2520 words·12 mins

Computer Vision Object Detection 🏢 University of Oxford

CountGD: A new multi-modal model counts objects in images using text or visual examples, significantly improving open-world counting accuracy.

CoSW: Conditional Sample Weighting for Smoke Segmentation with Label Noise

26 September 2024·2129 words·10 mins

Computer Vision Image Segmentation 🏢 East China University of Science and Technology

CoSW: a novel conditional sample weighting method for robust smoke segmentation that achieves state-of-the-art results by handling inconsistent noisy labels through a multi-prototype framework.

COSMIC: Compress Satellite Image Efficiently via Diffusion Compensation

26 September 2024·3381 words·16 mins

Computer Vision Image Compression 🏢 Tsinghua University

COSMIC efficiently compresses satellite images via a lightweight encoder and diffusion compensation, enabling practical onboard processing and high compression ratios.

CosAE: Learnable Fourier Series for Image Restoration

26 September 2024·2867 words·14 mins

Computer Vision Image Restoration 🏢 NVIDIA Research

CosAE: a novel autoencoder using learnable Fourier series achieves state-of-the-art image restoration by encoding frequency coefficients in its narrow bottleneck, preserving fine details even with ext…

Cooperative Hardware-Prompt Learning for Snapshot Compressive Imaging

26 September 2024·1775 words·9 mins

Computer Vision Image Generation 🏢 Rochester Institute of Technology

Federated Hardware-Prompt Learning (FedHP) enables robust cross-hardware SCI training by aligning inconsistent data distributions using a hardware-conditioned prompter, outperforming existing FL metho…

Contrastive-Equivariant Self-Supervised Learning Improves Alignment with Primate Visual Area IT

26 September 2024·2007 words·10 mins

Computer Vision Self-Supervised Learning 🏢 Center for Neural Science, New York University

Self-supervised learning models can now better predict primate IT neural responses by preserving structured variability to input transformations, improving alignment with biological visual perception.

Continuous Spatiotemporal Events Decoupling through Spike-based Bayesian Computation

26 September 2024·1859 words·9 mins

Computer Vision Image Segmentation 🏢 Peking University

Spiking neural network effectively segments mixed-motion event streams via spike-based Bayesian computation, achieving efficient real-time motion decoupling.

Continuous Heatmap Regression for Pose Estimation via Implicit Neural Representation

26 September 2024·2522 words·12 mins

AI Generated Computer Vision 3D Vision 🏢 Nanjing University of Science and Technology

NerPE: continuous heatmap regression via implicit neural representation resolves the accuracy-limiting quantization errors in human pose estimation, achieving sub-pixel precision.

ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model

26 September 2024·1913 words·9 mins

Computer Vision 3D Vision 🏢 Nanyang Technological University

ContextGS: Revolutionizing 3D scene compression with an anchor-level autoregressive model, achieving 15x size reduction in 3D Gaussian Splatting while boosting rendering quality.