Computer Vision

ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving

26 September 2024·2348 words·12 mins

Computer Vision 3D Vision 🏢 Multimedia Laboratory, the Chinese University of Hong Kong

ZOPP: A framework for zero-shot offboard panoptic perception in autonomous driving, enabling high-quality 3D scene understanding without human labeling.

ZeroMark: Towards Dataset Ownership Verification without Disclosing Watermark

26 September 2024·3327 words·16 mins

AI Generated Computer Vision Face Recognition 🏢 University of Maryland College Park

ZeroMark verifies dataset ownership without exposing the watermark, enabling copyright protection by leveraging the intrinsic properties of DNNs trained on watermarked data.

Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering

26 September 2024·3094 words·15 mins

Computer Vision Image Generation 🏢 Technion

Zero-to-Hero enhances zero-shot novel view synthesis by filtering attention maps during inference, achieving significantly higher fidelity and realism without retraining.

Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly

26 September 2024·2753 words·13 mins

Computer Vision 3D Vision 🏢 Tsinghua University

Zero-shot 3D scene reconstruction from single images is achieved by assembling diverse deep priors from large models, eliminating the need for 3D/2D training data and achieving superior performance.

Zero-shot Image Editing with Reference Imitation

26 September 2024·2284 words·11 mins

Computer Vision Image Generation 🏢 University of Hong Kong

MimicBrush: An image editing approach that uses reference imitation for intuitive zero-shot edits.

Zero-shot Generalizable Incremental Learning for Vision-Language Object Detection

26 September 2024·2363 words·12 mins

Computer Vision Object Detection 🏢 Institute of Automation, Chinese Academy of Sciences (CAS)

ZiRa achieves zero-shot generalizable incremental learning for vision-language object detection by using a memory-efficient dual-branch architecture and zero-interference loss, significantly boosting …

Zero-Shot Event-Intensity Asymmetric Stereo via Visual Prompting from Image Domain

26 September 2024·4096 words·20 mins

AI Generated Computer Vision 3D Vision 🏢 Peking University

Zero-shot Event-Intensity Asymmetric Stereo (ZEST) uses visual prompting and monocular cues to achieve robust 3D perception without event-specific training, outperforming existing methods.

You Only Look Around: Learning Illumination-Invariant Feature for Low-light Object Detection

26 September 2024·2686 words·13 mins

AI Generated Computer Vision Object Detection 🏢 Megvii Technology

YOLA: A framework for object detection in low-light conditions that achieves significant improvements by learning illumination-invariant features through a dedicated module.

YOLOv10: Real-Time End-to-End Object Detection

26 September 2024·1949 words·10 mins

Computer Vision Object Detection 🏢 Tsinghua University

YOLOv10: Real-time object detection that achieves state-of-the-art speed and accuracy by eliminating NMS post-processing and holistically optimizing the model architecture.

WildGaussians: 3D Gaussian Splatting In the Wild

26 September 2024·2601 words·13 mins

AI Generated Computer Vision 3D Vision 🏢 ETH Zurich

WildGaussians enhances 3D Gaussian splatting for real-time rendering of photorealistic 3D scenes from in-the-wild images featuring occlusions and appearance changes.

Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections

26 September 2024·1766 words·9 mins

Computer Vision 3D Vision 🏢 Johns Hopkins University

Wild-GS achieves real-time novel view synthesis from unconstrained photos by efficiently adapting 3D Gaussian Splatting, significantly improving speed and quality over existing methods.

Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval

26 September 2024·2035 words·10 mins

Computer Vision Image Segmentation 🏢 NVIDIA Research

Unlocking personalized image retrieval and segmentation, a novel approach uses pre-trained text-to-image diffusion models to surpass supervised methods, addressing limitations of existing self-supervi…

When does perceptual alignment benefit vision representations?

26 September 2024·4058 words·20 mins

AI Generated Computer Vision Representation Learning 🏢 MIT

Aligning vision models to human perceptual similarity judgments significantly boosts performance in diverse vision tasks like counting and segmentation, but surprisingly reduces performance in natural…

What Variables Affect Out-of-Distribution Generalization in Pretrained Models?

26 September 2024·4187 words·20 mins

Computer Vision Representation Learning 🏢 Rochester Institute of Technology

High-resolution datasets with diverse classes significantly improve the transferability of pretrained DNNs by reducing representation compression and mitigating the 'tunnel effect.'

Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach

26 September 2024·2153 words·11 mins

Computer Vision Visual Question Answering 🏢 Google DeepMind

LLM-powered data curation boosts web-scale visual entity recognition!

Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation

26 September 2024·2875 words·14 mins

Computer Vision Image Classification 🏢 Dalian University of Technology

Wasserstein Distance-based Knowledge Distillation (WKD) rivals KL-divergence by leveraging rich category interrelations and handling non-overlapping distributions, significantly boosting performance i…

Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models

26 September 2024·2837 words·14 mins

Computer Vision Image Generation 🏢 University of Texas at Austin

Warped Diffusion cleverly adapts image diffusion models for video inverse problems, solving flickering and temporal inconsistency issues by viewing video frames as continuous warping transformations a…

VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization

26 September 2024·2230 words·11 mins

Computer Vision 3D Vision 🏢 State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), CASIA

VQ-Map leverages vector quantization to estimate bird’s-eye-view maps with unprecedented accuracy, setting new benchmarks.

Voxel Proposal Network via Multi-Frame Knowledge Distillation for Semantic Scene Completion

26 September 2024·2307 words·11 mins

AI Generated Computer Vision 3D Vision 🏢 Tianjin University

VPNet, a novel semantic scene completion network, uses multi-frame knowledge distillation and confident voxel proposals to improve accuracy and handle dynamic aspects of 3D scenes from point clouds, a…

VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance

26 September 2024·3011 words·15 mins

Computer Vision Visual Question Answering 🏢 UC San Diego

VLG-CBM enhances concept bottleneck models with vision-language guidance for faithful interpretability and improved accuracy.