Computer Vision
ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving
·2348 words·12 mins·
Computer Vision
3D Vision
🏢 Multimedia Laboratory, the Chinese University of Hong Kong
ZOPP: a framework for zero-shot offboard panoptic perception in autonomous driving that enables high-quality 3D scene understanding without human labeling.
ZeroMark: Towards Dataset Ownership Verification without Disclosing Watermark
·3327 words·16 mins·
AI Generated
Computer Vision
Face Recognition
🏢 University of Maryland College Park
ZeroMark verifies dataset ownership without disclosing the watermark, leveraging the intrinsic properties of DNNs trained on watermarked data.
Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering
·3094 words·15 mins·
Computer Vision
Image Generation
🏢 Technion
Zero-to-Hero enhances zero-shot novel view synthesis by filtering attention maps during inference, achieving significantly higher fidelity and realism without retraining.
Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly
·2753 words·13 mins·
Computer Vision
3D Vision
🏢 Tsinghua University
Zero-shot 3D scene reconstruction from single images is achieved by assembling diverse deep priors from large models, eliminating the need for 3D/2D training data and achieving superior performance.
Zero-shot Image Editing with Reference Imitation
·2284 words·11 mins·
Computer Vision
Image Generation
🏢 University of Hong Kong
MimicBrush: a novel image editing approach using reference imitation for intuitive zero-shot edits.
Zero-shot Generalizable Incremental Learning for Vision-Language Object Detection
·2363 words·12 mins·
Computer Vision
Object Detection
🏢 Institute of Automation, Chinese Academy of Sciences (CAS)
ZiRa achieves zero-shot generalizable incremental learning for vision-language object detection by using a memory-efficient dual-branch architecture and zero-interference loss, significantly boosting …
Zero-Shot Event-Intensity Asymmetric Stereo via Visual Prompting from Image Domain
·4096 words·20 mins·
AI Generated
Computer Vision
3D Vision
🏢 Peking University
Zero-shot Event-Intensity Asymmetric Stereo (ZEST) uses visual prompting and monocular cues to achieve robust 3D perception without event-specific training, outperforming existing methods.
You Only Look Around: Learning Illumination-Invariant Feature for Low-light Object Detection
·2686 words·13 mins·
AI Generated
Computer Vision
Object Detection
🏢 Megvii Technology
YOLA: a framework for low-light object detection that achieves significant improvements by learning illumination-invariant features through a dedicated module.
YOLOv10: Real-Time End-to-End Object Detection
·1949 words·10 mins·
Computer Vision
Object Detection
🏢 Tsinghua University
YOLOv10: real-time end-to-end object detection that reaches state-of-the-art speed and accuracy by eliminating NMS post-processing and holistically optimizing the model architecture.
WildGaussians: 3D Gaussian Splatting In the Wild
·2601 words·13 mins·
AI Generated
Computer Vision
3D Vision
🏢 ETH Zurich
WildGaussians enhances 3D Gaussian splatting for real-time rendering of photorealistic 3D scenes from in-the-wild images featuring occlusions and appearance changes.
Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections
·1766 words·9 mins·
Computer Vision
3D Vision
🏢 Johns Hopkins University
Wild-GS achieves real-time novel view synthesis from unconstrained photos by efficiently adapting 3D Gaussian Splatting, significantly improving speed and quality over existing methods.
Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval
·2035 words·10 mins·
Computer Vision
Image Segmentation
🏢 NVIDIA Research
Unlocking personalized image retrieval and segmentation, a novel approach uses pre-trained text-to-image diffusion models to surpass supervised methods, addressing limitations of existing self-supervi…
When does perceptual alignment benefit vision representations?
·4058 words·20 mins·
AI Generated
Computer Vision
Representation Learning
🏢 MIT
Aligning vision models to human perceptual similarity judgments significantly boosts performance in diverse vision tasks like counting and segmentation, but surprisingly reduces performance in natural…
What Variables Affect Out-of-Distribution Generalization in Pretrained Models?
·4187 words·20 mins·
Computer Vision
Representation Learning
🏢 Rochester Institute of Technology
High-resolution datasets with diverse classes significantly improve the transferability of pretrained DNNs by reducing representation compression and mitigating the 'tunnel effect.'
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
·2153 words·11 mins·
Computer Vision
Visual Question Answering
🏢 Google DeepMind
LLM-powered data curation boosts web-scale visual entity recognition!
Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation
·2875 words·14 mins·
Computer Vision
Image Classification
🏢 Dalian University of Technology
Wasserstein Distance-based Knowledge Distillation (WKD) rivals KL-divergence by leveraging rich category interrelations and handling non-overlapping distributions, significantly boosting performance i…
Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models
·2837 words·14 mins·
Computer Vision
Image Generation
🏢 University of Texas at Austin
Warped Diffusion adapts image diffusion models for video inverse problems, solving flickering and temporal inconsistency issues by viewing video frames as continuous warping transformations a…
VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization
·2230 words·11 mins·
Computer Vision
3D Vision
🏢 State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), CASIA
VQ-Map leverages vector quantization to estimate bird’s-eye-view maps with unprecedented accuracy, setting new benchmarks.
Voxel Proposal Network via Multi-Frame Knowledge Distillation for Semantic Scene Completion
·2307 words·11 mins·
AI Generated
Computer Vision
3D Vision
🏢 Tianjin University
VPNet, a novel semantic scene completion network, uses multi-frame knowledge distillation and confident voxel proposals to improve accuracy and handle dynamic aspects of 3D scenes from point clouds, a…
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
·3011 words·15 mins·
Computer Vision
Visual Question Answering
🏢 UC San Diego
VLG-CBM enhances concept bottleneck models with vision-language guidance for faithful interpretability and improved accuracy.