
Computer Vision

Robust Fine-tuning of Zero-shot Models via Variance Reduction
·2809 words·14 mins
Computer Vision Vision-Language Models 🏒 Nanyang Technological University
Variance Reduction Fine-tuning (VRF) simultaneously boosts in-distribution and out-of-distribution accuracy in fine-tuned zero-shot models, overcoming the ID-OOD trade-off.
RobIR: Robust Inverse Rendering for High-Illumination Scenes
·2339 words·11 mins
Computer Vision 3D Vision 🏒 Tencent AI Lab
RobIR: Robust inverse rendering in high-illumination scenes using ACES tone mapping and regularized visibility estimation for accurate BRDF reconstruction.
ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization
·2551 words·12 mins
Computer Vision Image Generation 🏒 Wuhan University
ROBIN: A novel watermarking method for diffusion models that actively conceals robust watermarks using adversarial optimization, enabling strong, imperceptible, and verifiable image authentication.
RLE: A Unified Perspective of Data Augmentation for Cross-Spectral Re-Identification
·1804 words·9 mins
Computer Vision Face Recognition 🏒 Tencent AI Lab
RLE: A novel data augmentation strategy unifying cross-spectral re-ID, significantly boosting model performance by mimicking local linear transformations.
Revisiting the Integration of Convolution and Attention for Vision Backbone
·2197 words·11 mins
Computer Vision Image Classification 🏒 City University of Hong Kong
GLMix: A novel vision backbone efficiently integrates convolutions and multi-head self-attention at different granularities, achieving state-of-the-art performance while addressing scalability issues.
Revisiting motion information for RGB-Event tracking with MOT philosophy
·2713 words·13 mins
AI Generated Computer Vision Object Detection 🏒 Tsinghua University
RGB-Event tracker CSAM leverages MOT philosophy for enhanced robustness by integrating appearance and motion information from RGB and event streams, achieving state-of-the-art performance.
Revisiting Adversarial Patches for Designing Camera-Agnostic Attacks against Person Detection
·1724 words·9 mins
Computer Vision Object Detection 🏒 Peking University
Researchers developed Camera-Agnostic Patch (CAP) attacks, improving adversarial patch reliability by simulating camera image processing in attacks against person detectors.
ReVideo: Remake a Video with Motion and Content Control
·2423 words·12 mins
Computer Vision Video Understanding 🏒 Peking University
ReVideo enables precise local video editing by independently controlling content and motion, overcoming limitations of existing methods and paving the way for advanced video manipulation.
RETR: Multi-View Radar Detection Transformer for Indoor Perception
·4299 words·21 mins
AI Generated Computer Vision Object Detection 🏒 Mitsubishi Electric Research Laboratories
RETR: Multi-view radar detection transformer significantly improves indoor object detection and segmentation.
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
·3245 words·16 mins
AI Generated Computer Vision Image Generation 🏒 Amazon Web Services Shanghai AI Lab
This paper presents a novel regional cross-attention module for rich-context layout-to-image generation, significantly improving image accuracy while addressing limitations of existing methods. Two n…
Rethinking Score Distillation as a Bridge Between Image Distributions
·2251 words·11 mins
Computer Vision Image Generation 🏒 UC Berkeley
Researchers enhanced image generation by improving score distillation sampling via a novel Schrödinger Bridge framework, improving realism without computational overhead.
Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution
·3193 words·15 mins
AI Generated Computer Vision Out-of-Distribution Detection 🏒 Alibaba Cloud
ImOOD tackles the challenge of imbalanced data distribution in OOD detection by introducing a generalized statistical framework and a unified regularization technique, leading to significant performan…
Rethinking No-reference Image Exposure Assessment from Holism to Pixel: Models, Datasets and Benchmarks
·2343 words·11 mins
Computer Vision Image Generation 🏒 Beijing University of Posts and Telecommunications
Revolutionizing image exposure assessment, Pixel-level IEA Network (P-IEANet) achieves state-of-the-art performance with a novel pixel-level approach, a new dataset (IEA40K), and a benchmark of 19 met…
Rethinking Imbalance in Image Super-Resolution for Efficient Inference
·2134 words·11 mins
Computer Vision Image Generation 🏒 Harbin Institute of Technology
WBSR: A novel framework for efficient image super-resolution that tackles data and model imbalances for superior performance and approximately a 34% reduction in computational cost.
Rethinking Decoders for Transformer-based Semantic Segmentation: Compression is All You Need
·2306 words·11 mins
Computer Vision Image Segmentation 🏒 Beijing University of Posts and Telecommunications
DEPICT: A new white-box decoder for Transformer-based semantic segmentation, achieving better performance with fewer parameters by leveraging the principle of compression and connecting Transformer de…
RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models
·2570 words·13 mins
Computer Vision Image Generation 🏒 Hong Kong University of Science and Technology
RestoreAgent, an AI-powered image restoration agent, autonomously identifies and corrects multiple image degradations, exceeding human expert performance.
Resfusion: Denoising Diffusion Probabilistic Models for Image Restoration Based on Prior Residual Noise
·2678 words·13 mins
Computer Vision Image Generation 🏒 College of Computer Science, Nankai University
Resfusion, a novel framework, accelerates image restoration by integrating residual noise into the diffusion process, achieving superior results with fewer steps.
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
·3815 words·18 mins
AI Generated Computer Vision Image Generation 🏒 Technical University of Munich
ReNO: Boost one-step text-to-image models by cleverly optimizing initial noise using reward signals, achieving state-of-the-art results efficiently.
Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising
·2025 words·10 mins
Computer Vision Image Generation 🏒 National University of Singapore
Remix-DiT: Boosts diffusion model image generation quality by mixing smaller basis models into numerous specialized denoisers, improving efficiency and lowering costs.
Relationship Prompt Learning is Enough for Open-Vocabulary Semantic Segmentation
·3268 words·16 mins
Computer Vision Image Segmentation 🏒 School of Informatics, Xiamen University
Relationship Prompt Network (RPN) achieves state-of-the-art open-vocabulary semantic segmentation using only prompt learning and a Vision-Language Model (VLM), eliminating the need for expensive segme…