Computer Vision
Robust Fine-tuning of Zero-shot Models via Variance Reduction
·2809 words·14 mins·
Computer Vision
Vision-Language Models
🏢 Nanyang Technological University
Variance Reduction Fine-tuning (VRF) simultaneously boosts in-distribution and out-of-distribution accuracy in fine-tuned zero-shot models, overcoming the ID-OOD trade-off.
RobIR: Robust Inverse Rendering for High-Illumination Scenes
·2339 words·11 mins·
Computer Vision
3D Vision
🏢 Tencent AI Lab
RobIR: Robust inverse rendering in high-illumination scenes using ACES tone mapping and regularized visibility estimation for accurate BRDF reconstruction.
ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization
·2551 words·12 mins·
Computer Vision
Image Generation
🏢 Wuhan University
ROBIN: A novel watermarking method for diffusion models that actively conceals robust watermarks using adversarial optimization, enabling strong, imperceptible, and verifiable image authentication.
RLE: A Unified Perspective of Data Augmentation for Cross-Spectral Re-Identification
·1804 words·9 mins·
Computer Vision
Face Recognition
🏢 Tencent AI Lab
RLE: A novel data augmentation strategy unifying cross-spectral re-ID, significantly boosting model performance by mimicking local linear transformations.
Revisiting the Integration of Convolution and Attention for Vision Backbone
·2197 words·11 mins·
Computer Vision
Image Classification
🏢 City University of Hong Kong
GLMix: A novel vision backbone that efficiently integrates convolutions and multi-head self-attention at different granularities, achieving state-of-the-art performance while addressing scalability issues.
Revisiting motion information for RGB-Event tracking with MOT philosophy
·2713 words·13 mins·
AI Generated
Computer Vision
Object Detection
🏢 Tsinghua University
RGB-Event tracker CSAM leverages MOT philosophy for enhanced robustness by integrating appearance and motion information from RGB and event streams, achieving state-of-the-art performance.
Revisiting Adversarial Patches for Designing Camera-Agnostic Attacks against Person Detection
·1724 words·9 mins·
Computer Vision
Object Detection
🏢 Peking University
Researchers developed Camera-Agnostic Patch (CAP) attacks, improving adversarial patch reliability by simulating camera image processing in attacks against person detectors.
ReVideo: Remake a Video with Motion and Content Control
·2423 words·12 mins·
Computer Vision
Video Understanding
🏢 Peking University
ReVideo enables precise local video editing by independently controlling content and motion, overcoming limitations of existing methods and paving the way for advanced video manipulation.
RETR: Multi-View Radar Detection Transformer for Indoor Perception
·4299 words·21 mins·
AI Generated
Computer Vision
Object Detection
🏢 Mitsubishi Electric Research Laboratories
RETR: Multi-view radar detection transformer significantly improves indoor object detection and segmentation.
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
·3245 words·16 mins·
AI Generated
Computer Vision
Image Generation
🏢 Amazon Web Services Shanghai AI Lab
This paper presents a novel regional cross-attention module for rich-context layout-to-image generation, significantly improving image accuracy while addressing limitations of existing methods. Two n…
Rethinking Score Distillation as a Bridge Between Image Distributions
·2251 words·11 mins·
Computer Vision
Image Generation
🏢 UC Berkeley
Researchers enhanced image generation by improving score distillation sampling via a novel Schrödinger Bridge framework, improving realism without computational overhead.
Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution
·3193 words·15 mins·
AI Generated
Computer Vision
Out-of-Distribution Detection
🏢 Alibaba Cloud
ImOOD tackles the challenge of imbalanced data distribution in OOD detection by introducing a generalized statistical framework and a unified regularization technique, leading to significant performan…
Rethinking No-reference Image Exposure Assessment from Holism to Pixel: Models, Datasets and Benchmarks
·2343 words·11 mins·
Computer Vision
Image Generation
🏢 Beijing University of Posts and Telecommunications
Revolutionizing image exposure assessment, Pixel-level IEA Network (P-IEANet) achieves state-of-the-art performance with a novel pixel-level approach, a new dataset (IEA40K), and a benchmark of 19 met…
Rethinking Imbalance in Image Super-Resolution for Efficient Inference
·2134 words·11 mins·
Computer Vision
Image Generation
🏢 Harbin Institute of Technology
WBSR: A novel framework for efficient image super-resolution that tackles data and model imbalances for superior performance and an approximately 34% reduction in computational cost.
Rethinking Decoders for Transformer-based Semantic Segmentation: Compression is All You Need
·2306 words·11 mins·
Computer Vision
Image Segmentation
🏢 Beijing University of Posts and Telecommunications
DEPICT: A new white-box decoder for Transformer-based semantic segmentation, achieving better performance with fewer parameters by leveraging the principle of compression and connecting Transformer de…
RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models
·2570 words·13 mins·
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
RestoreAgent, an AI-powered image restoration agent, autonomously identifies and corrects multiple image degradations, exceeding human expert performance.
Resfusion: Denoising Diffusion Probabilistic Models for Image Restoration Based on Prior Residual Noise
·2678 words·13 mins·
Computer Vision
Image Generation
🏢 College of Computer Science, Nankai University
Resfusion, a novel framework, accelerates image restoration by integrating residual noise into the diffusion process, achieving superior results with fewer steps.
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
·3815 words·18 mins·
AI Generated
Computer Vision
Image Generation
🏢 Technical University of Munich
ReNO: Boost one-step text-to-image models by cleverly optimizing initial noise using reward signals, achieving state-of-the-art results efficiently.
Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising
·2025 words·10 mins·
Computer Vision
Image Generation
🏢 National University of Singapore
Remix-DiT: Boosting diffusion model image generation quality by cleverly mixing smaller basis models into numerous specialized denoisers, improving efficiency and lowering costs!
Relationship Prompt Learning is Enough for Open-Vocabulary Semantic Segmentation
·3268 words·16 mins·
Computer Vision
Image Segmentation
🏢 School of Informatics, Xiamen University
Relationship Prompt Network (RPN) achieves state-of-the-art open-vocabulary semantic segmentation using only prompt learning and a Vision-Language Model (VLM), eliminating the need for expensive segme…