Computer Vision

Robust Fine-tuning of Zero-shot Models via Variance Reduction

26 September 2024·2809 words·14 mins

Computer Vision Vision-Language Models 🏢 Nanyang Technological University

Variance Reduction Fine-tuning (VRF) simultaneously boosts in-distribution and out-of-distribution accuracy in fine-tuned zero-shot models, overcoming the ID-OOD trade-off.

RobIR: Robust Inverse Rendering for High-Illumination Scenes

26 September 2024·2339 words·11 mins

Computer Vision 3D Vision 🏢 Tencent AI Lab

RobIR: Robust inverse rendering in high-illumination scenes using ACES tone mapping and regularized visibility estimation for accurate BRDF reconstruction.

ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization

26 September 2024·2551 words·12 mins

Computer Vision Image Generation 🏢 Wuhan University

ROBIN: A novel watermarking method for diffusion models that actively conceals robust watermarks using adversarial optimization, enabling strong, imperceptible, and verifiable image authentication.

RLE: A Unified Perspective of Data Augmentation for Cross-Spectral Re-Identification

26 September 2024·1804 words·9 mins

Computer Vision Face Recognition 🏢 Tencent AI Lab

RLE: A novel data augmentation strategy unifying cross-spectral re-ID, significantly boosting model performance by mimicking local linear transformations.

Revisiting the Integration of Convolution and Attention for Vision Backbone

26 September 2024·2197 words·11 mins

Computer Vision Image Classification 🏢 City University of Hong Kong

GLMix: A novel vision backbone that efficiently integrates convolutions and multi-head self-attention at different granularities, achieving state-of-the-art performance while addressing scalability issues.

Revisiting motion information for RGB-Event tracking with MOT philosophy

26 September 2024·2713 words·13 mins

AI Generated Computer Vision Object Detection 🏢 Tsinghua University

RGB-Event tracker CSAM leverages MOT philosophy for enhanced robustness by integrating appearance and motion information from RGB and event streams, achieving state-of-the-art performance.

Revisiting Adversarial Patches for Designing Camera-Agnostic Attacks against Person Detection

26 September 2024·1724 words·9 mins

Computer Vision Object Detection 🏢 Peking University

Researchers developed Camera-Agnostic Patch (CAP) attacks, improving adversarial patch reliability by simulating camera image processing in attacks against person detectors.

ReVideo: Remake a Video with Motion and Content Control

26 September 2024·2423 words·12 mins

Computer Vision Video Understanding 🏢 Peking University

ReVideo enables precise local video editing by independently controlling content and motion, overcoming limitations of existing methods and paving the way for advanced video manipulation.

RETR: Multi-View Radar Detection Transformer for Indoor Perception

26 September 2024·4299 words·21 mins

AI Generated Computer Vision Object Detection 🏢 Mitsubishi Electric Research Laboratories

RETR: Multi-view radar detection transformer significantly improves indoor object detection and segmentation.

Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation

26 September 2024·3245 words·16 mins

AI Generated Computer Vision Image Generation 🏢 Amazon Web Services Shanghai AI Lab

This paper presents a novel regional cross-attention module for rich-context layout-to-image generation, significantly improving image accuracy while addressing limitations of existing methods. Two n…

Rethinking Score Distillation as a Bridge Between Image Distributions

26 September 2024·2251 words·11 mins

Computer Vision Image Generation 🏢 UC Berkeley

Researchers improved score distillation sampling with a novel Schrödinger Bridge framework, increasing the realism of generated images without computational overhead.

Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution

26 September 2024·3193 words·15 mins

AI Generated Computer Vision Out-of-Distribution Detection 🏢 Alibaba Cloud

ImOOD tackles the challenge of imbalanced data distribution in OOD detection by introducing a generalized statistical framework and a unified regularization technique, leading to significant performan…

Rethinking No-reference Image Exposure Assessment from Holism to Pixel: Models, Datasets and Benchmarks

26 September 2024·2343 words·11 mins

Computer Vision Image Generation 🏢 Beijing University of Posts and Telecommunications

Revolutionizing image exposure assessment, Pixel-level IEA Network (P-IEANet) achieves state-of-the-art performance with a novel pixel-level approach, a new dataset (IEA40K), and a benchmark of 19 met…

Rethinking Imbalance in Image Super-Resolution for Efficient Inference

26 September 2024·2134 words·11 mins

Computer Vision Image Generation 🏢 Harbin Institute of Technology

WBSR: A novel framework for efficient image super-resolution that tackles data and model imbalances for superior performance and approximately a 34% reduction in computational cost.

Rethinking Decoders for Transformer-based Semantic Segmentation: Compression is All You Need

26 September 2024·2306 words·11 mins

Computer Vision Image Segmentation 🏢 Beijing University of Posts and Telecommunications

DEPICT: A new white-box decoder for Transformer-based semantic segmentation, achieving better performance with fewer parameters by leveraging the principle of compression and connecting Transformer de…

RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models

26 September 2024·2570 words·13 mins

Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology

RestoreAgent, an AI-powered image restoration agent, autonomously identifies and corrects multiple image degradations, exceeding human expert performance.

Resfusion: Denoising Diffusion Probabilistic Models for Image Restoration Based on Prior Residual Noise

26 September 2024·2678 words·13 mins

Computer Vision Image Generation 🏢 College of Computer Science, Nankai University

Resfusion, a novel framework, accelerates image restoration by integrating residual noise into the diffusion process, achieving superior results with fewer steps.

ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

26 September 2024·3815 words·18 mins

AI Generated Computer Vision Image Generation 🏢 Technical University of Munich

ReNO: Boosts one-step text-to-image models by optimizing the initial noise using reward signals, achieving state-of-the-art results efficiently.

Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising

26 September 2024·2025 words·10 mins

Computer Vision Image Generation 🏢 National University of Singapore

Remix-DiT: Boosts diffusion model image generation quality by mixing smaller basis models into numerous specialized denoisers, improving efficiency and lowering costs.

Relationship Prompt Learning is Enough for Open-Vocabulary Semantic Segmentation

26 September 2024·3268 words·16 mins

Computer Vision Image Segmentation 🏢 School of Informatics, Xiamen University

Relationship Prompt Network (RPN) achieves state-of-the-art open-vocabulary semantic segmentation using only prompt learning and a Vision-Language Model (VLM), eliminating the need for expensive segme…