Computer Vision

Understanding and Improving Training-free Loss-based Diffusion Guidance

26 September 2024·2849 words·14 mins

AI Generated Computer Vision Image Generation 🏢 Microsoft Research

Training-free guidance revolutionizes diffusion models by enabling zero-shot conditional generation, but suffers from misaligned gradients and slow convergence. This paper provides theoretical analysi…

UMB: Understanding Model Behavior for Open-World Object Detection

26 September 2024·3512 words·17 mins

AI Generated Computer Vision Object Detection 🏢 South China University of Technology

UMB: A novel model enhances open-world object detection by understanding model behavior, surpassing state-of-the-art with a 5.3 mAP gain for unknown classes.

UltraPixel: Advancing Ultra High-Resolution Image Synthesis to New Peaks

26 September 2024·3265 words·16 mins

Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology

UltraPixel generates high-quality images at various resolutions (1K-6K) efficiently using cascade diffusion models, achieving state-of-the-art performance.

UDPM: Upsampling Diffusion Probabilistic Models

26 September 2024·3261 words·16 mins

AI Generated Computer Vision Image Generation 🏢 Tel Aviv University

UDPM (Upsampling Diffusion Probabilistic Models) achieves high-quality image generation with fewer computations by incorporating downsampling and upsampling within the diffusion process.

UDON: Universal Dynamic Online distillatioN for generic image representations

26 September 2024·2160 words·11 mins

Computer Vision Image Representation Learning 🏢 Czech Technical University in Prague

UDON: a novel multi-teacher online distillation method creates highly efficient universal image embeddings by dynamically transferring domain-specific knowledge and adapting to imbalanced data.

U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers

26 September 2024·2151 words·11 mins

Computer Vision Image Generation 🏢 Peking University

U-DiT: A diffusion transformer that combines a U-Net design with token downsampling, improving image generation while reducing computation cost.

Typicalness-Aware Learning for Failure Detection

26 September 2024·2037 words·10 mins

Computer Vision Failure Detection 🏢 Tencent Youtu Lab

Typicalness-Aware Learning (TAL) improves failure detection by dynamically adjusting prediction confidence based on sample typicality, mitigating overconfidence and achieving significant performance g…

Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learner

26 September 2024·2048 words·10 mins

Computer Vision Scene Understanding 🏢 String

Efficient Multi-Task Learning (EMTAL) transforms pre-trained Vision Transformers into efficient multi-task learners by using a MoEfied LoRA structure, a Quality Retaining optimization, and a router fa…

Transformer Doctor: Diagnosing and Treating Vision Transformers

26 September 2024·3080 words·15 mins

AI Generated Computer Vision Image Classification 🏢 College of Computer Science and Technology, Zhejiang University

Transformer Doctor diagnoses and treats vision transformer errors by identifying and correcting information integration issues, improving model performance and interpretability.

Transferable Adversarial Attacks on SAM and Its Downstream Models

26 September 2024·2130 words·10 mins

Computer Vision Image Segmentation 🏢 Nanyang Technological University

UMI-GRAT: A universal meta-initialized and gradient robust adversarial attack effectively exploits vulnerabilities in the Segment Anything Model (SAM) and its fine-tuned downstream models, even withou…

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

26 September 2024·2325 words·11 mins

Computer Vision Image Generation 🏢 Shanghai Artificial Intelligence Laboratory

AdaptiveDiffusion accelerates diffusion model inference by adaptively skipping noise prediction steps, achieving 2-5x speedup without quality loss.

Training an Open-Vocabulary Monocular 3D Detection Model without 3D Data

26 September 2024·3285 words·16 mins

AI Generated Computer Vision 3D Vision 🏢 Tsinghua University

Train open-vocabulary 3D object detectors using only RGB images and large language models, achieving state-of-the-art performance without expensive LiDAR data.

TrAct: Making First-layer Pre-Activations Trainable

26 September 2024·2254 words·11 mins

Computer Vision Image Classification 🏢 Stanford University

TrAct boosts vision model training by directly optimizing first-layer activations, leading to significant speedups (1.25x-4x) and improved accuracy.

TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation

26 September 2024·2847 words·14 mins

Computer Vision Image Generation 🏢 Korea Advanced Institute of Science and Technology (KAIST)

Boosting diffusion-based human image animation, Test-time Procrustes Calibration (TPC) ensures high-quality outputs by aligning reference and target images, overcoming common compositional misalignmen…

Towards Unsupervised Model Selection for Domain Adaptive Object Detection

26 September 2024·1885 words·9 mins

Computer Vision Object Detection 🏢 University of Electronic Science and Technology of China

Unsupervised model selection for domain adaptive object detection is achieved via a new Detection Adaptation Score (DAS), effectively selecting optimal models without target labels by leveraging the f…

Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels

26 September 2024·2412 words·12 mins

Computer Vision Image Segmentation 🏢 KAIST

PixelCLIP: Open-vocabulary semantic segmentation without pixel-level labels! Leveraging unlabeled image masks from Vision Foundation Models and an online clustering algorithm, PixelCLIP achieves imp…

Towards Multi-Domain Learning for Generalizable Video Anomaly Detection

26 September 2024·2936 words·14 mins

Computer Vision Video Understanding 🏢 Kyung Hee University

Researchers propose Multi-Domain learning for Video Anomaly Detection (MDVAD) to create generalizable models handling conflicting abnormality criteria across diverse datasets, improving accuracy and a…

Towards Learning Group-Equivariant Features for Domain Adaptive 3D Detection

26 September 2024·1931 words·10 mins

Computer Vision 3D Vision 🏢 University of Oxford

GroupEXP-DA boosts domain adaptive 3D object detection by using a grouping-exploration strategy to reduce bias in pseudo-label collection and account for multiple factors affecting object perception i…

Towards Global Optimal Visual In-Context Learning Prompt Selection

26 September 2024·2618 words·13 mins

AI Generated Computer Vision Image Segmentation 🏢 Fudan University

Partial2Global: A novel VICL framework achieving globally optimal prompt selection, significantly improving visual in-context learning across various tasks.

Towards Flexible Visual Relationship Segmentation

26 September 2024·3217 words·16 mins

AI Generated Computer Vision Image Segmentation 🏢 Microsoft Research

FleVRS: One unified model masters standard, promptable, and open-vocabulary visual relationship segmentation, outperforming existing methods.