Computer Vision
Understanding and Improving Training-free Loss-based Diffusion Guidance
·2849 words·14 mins·
AI Generated
Computer Vision
Image Generation
🏢 Microsoft Research
Training-free guidance revolutionizes diffusion models by enabling zero-shot conditional generation, but suffers from misaligned gradients and slow convergence. This paper provides theoretical analysi…
UMB: Understanding Model Behavior for Open-World Object Detection
·3512 words·17 mins·
AI Generated
Computer Vision
Object Detection
🏢 South China University of Technology
UMB improves open-world object detection by modeling the detector's own behavior, surpassing the state of the art by 5.3 mAP on unknown classes.
UltraPixel: Advancing Ultra High-Resolution Image Synthesis to New Peaks
·3265 words·16 mins·
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
UltraPixel generates high-quality images at various resolutions (1K-6K) efficiently using cascade diffusion models, achieving state-of-the-art performance.
UDPM: Upsampling Diffusion Probabilistic Models
·3261 words·16 mins·
AI Generated
Computer Vision
Image Generation
🏢 Tel Aviv University
UDPM (Upsampling Diffusion Probabilistic Models) achieves high-quality image generation with less computation by incorporating downsampling and upsampling into the diffusion process.
UDON: Universal Dynamic Online distillatioN for generic image representations
·2160 words·11 mins·
Computer Vision
Image Representation Learning
🏢 Czech Technical University in Prague
UDON: a novel multi-teacher online distillation method creates highly efficient universal image embeddings by dynamically transferring domain-specific knowledge and adapting to imbalanced data.
U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers
·2151 words·11 mins·
Computer Vision
Image Generation
🏢 Peking University
U-DiT: a diffusion transformer with a U-Net design and token downsampling, improving image generation while substantially reducing computation cost.
Typicalness-Aware Learning for Failure Detection
·2037 words·10 mins·
Computer Vision
Failure Detection
🏢 Tencent Youtu Lab
Typicalness-Aware Learning (TAL) improves failure detection by dynamically adjusting prediction confidence based on sample typicality, mitigating overconfidence and achieving significant performance g…
Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learner
·2048 words·10 mins·
Computer Vision
Scene Understanding
🏢 String
Efficient Multi-Task Learning (EMTAL) transforms pre-trained Vision Transformers into efficient multi-task learners by using a MoEfied LoRA structure, a Quality Retaining optimization, and a router fa…
Transformer Doctor: Diagnosing and Treating Vision Transformers
·3080 words·15 mins·
AI Generated
Computer Vision
Image Classification
🏢 College of Computer Science and Technology, Zhejiang University
Transformer Doctor diagnoses and treats vision transformer errors by identifying and correcting information integration issues, improving model performance and interpretability.
Transferable Adversarial Attacks on SAM and Its Downstream Models
·2130 words·10 mins·
Computer Vision
Image Segmentation
🏢 Nanyang Technological University
UMI-GRAT: A universal meta-initialized and gradient robust adversarial attack effectively exploits vulnerabilities in the Segment Anything Model (SAM) and its fine-tuned downstream models, even withou…
Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
·2325 words·11 mins·
Computer Vision
Image Generation
🏢 Shanghai Artificial Intelligence Laboratory
AdaptiveDiffusion accelerates diffusion model inference by adaptively skipping noise prediction steps, achieving 2-5x speedup without quality loss.
Training an Open-Vocabulary Monocular 3D Detection Model without 3D Data
·3285 words·16 mins·
AI Generated
Computer Vision
3D Vision
🏢 Tsinghua University
Train open-vocabulary 3D object detectors using only RGB images and large language models, achieving state-of-the-art performance without expensive LiDAR data.
TrAct: Making First-layer Pre-Activations Trainable
·2254 words·11 mins·
Computer Vision
Image Classification
🏢 Stanford University
TrAct boosts vision model training by directly optimizing first-layer activations, leading to significant speedups (1.25x-4x) and improved accuracy.
TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
·2847 words·14 mins·
Computer Vision
Image Generation
🏢 Korea Advanced Institute of Science and Technology (KAIST)
Boosting diffusion-based human image animation, Test-time Procrustes Calibration (TPC) ensures high-quality outputs by aligning reference and target images, overcoming common compositional misalignmen…
Towards Unsupervised Model Selection for Domain Adaptive Object Detection
·1885 words·9 mins·
Computer Vision
Object Detection
🏢 University of Electronic Science and Technology of China
Unsupervised model selection for domain adaptive object detection is achieved via a new Detection Adaptation Score (DAS), effectively selecting optimal models without target labels by leveraging the f…
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
·2412 words·12 mins·
Computer Vision
Image Segmentation
🏢 KAIST
PixelCLIP: Open-vocabulary semantic segmentation without pixel-level labels! Leveraging unlabeled image masks from Vision Foundation Models and an online clustering algorithm, PixelCLIP achieves imp…
Towards Multi-Domain Learning for Generalizable Video Anomaly Detection
·2936 words·14 mins·
Computer Vision
Video Understanding
🏢 Kyung Hee University
Researchers propose Multi-Domain learning for Video Anomaly Detection (MDVAD) to create generalizable models handling conflicting abnormality criteria across diverse datasets, improving accuracy and a…
Towards Learning Group-Equivariant Features for Domain Adaptive 3D Detection
·1931 words·10 mins·
Computer Vision
3D Vision
🏢 University of Oxford
GroupEXP-DA boosts domain adaptive 3D object detection by using a grouping-exploration strategy to reduce bias in pseudo-label collection and account for multiple factors affecting object perception i…
Towards Global Optimal Visual In-Context Learning Prompt Selection
·2618 words·13 mins·
AI Generated
Computer Vision
Image Segmentation
🏢 Fudan University
Partial2Global: a visual in-context learning (VICL) framework that selects globally optimal prompts, significantly improving VICL performance across various tasks.
Towards Flexible Visual Relationship Segmentation
·3217 words·16 mins·
AI Generated
Computer Vision
Image Segmentation
🏢 Microsoft Research
FleVRS: a single unified model that handles standard, promptable, and open-vocabulary visual relationship segmentation, outperforming existing methods.