Image Classification
Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation
·2875 words·14 mins·
Computer Vision
Image Classification
🏢 Dalian University of Technology
Wasserstein Distance-based Knowledge Distillation (WKD) rivals KL-divergence by leveraging rich category interrelations and handling non-overlapping distributions, significantly boosting performance i…
VMamba: Visual State Space Model
·2891 words·14 mins·
Image Classification
🏢 University of Chinese Academy of Sciences
VMamba: a vision backbone achieving linear time complexity using Visual State Space (VSS) blocks and 2D Selective Scan (SS2D) for efficient visual representation.
Visual Pinwheel Center Act as Geometric Saliency Detector
·2189 words·11 mins·
Computer Vision
Image Classification
🏢 Research Institute of Intelligent Complex Systems, Fudan University
Visual pinwheel centers in the cortex act as efficient geometric saliency detectors, responding faster and stronger to complex spatial textures than other structures.
Visual Fourier Prompt Tuning
·4269 words·21 mins·
Computer Vision
Image Classification
🏢 Rochester Institute of Technology
Visual Fourier Prompt Tuning (VFPT) leverages the Fast Fourier Transform to seamlessly integrate spatial and frequency information for superior parameter-efficient vision model fine-tuning, even with …
Visual Data Diagnosis and Debiasing with Concept Graphs
·2767 words·13 mins·
Computer Vision
Image Classification
🏢 Carnegie Mellon University
CONBIAS tackles dataset bias by representing visual data as concept graphs, diagnosing imbalances via clique analysis, and debiasing through targeted data augmentation for improved model generalizatio…
Vision Transformer Neural Architecture Search for Out-of-Distribution Generalization: Benchmark and Insights
·4915 words·24 mins·
Computer Vision
Image Classification
🏢 Singapore University of Technology and Design (SUTD)
OoD-ViT-NAS: a new benchmark reveals how ViT architecture impacts out-of-distribution generalization, highlighting the importance of embedding dimension and challenging the reliance on in-distribution…
Understanding Visual Feature Reliance through the Lens of Complexity
·3993 words·19 mins·
AI Generated
Computer Vision
Image Classification
🏢 Google DeepMind
Deep learning models favor simple features, hindering generalization; this paper introduces a new feature complexity metric revealing a spectrum of simple-to-complex features, their learning dynamics,…
Understanding Bias in Large-Scale Visual Datasets
·5190 words·25 mins·
AI Generated
Computer Vision
Image Classification
🏢 University of Pennsylvania
Researchers unveil a novel framework to dissect bias in large-scale visual datasets, identifying unique visual attributes and leveraging language models for detailed analysis, paving the way for creat…
Transformer Doctor: Diagnosing and Treating Vision Transformers
·3080 words·15 mins·
AI Generated
Computer Vision
Image Classification
🏢 College of Computer Science and Technology, Zhejiang University
Transformer Doctor diagnoses and treats vision transformer errors by identifying and correcting information integration issues, improving model performance and interpretability.
TrAct: Making First-layer Pre-Activations Trainable
·2254 words·11 mins·
Computer Vision
Image Classification
🏢 Stanford University
TrAct boosts vision model training by directly optimizing first-layer activations, leading to significant speedups (1.25x-4x) and improved accuracy.
To Err Like Human: Affective Bias-Inspired Measures for Visual Emotion Recognition Evaluation
·1759 words·9 mins·
Computer Vision
Image Classification
🏢 Nankai University
This paper introduces novel metrics for visual emotion recognition evaluation, considering the psychological distance between emotions to better reflect human perception, improving the assessment of m…
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
·2874 words·14 mins·
AI Generated
Computer Vision
Image Classification
🏢 University of Washington
Using real images retrieved from a generator’s training data outperforms using synthetic images generated by that same model for image classification.
TARP-VP: Towards Evaluation of Transferred Adversarial Robustness and Privacy on Label Mapping Visual Prompting Models
·2161 words·11 mins·
Computer Vision
Image Classification
🏢 University of Liverpool
TARP-VP reveals a surprising lack of trade-off between adversarial robustness and privacy for label mapping visual prompting models, showing that transferred adversarial training significantly improve…
Structured Unrestricted-Rank Matrices for Parameter Efficient Finetuning
·3674 words·18 mins·
AI Generated
Computer Vision
Image Classification
🏢 Google Research
Structured Unrestricted-Rank Matrices (SURMs) improve parameter-efficient fine-tuning by offering greater flexibility and accuracy than existing methods like LoRA, achieving significant gains in…
Spiking Transformer with Experts Mixture
·2017 words·10 mins·
Computer Vision
Image Classification
🏢 Peking University
Spiking Experts Mixture Mechanism (SEMM) boosts Spiking Transformers by integrating Mixture-of-Experts for efficient, sparse conditional computation, achieving significant performance improvements on …
Slicing Vision Transformer for Flexible Inference
·2922 words·14 mins·
Computer Vision
Image Classification
🏢 Snap Inc.
Scala: One-shot training enables flexible ViT inference!
Scaling White-Box Transformers for Vision
·2209 words·11 mins·
Computer Vision
Image Classification
🏢 UC Santa Cruz
CRATE-α: A new white-box vision transformer architecture achieves 85.1% ImageNet accuracy by strategically scaling model size and datasets, outperforming prior white-box models and preserving interpre…
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers
·3783 words·18 mins·
AI Generated
Computer Vision
Image Classification
🏢 Intel Labs
ScaleKD: Pre-trained vision transformers make excellent teachers for diverse student networks, improving efficiency and performance in knowledge distillation.
Samba: Severity-aware Recurrent Modeling for Cross-domain Medical Image Grading
·2230 words·11 mins·
Computer Vision
Image Classification
🏢 Westlake University
Samba, a novel severity-aware recurrent model, tackles cross-domain medical image grading by sequentially encoding image patches and recalibrating its states using EM, significantly improving accuracy.
Saliency-driven Experience Replay for Continual Learning
·2628 words·13 mins·
Image Classification
🏢 University of Catania
Boosting AI’s continual learning via saliency-driven experience replay, achieving up to 20% accuracy improvement.