Image Classification

Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation

26 September 2024·2875 words·14 mins

Computer Vision Image Classification 🏢 Dalian University of Technology

Wasserstein Distance-based Knowledge Distillation (WKD) rivals KL-divergence by leveraging rich category interrelations and handling non-overlapping distributions, significantly boosting performance i…

VMamba: Visual State Space Model

26 September 2024·2891 words·14 mins

Image Classification 🏢 University of Chinese Academy of Sciences

VMamba: a vision backbone achieving linear time complexity using Visual State Space (VSS) blocks and 2D Selective Scan (SS2D) for efficient visual representation.

Visual Pinwheel Centers Act as Geometric Saliency Detectors

26 September 2024·2189 words·11 mins

Computer Vision Image Classification 🏢 Research Institute of Intelligent Complex Systems, Fudan University

Visual pinwheel centers in the cortex act as efficient geometric saliency detectors, responding faster and stronger to complex spatial textures than other structures.

Visual Fourier Prompt Tuning

26 September 2024·4269 words·21 mins

Computer Vision Image Classification 🏢 Rochester Institute of Technology

Visual Fourier Prompt Tuning (VFPT) leverages the Fast Fourier Transform to seamlessly integrate spatial and frequency information for superior parameter-efficient vision model fine-tuning, even with …

Visual Data Diagnosis and Debiasing with Concept Graphs

26 September 2024·2767 words·13 mins

Computer Vision Image Classification 🏢 Carnegie Mellon University

CONBIAS tackles dataset bias by representing visual data as concept graphs, diagnosing imbalances via clique analysis, and debiasing through targeted data augmentation for improved model generalizatio…

Vision Transformer Neural Architecture Search for Out-of-Distribution Generalization: Benchmark and Insights

26 September 2024·4915 words·24 mins

Computer Vision Image Classification 🏢 Singapore University of Technology and Design (SUTD)

OoD-ViT-NAS: a new benchmark reveals how ViT architecture impacts out-of-distribution generalization, highlighting the importance of embedding dimension and challenging the reliance on in-distribution…

Understanding Visual Feature Reliance through the Lens of Complexity

26 September 2024·3993 words·19 mins

AI Generated Computer Vision Image Classification 🏢 Google DeepMind

Deep learning models favor simple features, hindering generalization; this paper introduces a new feature complexity metric revealing a spectrum of simple-to-complex features, their learning dynamics,…

Understanding Bias in Large-Scale Visual Datasets

26 September 2024·5190 words·25 mins

AI Generated Computer Vision Image Classification 🏢 University of Pennsylvania

Researchers unveil a novel framework to dissect bias in large-scale visual datasets, identifying unique visual attributes and leveraging language models for detailed analysis, paving the way for creat…

Transformer Doctor: Diagnosing and Treating Vision Transformers

26 September 2024·3080 words·15 mins

AI Generated Computer Vision Image Classification 🏢 College of Computer Science and Technology, Zhejiang University

Transformer Doctor diagnoses and treats vision transformer errors by identifying and correcting information integration issues, improving model performance and interpretability.

TrAct: Making First-layer Pre-Activations Trainable

26 September 2024·2254 words·11 mins

Computer Vision Image Classification 🏢 Stanford University

TrAct boosts vision model training by directly optimizing first-layer activations, leading to significant speedups (1.25x-4x) and improved accuracy.

To Err Like Human: Affective Bias-Inspired Measures for Visual Emotion Recognition Evaluation

26 September 2024·1759 words·9 mins

Computer Vision Image Classification 🏢 Nankai University

This paper introduces novel metrics for visual emotion recognition evaluation, considering the psychological distance between emotions to better reflect human perception, improving the assessment of m…

The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better

26 September 2024·2874 words·14 mins

AI Generated Computer Vision Image Classification 🏢 University of Washington

Using real images retrieved from a generator’s training data outperforms using synthetic images generated by that same model for image classification.

TARP-VP: Towards Evaluation of Transferred Adversarial Robustness and Privacy on Label Mapping Visual Prompting Models

26 September 2024·2161 words·11 mins

Computer Vision Image Classification 🏢 University of Liverpool

TARP-VP reveals a surprising lack of trade-off between adversarial robustness and privacy for label mapping visual prompting models, showing that transferred adversarial training significantly improve…

Structured Unrestricted-Rank Matrices for Parameter Efficient Finetuning

26 September 2024·3674 words·18 mins

AI Generated Computer Vision Image Classification 🏢 Google Research

Structured Unrestricted-Rank Matrices (SURMs) revolutionize parameter-efficient fine-tuning by offering greater flexibility and accuracy than existing methods like LoRA, achieving significant gains in…

Spiking Transformer with Experts Mixture

26 September 2024·2017 words·10 mins

Computer Vision Image Classification 🏢 Peking University

Spiking Experts Mixture Mechanism (SEMM) boosts Spiking Transformers by integrating Mixture-of-Experts for efficient, sparse conditional computation, achieving significant performance improvements on …

Slicing Vision Transformer for Flexible Inference

26 September 2024·2922 words·14 mins

Computer Vision Image Classification 🏢 Snap Inc.

Scala: One-shot training enables flexible ViT inference!

Scaling White-Box Transformers for Vision

26 September 2024·2209 words·11 mins

Computer Vision Image Classification 🏢 UC Santa Cruz

CRATE-α: A new white-box vision transformer architecture achieves 85.1% ImageNet accuracy by strategically scaling model size and datasets, outperforming prior white-box models and preserving interpre…

ScaleKD: Strong Vision Transformers Could Be Excellent Teachers

26 September 2024·3783 words·18 mins

AI Generated Computer Vision Image Classification 🏢 Intel Labs

ScaleKD: Pre-trained vision transformers make excellent teachers for diverse student networks, improving efficiency and performance in knowledge distillation.

Samba: Severity-aware Recurrent Modeling for Cross-domain Medical Image Grading

26 September 2024·2230 words·11 mins

Computer Vision Image Classification 🏢 Westlake University

Samba: a novel severity-aware recurrent model, tackles cross-domain medical image grading by sequentially encoding image patches and recalibrating states using EM, significantly improving accuracy.

Saliency-driven Experience Replay for Continual Learning

26 September 2024·2628 words·13 mins

Image Classification 🏢 University of Catania

Boosting AI’s continual learning via saliency-driven experience replay, achieving up to 20% accuracy improvement.