Image Classification
RTify: Aligning Deep Neural Networks with Human Behavioral Decisions
·1884 words·9 mins
Computer Vision
Image Classification
🏢 Brown University
RTify: A novel framework that aligns deep neural networks’ dynamics with human reaction times for improved visual decision-making models.
Revisiting the Integration of Convolution and Attention for Vision Backbone
·2197 words·11 mins
Computer Vision
Image Classification
🏢 City University of Hong Kong
GLMix: A novel vision backbone efficiently integrates convolutions and multi-head self-attention at different granularities, achieving state-of-the-art performance while addressing scalability issues.
Recurrent neural network dynamical systems for biological vision
·2292 words·11 mins
Image Classification
🏢 University of Cambridge
CordsNet: a hybrid CNN-RNN architecture enabling biologically realistic, robust image recognition through continuous-time recurrent dynamics.
Real-time Core-Periphery Guided ViT with Smart Data Layout Selection on Mobile Devices
·1912 words·9 mins
Computer Vision
Image Classification
🏢 University of Georgia
ECP-ViT: Real-time Vision Transformer on Mobile Devices via Core-Periphery Attention and Smart Data Layout.
RAMP: Boosting Adversarial Robustness Against Multiple $l_p$ Perturbations for Universal Robustness
·3379 words·16 mins
Computer Vision
Image Classification
🏢 University of Illinois Urbana-Champaign
RAMP: A novel training framework significantly boosts DNN robustness against diverse adversarial attacks by mitigating accuracy-robustness tradeoffs and improving generalization.
QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model
·2714 words·13 mins
AI Generated
Computer Vision
Image Classification
🏢 Shanghai Jiao Tong University
QuadMamba: A novel vision model leveraging quadtree-based scanning for superior performance in visual tasks, achieving state-of-the-art results with linear-time complexity.
QT-ViT: Improving Linear Attention in ViT with Quadratic Taylor Expansion
·1611 words·8 mins
AI Generated
Computer Vision
Image Classification
🏢 Advanced Micro Devices, Inc.
QT-ViT boosts Vision Transformer efficiency by using quadratic Taylor expansion to approximate self-attention, achieving state-of-the-art accuracy and speed.
QKFormer: Hierarchical Spiking Transformer using Q-K Attention
·2062 words·10 mins
Image Classification
🏢 Pengcheng Laboratory
QKFormer: A groundbreaking spiking transformer achieving 85.65% ImageNet accuracy using a linear-complexity, energy-efficient Q-K attention mechanism.
Provable Benefit of Cutout and CutMix for Feature Learning
·1796 words·9 mins
Image Classification
🏢 KAIST
CutMix and Cutout data augmentation methods provably improve feature learning by enabling the network to learn rarer features and noise vectors more effectively.
Prototypical Hash Encoding for On-the-Fly Fine-Grained Category Discovery
·3085 words·15 mins
Computer Vision
Image Classification
🏢 University of Trento
Prototypical Hash Encoding (PHE) significantly boosts on-the-fly fine-grained category discovery by using multiple prototypes per category to generate highly discriminative hash codes, thus resolving …
Physics-Constrained Comprehensive Optical Neural Networks
·1493 words·8 mins
Computer Vision
Image Classification
🏢 Beijing University of Posts and Telecommunications
Physics-constrained learning significantly boosts optical neural network accuracy by addressing systematic physical errors, achieving state-of-the-art results on image classification tasks.
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
·3763 words·18 mins
AI Generated
Computer Vision
Image Classification
🏢 University of Melbourne
BiXT, a novel bi-directional cross-attention Transformer, scales linearly with input size, achieving competitive performance across various tasks by efficiently processing longer sequences.
On the Use of Anchoring for Training Vision Models
·1917 words·9 mins
Image Classification
🏢 Lawrence Livermore National Laboratory
Boosting vision model training: A new anchored training protocol with a simple regularizer significantly enhances generalization and safety, surpassing standard methods.
On the Surprising Effectiveness of Attention Transfer for Vision Transformers
·2971 words·14 mins
Computer Vision
Image Classification
🏢 Carnegie Mellon University
Vision Transformers achieve surprisingly high accuracy by transferring only pre-training attention maps, challenging the conventional belief that feature learning is crucial.
Not Just Object, But State: Compositional Incremental Learning without Forgetting
·2423 words·12 mins
Computer Vision
Image Classification
🏢 Dalian University of Technology
CompILer: A novel prompt-based incremental learner mastering state-object compositions without forgetting, achieving state-of-the-art performance.
No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations
·3800 words·18 mins
Computer Vision
Image Classification
🏢 QUVA Lab, University of Amsterdam
Self-supervised gradients boost frozen deep learning model performance!
Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model
·2122 words·10 mins
Computer Vision
Image Classification
🏢 City University of Hong Kong
MSVMamba: A novel multi-scale vision model leveraging state-space models that achieves high accuracy in image classification and object detection while maintaining linear complexity, solving the long-rang…
MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution
·2539 words·12 mins
AI Generated
Computer Vision
Image Classification
🏢 Tsinghua University
MSPE empowers Vision Transformers to handle any image resolution by cleverly optimizing patch embedding, achieving superior performance on low-resolution images and comparable results on high-resoluti…
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
·2159 words·11 mins
Computer Vision
Image Classification
🏢 Huazhong University of Science and Technology
MoE Jetpack efficiently transforms readily available dense checkpoints into high-performing MoE models, drastically accelerating convergence and improving accuracy.
Mitigating Biases in Blackbox Feature Extractors for Image Classification Tasks
·2583 words·13 mins
Computer Vision
Image Classification
🏢 Indian Institute of Science
Researchers propose a simple yet effective clustering-based adaptive margin loss to mitigate biases inherited by black-box feature extractors in image classification tasks.