Posters
2024
The Intelligible and Effective Graph Neural Additive Network
·2248 words·11 mins·
AI Theory
Interpretability
Tel Aviv University
GNAN: a novel interpretable graph neural network achieving accuracy comparable to black-box models.
The Importance of Online Data: Understanding Preference Fine-tuning via Coverage
·1878 words·9 mins·
Natural Language Processing
Large Language Models
Carnegie Mellon University
Hybrid Preference Optimization (HyPO) fine-tunes LLMs by combining offline preference data with online samples, outperforming existing offline methods in both performance and efficiency.
The Importance of Being Scalable: Improving the Speed and Accuracy of Neural Network Interatomic Potentials Across Chemical Domains
·1885 words·9 mins·
Machine Learning
Deep Learning
UC Berkeley
ESCAIP, a novel neural network architecture, dramatically boosts the speed and accuracy of atomic simulations by leveraging attention mechanisms, enabling efficient large-scale modeling across diverse chemical domains.
The Implicit Bias of Heterogeneity towards Invariance: A Study of Multi-Environment Matrix Sensing
·1551 words·8 mins·
AI Theory
Optimization
Peking University
Leveraging data heterogeneity, this study reveals that standard SGD implicitly learns invariant features across multiple environments, achieving robust generalization without explicit regularization.
The Implicit Bias of Gradient Descent toward Collaboration between Layers: A Dynamic Analysis of Multilayer Perceptions
·1405 words·7 mins·
AI Theory
Robustness
Department of Computer Science, University of Exeter
Deep learning models’ success hinges on understanding gradient descent’s implicit bias. This study shows how that bias shapes collaboration between layers, revealing a decreasing trend in adversarial robustness.
The Implicit Bias of Gradient Descent on Separable Multiclass Data
·1300 words·7 mins·
Machine Learning
Deep Learning
University of Michigan
Researchers extended implicit bias theory to multiclass classification using a novel framework, proving that gradient descent prefers simple solutions even with complex alternatives.
The Implicit Bias of Adam on Separable Data
·1356 words·7 mins·
AI Theory
Optimization
Hong Kong University of Science and Technology
Adam’s implicit bias revealed: on separable data, Adam converges toward the maximum ℓ∞-margin solution, in contrast to gradient descent’s preference for the maximum ℓ2-margin solution, and this convergence occurs within polynomial time.
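For reference, a hedged sketch of the two margin notions at play (standard definitions and notation, assumed rather than quoted from the poster): on linearly separable data {(xᵢ, yᵢ)}, the limiting directions solve

```latex
% GD: direction of the maximum \ell_2-margin (hard-margin SVM) solution
\hat{w}_{\mathrm{GD}} \in \arg\max_{\|w\|_2 \le 1} \; \min_i \; y_i \langle w, x_i \rangle
% Adam (per the poster's claim): direction of the maximum \ell_\infty-margin solution
\hat{w}_{\mathrm{Adam}} \in \arg\max_{\|w\|_\infty \le 1} \; \min_i \; y_i \langle w, x_i \rangle
```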
The Impact of Initialization on LoRA Finetuning Dynamics
·2220 words·11 mins·
Natural Language Processing
Large Language Models
UC Berkeley
LoRA’s initialization significantly impacts finetuning: initializing matrix A randomly and B to zero yields better performance than the reverse, because it permits larger learning rates.
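As a rough illustration only (a minimal sketch of the two initialization schemes being compared; the class name, shapes, and scaling factor are assumptions, not the authors’ code):

```python
import torch
import torch.nn as nn

class LoRAAdapter(nn.Module):
    """Minimal LoRA update: output = base_out + (alpha / r) * B A x."""
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: int = 16, scheme: str = "A"):
        super().__init__()
        self.A = nn.Parameter(torch.empty(r, d_in))
        self.B = nn.Parameter(torch.empty(d_out, r))
        self.scale = alpha / r
        if scheme == "A":
            # A random, B zero -- the scheme the summary reports works better
            nn.init.kaiming_uniform_(self.A)
            nn.init.zeros_(self.B)
        else:
            # B random, A zero -- the reverse scheme
            nn.init.zeros_(self.A)
            nn.init.kaiming_uniform_(self.B)

    def forward(self, x: torch.Tensor, base_out: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_in); base_out: frozen W x with shape (batch, d_out)
        return base_out + self.scale * (x @ self.A.T) @ self.B.T
```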
The Impact of Geometric Complexity on Neural Collapse in Transfer Learning
·1870 words·9 mins·
Machine Learning
Transfer Learning
Google Research
Lowering a neural network’s geometric complexity during pre-training enhances neural collapse and improves transfer learning, especially in few-shot scenarios.
The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms
·1819 words·9 mins·
Machine Learning
Optimization
McGill University
Researchers developed a framework for analyzing stochastic adaptive learning rate algorithms, providing exact risk and learning rate curves, revealing the importance of data covariance and uncovering …
The Group Robustness is in the Details: Revisiting Finetuning under Spurious Correlations
·2643 words·13 mins·
AI Theory
Fairness
Google DeepMind
Finetuning’s impact on worst-group accuracy is surprisingly nuanced, with common class-balancing methods sometimes hurting performance; a novel mixture method consistently outperforms others.
The GAN is dead; long live the GAN! A Modern GAN Baseline
·3072 words·15 mins·
Computer Vision
Image Generation
Brown University
R3GAN, a minimalist GAN baseline, surpasses state-of-the-art models by using a novel regularized relativistic GAN loss and modern architectures, proving GANs can be trained efficiently without relying on ad hoc tricks.
The Fine-Grained Complexity of Gradient Computation for Training Large Language Models
·336 words·2 mins·
Natural Language Processing
Large Language Models
Columbia University
New research precisely characterizes the computational limits of training large language models, revealing a sharp threshold determined by the magnitude of the parameter matrix entries and paving the way for faster algorithms.
The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks
·1538 words·8 mins·
Machine Learning
Deep Learning
Institute of Mathematics, EPFL
The new ‘Feature Speed Formula’ predicts and controls hierarchical feature learning in deep networks by linking hyperparameter scaling to the angle between feature updates and the backward pass.
The Fairness-Quality Tradeoff in Clustering
·2122 words·10 mins·
AI Generated
AI Theory
Fairness
Columbia University
Novel algorithms trace the optimal balance between clustering quality and fairness, revealing all non-dominated solutions for various objectives.
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
·3501 words·17 mins·
loading
·
loading
AI Generated
Natural Language Processing
Large Language Models
MIT
Large language models (LLMs) struggle with factual inconsistencies (‘hallucinations’) and the ‘reversal curse,’ where information recall depends heavily on the input order. This work reframes the reversal curse as a ‘factorization curse’: a failure to learn the same joint distribution under different factorizations.
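To make the factorization point concrete, a small worked identity (standard probability notation, not taken from the poster): a left-to-right model is trained on only one of several equivalent factorizations of the same joint distribution.

```latex
% The same joint distribution admits multiple factorizations:
p(a, b) \;=\; p(a)\, p(b \mid a) \;=\; p(b)\, p(a \mid b)
% Standard next-token training fits only the left-to-right factorization
% p(a)\,p(b \mid a), so querying the reverse direction p(a \mid b) can fail.
```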
The Expressive Capacity of State Space Models: A Formal Language Perspective
·1723 words·9 mins·
loading
·
loading
Natural Language Processing
Large Language Models
Saarland University
State-space models (SSMs) rival transformers in language modeling, but their capabilities remain unclear; this paper rigorously analyzes SSM expressivity from a formal-language perspective, revealing unique strengths and limitations.
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
·2128 words·10 mins·
loading
·
loading
Natural Language Processing
Large Language Models
Harvard University
Transformers learn to perform in-context learning of Markov chains hierarchically, progressing from simpler unigram strategies to more complex bigram solutions, with the presence of simpler solutions delaying the formation of the final bigram solution.
The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof
·2500 words·12 mins·
loading
·
loading
AI Theory
Optimization
MIT
Breaking neural network parameter symmetries leads to faster training, better generalization, and improved loss landscape behavior, as demonstrated by novel asymmetric network architectures.
The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning
·2452 words·12 mins·
loading
·
loading
Machine Learning
Reinforcement Learning
University of Oxford
Offline model-based RL methods fail as dynamics models improve; this paper identifies the ‘edge-of-reach’ problem as the cause and introduces RAVL, a simple solution ensuring robust performance.