Posters
2024
The Intelligible and Effective Graph Neural Additive Network
·2248 words·11 mins·
AI Theory
Interpretability
Tel Aviv University
GNAN: a novel interpretable graph neural network achieving accuracy comparable to black-box models.
The Importance of Online Data: Understanding Preference Fine-tuning via Coverage
·1878 words·9 mins·
Natural Language Processing
Large Language Models
Carnegie Mellon University
Hybrid Preference Optimization (HyPO) fine-tunes LLMs by combining offline preference data with online samples, outperforming existing offline methods in both performance and efficiency.
The Importance of Being Scalable: Improving the Speed and Accuracy of Neural Network Interatomic Potentials Across Chemical Domains
·1885 words·9 mins·
Machine Learning
Deep Learning
UC Berkeley
ESCAIP, a novel neural network architecture, dramatically boosts the speed and accuracy of atomic simulations by leveraging attention mechanisms, enabling efficient large-scale modeling across diverse chemical domains.
The Implicit Bias of Heterogeneity towards Invariance: A Study of Multi-Environment Matrix Sensing
·1551 words·8 mins·
AI Theory
Optimization
Peking University
Leveraging data heterogeneity, this study reveals that standard SGD implicitly learns invariant features across multiple environments, achieving robust generalization without explicit regularization.
The Implicit Bias of Gradient Descent toward Collaboration between Layers: A Dynamic Analysis of Multilayer Perceptions
·1405 words·7 mins·
AI Theory
Robustness
Department of Computer Science, University of Exeter
Deep learning models’ success hinges on understanding gradient descent’s implicit bias. This study shows how that bias shapes collaboration between layers, revealing a decreasing trend in adversarial robustness.
The Implicit Bias of Gradient Descent on Separable Multiclass Data
·1300 words·7 mins·
Machine Learning
Deep Learning
University of Michigan
Researchers extended implicit bias theory to multiclass classification using a novel framework, proving that gradient descent prefers simple solutions even with complex alternatives.
The Implicit Bias of Adam on Separable Data
·1356 words·7 mins·
AI Theory
Optimization
Hong Kong University of Science and Technology
Adam’s implicit bias revealed: on separable data, Adam converges toward the maximum ℓ∞-margin solution, in contrast to gradient descent’s preference for the maximum ℓ2-margin solution, and this convergence occurs within polynomial time.
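For reference, a hedged sketch of the two margin notions at play (standard definitions and notation, assumed rather than quoted from the poster): on linearly separable data {(xᵢ, yᵢ)}, the limiting directions solve

```latex
% GD: direction of the maximum \ell_2-margin (hard-margin SVM) solution
\hat{w}_{\mathrm{GD}} \in \arg\max_{\|w\|_2 \le 1} \; \min_i \; y_i \langle w, x_i \rangle
% Adam (per the poster's claim): direction of the maximum \ell_\infty-margin solution
\hat{w}_{\mathrm{Adam}} \in \arg\max_{\|w\|_\infty \le 1} \; \min_i \; y_i \langle w, x_i \rangle
```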
The Impact of Initialization on LoRA Finetuning Dynamics
·2220 words·11 mins·
Natural Language Processing
Large Language Models
UC Berkeley
LoRA’s initialization significantly impacts finetuning: initializing matrix A randomly and B to zero yields better performance than the reverse, because it permits larger learning rates.
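As a rough illustration only (a minimal sketch of the two initialization schemes being compared; the class name, shapes, and scaling factor are assumptions, not the authors’ code):

```python
import torch
import torch.nn as nn

class LoRAAdapter(nn.Module):
    """Minimal LoRA update: output = base_out + (alpha / r) * B A x."""
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: int = 16, scheme: str = "A"):
        super().__init__()
        self.A = nn.Parameter(torch.empty(r, d_in))
        self.B = nn.Parameter(torch.empty(d_out, r))
        self.scale = alpha / r
        if scheme == "A":
            # A random, B zero -- the scheme the summary reports works better
            nn.init.kaiming_uniform_(self.A)
            nn.init.zeros_(self.B)
        else:
            # B random, A zero -- the reverse scheme
            nn.init.zeros_(self.A)
            nn.init.kaiming_uniform_(self.B)

    def forward(self, x: torch.Tensor, base_out: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_in); base_out: frozen W x with shape (batch, d_out)
        return base_out + self.scale * (x @ self.A.T) @ self.B.T
```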
The Impact of Geometric Complexity on Neural Collapse in Transfer Learning
·1870 words·9 mins·
Machine Learning
Transfer Learning
Google Research
Lowering a neural network’s geometric complexity during pre-training enhances neural collapse and improves transfer learning, especially in few-shot scenarios.
The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms
·1819 words·9 mins·
Machine Learning
Optimization
McGill University
Researchers developed a framework for analyzing stochastic adaptive learning rate algorithms, providing exact risk and learning rate curves, revealing the importance of data covariance and uncovering …
The Group Robustness is in the Details: Revisiting Finetuning under Spurious Correlations
·2643 words·13 mins·
AI Theory
Fairness
Google DeepMind
Finetuning’s impact on worst-group accuracy is surprisingly nuanced, with common class-balancing methods sometimes hurting performance; a novel mixture method consistently outperforms others.
The GAN is dead; long live the GAN! A Modern GAN Baseline
·3072 words·15 mins·
Computer Vision
Image Generation
Brown University
R3GAN, a minimalist GAN baseline, surpasses state-of-the-art models by using a novel regularized relativistic GAN loss and modern architectures, proving GANs can be trained efficiently without relying on ad hoc tricks.
The Fine-Grained Complexity of Gradient Computation for Training Large Language Models
·336 words·2 mins·
Natural Language Processing
Large Language Models
Columbia University
New research precisely characterizes the computational limits of training large language models, revealing a sharp threshold determined by the magnitude of the parameter matrix entries and paving the way for faster algorithms.
The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks
·1538 words·8 mins·
Machine Learning
Deep Learning
Institute of Mathematics, EPFL
The new ‘Feature Speed Formula’ predicts and controls hierarchical feature learning in deep networks by linking hyperparameter scaling to the angle between feature updates and the backward pass.
The Fairness-Quality Tradeoff in Clustering
·2122 words·10 mins·
AI Generated
AI Theory
Fairness
Columbia University
Novel algorithms trace the optimal balance between clustering quality and fairness, revealing all non-dominated solutions for various objectives.
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
·3501 words·17 mins·
loading
·
loading
AI Generated
Natural Language Processing
Large Language Models
MIT
Large language models (LLMs) struggle with factual inconsistencies (‘hallucinations’) and the ‘reversal curse,’ where information recall depends heavily on the input order. This work reframes the reversal curse as a ‘factorization curse’: a failure to learn the same joint distribution under different factorizations.
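To make the factorization point concrete, a small worked identity (standard probability notation, not taken from the poster): a left-to-right model is trained on only one of several equivalent factorizations of the same joint distribution.

```latex
% The same joint distribution admits multiple factorizations:
p(a, b) \;=\; p(a)\, p(b \mid a) \;=\; p(b)\, p(a \mid b)
% Standard next-token training fits only the left-to-right factorization
% p(a)\,p(b \mid a), so querying the reverse direction p(a \mid b) can fail.
```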
The Expressive Capacity of State Space Models: A Formal Language Perspective
·1723 words·9 mins·
loading
·
loading
Natural Language Processing
Large Language Models
Saarland University
State-space models (SSMs) rival transformers in language modeling, but their capabilities remain unclear; this paper rigorously analyzes SSM expressivity from a formal-language perspective, revealing unique strengths and limitations.
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
·2128 words·10 mins·
loading
·
loading
Natural Language Processing
Large Language Models
Harvard University
Transformers learn to perform in-context learning of Markov chains hierarchically, progressing from simpler unigram strategies to more complex bigram solutions, with the presence of simpler solutions delaying the formation of the final bigram solution.
The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof
·2500 words·12 mins·
loading
·
loading
AI Theory
Optimization
MIT
Breaking neural network parameter symmetries leads to faster training, better generalization, and improved loss landscape behavior, as demonstrated by novel asymmetric network architectures.
The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning
·2452 words·12 mins·
loading
·
loading
Machine Learning
Reinforcement Learning
University of Oxford
Offline model-based RL methods fail as dynamics models improve; this paper identifies the ‘edge-of-reach’ problem as the cause and introduces RAVL, a simple solution ensuring robust performance.