AI Theory
µP²: Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling
·9260 words·44 mins·
AI Generated
AI Theory
Optimization
🏢 University of Tübingen
µP²: Layerwise perturbation scaling in SAM enables hyperparameter transfer and improved generalization in large models.
Zipper: Addressing Degeneracy in Algorithm-Agnostic Inference
·1827 words·9 mins·
AI Theory
Interpretability
🏢 Nankai University
Zipper: A novel statistical device resolves the degeneracy issue in algorithm-agnostic inference, enabling reliable goodness-of-fit tests with enhanced power.
Zeroth-Order Sampling Methods for Non-Log-Concave Distributions: Alleviating Metastability by Denoising Diffusion
·2790 words·14 mins·
AI Theory
Sampling
🏢 Georgia Institute of Technology
Zeroth-Order Diffusion Monte Carlo (ZOD-MC) efficiently samples from non-log-concave distributions using only zeroth-order queries, overcoming metastability issues and outperforming state-of-the-art samplers.
Wide Two-Layer Networks can Learn from Adversarial Perturbations
·2045 words·10 mins·
AI Theory
Robustness
🏢 University of Tokyo
Wide two-layer neural networks can generalize well from mislabeled adversarial examples because adversarial perturbations surprisingly contain sufficient class-specific features.
Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
·7149 words·34 mins·
AI Generated
AI Theory
Optimization
🏢 University of Maryland
Learning rate warmup improves deep learning performance by letting networks tolerate larger learning rates, which push them into better-conditioned regions of the loss landscape.
Why Transformers Need Adam: A Hessian Perspective
·2407 words·12 mins·
AI Theory
Optimization
🏢 Chinese University of Hong Kong, Shenzhen, China
Adam’s superiority over SGD in Transformer training is explained by the ‘block heterogeneity’ of the Hessian matrix, highlighting the need for adaptive learning rates.
Why the Metric Backbone Preserves Community Structure
·2073 words·10 mins·
AI Theory
Optimization
🏢 EPFL
Metric backbone graph sparsification surprisingly preserves community structure, offering an efficient and robust method for analyzing large networks.
Why Do We Need Weight Decay in Modern Deep Learning?
·3285 words·16 mins·
AI Theory
Optimization
🏢 EPFL
Weight decay’s role in modern deep learning is surprisingly multifaceted: it shapes optimization dynamics rather than acting purely as a regularizer, improving both generalization and training stability.
Where Do Large Learning Rates Lead Us?
·5231 words·25 mins·
AI Generated
AI Theory
Optimization
🏢 Constructor University
Unlocking optimal neural network training: A narrow range of initially high learning rates, slightly above the convergence threshold, consistently yields superior generalization after fine-tuning.
When to Act and When to Ask: Policy Learning With Deferral Under Hidden Confounding
·1568 words·8 mins·
AI Theory
Causality
🏢 Faculty of Data and Decision Sciences, Technion
CARED: A novel causal action recommendation model improves policy learning by collaborating with human experts and mitigating hidden confounding in observational data.
When is Multicalibration Post-Processing Necessary?
·10662 words·51 mins·
AI Generated
AI Theory
Fairness
🏢 University of Southern California
Multicalibration post-processing isn’t always necessary; models often implicitly achieve it, especially calibrated ones. For uncalibrated models, though, it significantly improves fairness.
When Is Inductive Inference Possible?
·1470 words·7 mins·
loading
·
loading
AI Theory
Optimization
🏢 Princeton University
This paper provides a tight characterization of inductive inference, proving it’s possible if and only if the hypothesis class is a countable union of online learnable classes, resolving a long-standing open problem.
When is an Embedding Model More Promising than Another?
·4115 words·20 mins·
AI Theory
Representation Learning
🏢 Mila - Quebec AI Institute
This paper introduces a novel, task-agnostic method for ranking embedding models using information sufficiency, a concept derived from communication theory and the comparison of statistical experiments.
When are dynamical systems learned from time series data statistically accurate?
·2869 words·14 mins·
AI Theory
Generalization
🏢 University of Chicago
Learned dynamical systems often fail to capture true physical behavior; this work introduces an ergodic-theoretic approach that improves statistical accuracy by incorporating Jacobian information during training.
What type of inference is planning?
·1424 words·7 mins·
AI Theory
Optimization
🏢 Google DeepMind
Planning is redefined as a distinct inference type within a variational framework, enabling efficient approximate planning in complex environments.
What makes unlearning hard and what to do about it
·5453 words·26 mins·
AI Theory
Interpretability
🏢 University of Warwick
Researchers developed RUM, a refined unlearning meta-algorithm that significantly improves existing unlearning methods by strategically refining forget sets and employing appropriate unlearning algorithms.
What Is Missing For Graph Homophily? Disentangling Graph Homophily For Graph Neural Networks
·2555 words·12 mins·
AI Generated
AI Theory
Representation Learning
🏢 Nanyang Technological University
Tri-Hom disentangles graph homophily into label, structural, and feature aspects, providing a more comprehensive and accurate metric for predicting GNN performance.
What does guidance do? A fine-grained analysis in a simple setting
·3498 words·17 mins·
AI Theory
Optimization
🏢 Duke University
Diffusion guidance, a common generative modeling technique, is shown not to sample from its intended distribution; instead, it heavily biases samples towards the boundary of the conditional distribution.
What do Graph Neural Networks learn? Insights from Tropical Geometry
·1465 words·7 mins·
AI Theory
Representation Learning
🏢 University of Edinburgh
Using tropical geometry, researchers reveal that ReLU-activated message-passing GNNs learn continuous piecewise linear functions, highlighting their expressivity limits.
Weisfeiler and Leman Go Loopy: A New Hierarchy for Graph Representational Learning
·3118 words·15 mins·
AI Theory
Representation Learning
🏢 Munich Center for Machine Learning
This paper introduces r-lWL, a new graph isomorphism test hierarchy that surpasses the limitations of the Weisfeiler-Leman test by counting cycles up to length r+2, together with its GNN counterpart, r-lMPNN.