AI Theory
µP²: Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling
·9260 words·44 mins·
AI Generated
AI Theory
Optimization
🏢 University of Tübingen
µP²: Layerwise perturbation scaling in SAM enables hyperparameter transfer and improved generalization in large models.
Zipper: Addressing Degeneracy in Algorithm-Agnostic Inference
·1827 words·9 mins·
AI Theory
Interpretability
🏢 Nankai University
Zipper: A novel statistical device resolves the degeneracy issue in algorithm-agnostic inference, enabling reliable goodness-of-fit tests with enhanced power.
Zeroth-Order Sampling Methods for Non-Log-Concave Distributions: Alleviating Metastability by Denoising Diffusion
·2790 words·14 mins·
AI Theory
Sampling
🏢 Georgia Institute of Technology
Zeroth-Order Diffusion Monte Carlo (ZOD-MC) efficiently samples from non-log-concave distributions using only zeroth-order queries, overcoming metastability issues and outperforming state-of-the-art samplers.
Wide Two-Layer Networks can Learn from Adversarial Perturbations
·2045 words·10 mins·
AI Theory
Robustness
🏢 University of Tokyo
Wide two-layer neural networks can generalize well from mislabeled adversarial examples because adversarial perturbations surprisingly contain sufficient class-specific features.
Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
·7149 words·34 mins·
AI Generated
AI Theory
Optimization
🏢 University of Maryland
Learning rate warmup improves deep learning performance by letting networks tolerate larger learning rates, which push them into better-conditioned regions of the loss landscape.
Why Transformers Need Adam: A Hessian Perspective
·2407 words·12 mins·
AI Theory
Optimization
🏢 Chinese University of Hong Kong, Shenzhen, China
Adam’s superiority over SGD in Transformer training is explained by the ‘block heterogeneity’ of the Hessian matrix, highlighting the need for adaptive learning rates.
Why the Metric Backbone Preserves Community Structure
·2073 words·10 mins·
AI Theory
Optimization
🏢 EPFL
Metric backbone graph sparsification surprisingly preserves community structure, offering an efficient and robust method for analyzing large networks.
Why Do We Need Weight Decay in Modern Deep Learning?
·3285 words·16 mins·
AI Theory
Optimization
🏢 EPFL
Weight decay’s role in modern deep learning is surprisingly multifaceted: it shapes optimization dynamics rather than acting purely as a regularizer, improving both generalization and training stability.
Where Do Large Learning Rates Lead Us?
·5231 words·25 mins·
AI Generated
AI Theory
Optimization
🏢 Constructor University
Unlocking optimal neural network training: A narrow range of initially high learning rates, slightly above the convergence threshold, consistently yields superior generalization after fine-tuning.
When to Act and When to Ask: Policy Learning With Deferral Under Hidden Confounding
·1568 words·8 mins·
AI Theory
Causality
🏢 Faculty of Data and Decision Sciences, Technion
CARED: A novel causal action recommendation model improves policy learning by collaborating with human experts and mitigating hidden confounding in observational data.
When is Multicalibration Post-Processing Necessary?
·10662 words·51 mins·
AI Generated
AI Theory
Fairness
🏢 University of Southern California
Multicalibration post-processing isn’t always necessary; models often implicitly achieve it, especially calibrated ones. For uncalibrated models, though, it significantly improves fairness.
When Is Inductive Inference Possible?
·1470 words·7 mins·
loading
·
loading
AI Theory
Optimization
🏢 Princeton University
This paper provides a tight characterization of inductive inference, proving it’s possible if and only if the hypothesis class is a countable union of online learnable classes, resolving a long-standing open problem.
When is an Embedding Model More Promising than Another?
·4115 words·20 mins·
AI Theory
Representation Learning
🏢 Mila - Quebec AI Institute
This paper introduces a novel, task-agnostic method for ranking embedding models using information sufficiency, a concept derived from communication theory and the comparison of statistical experiments.
When are dynamical systems learned from time series data statistically accurate?
·2869 words·14 mins·
AI Theory
Generalization
🏢 University of Chicago
Learned dynamical systems often fail to capture true physical behavior; this work introduces an ergodic-theoretic approach that improves statistical accuracy by incorporating Jacobian information during training.
What type of inference is planning?
·1424 words·7 mins·
AI Theory
Optimization
🏢 Google DeepMind
Planning is redefined as a distinct inference type within a variational framework, enabling efficient approximate planning in complex environments.
What makes unlearning hard and what to do about it
·5453 words·26 mins·
AI Theory
Interpretability
🏢 University of Warwick
Researchers developed RUM, a refined unlearning meta-algorithm that significantly improves existing unlearning methods by strategically refining forget sets and employing appropriate unlearning algorithms.
What Is Missing For Graph Homophily? Disentangling Graph Homophily For Graph Neural Networks
·2555 words·12 mins·
AI Generated
AI Theory
Representation Learning
🏢 Nanyang Technological University
Tri-Hom disentangles graph homophily into label, structural, and feature aspects, providing a more comprehensive and accurate metric for predicting GNN performance.
What does guidance do? A fine-grained analysis in a simple setting
·3498 words·17 mins·
AI Theory
Optimization
🏢 Duke University
Diffusion guidance, a common generative modeling technique, is shown not to sample from its intended distribution; instead, it heavily biases samples towards the boundary of the conditional distribution.
What do Graph Neural Networks learn? Insights from Tropical Geometry
·1465 words·7 mins·
AI Theory
Representation Learning
🏢 University of Edinburgh
Using tropical geometry, researchers reveal that ReLU-activated message-passing GNNs learn continuous piecewise linear functions, highlighting their expressivity limits.
Weisfeiler and Leman Go Loopy: A New Hierarchy for Graph Representational Learning
·3118 words·15 mins·
AI Theory
Representation Learning
🏢 Munich Center for Machine Learning
This paper introduces r-lWL, a new graph isomorphism test hierarchy that surpasses the limitations of the Weisfeiler-Leman test by counting cycles up to length r+2, together with its GNN counterpart, r-lMPNN.