AI Theory
Topological obstruction to the training of shallow ReLU neural networks
·1553 words·8 mins·
AI Theory
Optimization
🏢 Politecnico di Torino
Shallow ReLU neural networks face topological training obstructions due to gradient flow confinement on disconnected quadric hypersurfaces.
Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms
·8286 words·39 mins·
AI Generated
AI Theory
Generalization
🏢 University of Edinburgh
New topology-based complexity measures reliably predict deep learning model generalization, outperforming existing methods and offering practical computational efficiency.
Tighter Convergence Bounds for Shuffled SGD via Primal-Dual Perspective
·1717 words·9 mins·
AI Generated
AI Theory
Optimization
🏢 University of Wisconsin-Madison
Shuffled SGD’s convergence is now better understood through a primal-dual analysis, yielding tighter bounds that align with its superior empirical performance.
Tight Rates for Bandit Control Beyond Quadratics
·406 words·2 mins·
AI Generated
AI Theory
Optimization
🏢 Princeton University
This paper presents an algorithm achieving the optimal Õ(√T) regret for bandit non-stochastic control with strongly convex and smooth cost functions, overcoming the suboptimal bounds of prior work.
Tight Bounds for Learning RUMs from Small Slates
·255 words·2 mins·
AI Generated
AI Theory
Optimization
🏢 Google Research
Learning user preferences accurately from limited data is key; this paper shows that surprisingly small slates suffice for precise prediction and provides efficient algorithms to achieve this.
Theoretical guarantees in KL for Diffusion Flow Matching
·242 words·2 mins·
AI Generated
AI Theory
Generalization
🏢 École Polytechnique
Novel theoretical guarantees for Diffusion Flow Matching (DFM) models are established, bounding the KL divergence under mild assumptions on data and base distributions.
Theoretical Foundations of Deep Selective State-Space Models
·379 words·2 mins·
AI Theory
Generalization
🏢 Imperial College London
Deep learning’s sequence modeling is revolutionized by selective state-space models (SSMs)! This paper provides theoretical grounding for their superior performance, revealing the crucial role of gating mechanisms.
Theoretical Characterisation of the Gauss Newton Conditioning in Neural Networks
·2952 words·14 mins·
AI Theory
Optimization
🏢 University of Basel
New theoretical bounds reveal how neural network architecture impacts the Gauss-Newton matrix’s conditioning, paving the way for improved optimization.
Theoretical and Empirical Insights into the Origins of Degree Bias in Graph Neural Networks
·2828 words·14 mins·
AI Theory
Fairness
🏢 University of California, Los Angeles
Researchers unveil the origins of degree bias in Graph Neural Networks (GNNs), proving that high-degree nodes have a lower misclassification probability and proposing methods to alleviate this bias for fairer GNNs.
Theoretical Analysis of Weak-to-Strong Generalization
·1703 words·8 mins·
AI Theory
Generalization
🏢 MIT CSAIL
Strong student models can learn from weaker teachers, even correcting errors and generalizing beyond the teacher’s expertise. This paper provides new theoretical bounds explaining this ‘weak-to-strong’ generalization.
The Surprising Effectiveness of SP Voting with Partial Preferences
·3640 words·18 mins·
AI Theory
Optimization
🏢 Penn State University
Partial preferences and noisy votes hinder accurate ranking recovery; this paper introduces scalable SP voting variants and empirically demonstrates their superior performance in recovering ground-truth rankings.
The Space Complexity of Approximating Logistic Loss
·359 words·2 mins·
AI Theory
Optimization
🏢 LinkedIn Corporation
This paper proves fundamental space complexity lower bounds for approximating logistic loss, revealing that existing coreset constructions are surprisingly optimal.
The Secretary Problem with Predicted Additive Gap
·1651 words·8 mins·
AI Theory
Optimization
🏢 Institute of Computer Science, University of Bonn
Beat the 1/e barrier in the secretary problem using only an additive gap prediction!
The Sample Complexity of Gradient Descent in Stochastic Convex Optimization
·336 words·2 mins·
AI Theory
Optimization
🏢 Tel Aviv University
Gradient descent’s generalization error in non-smooth stochastic convex optimization is Θ̃(d/m + 1/√m), matching worst-case ERMs and showing no advantage over naive methods.
The Reliability of OKRidge Method in Solving Sparse Ridge Regression Problems
·2340 words·11 mins·
AI Theory
Optimization
🏢 Wuhan University
OKRidge’s reliability for solving sparse ridge regression problems is rigorously proven through theoretical error analysis, enhancing its applicability in machine learning.
The Price of Implicit Bias in Adversarially Robust Generalization
·3000 words·15 mins·
AI Generated
AI Theory
Robustness
🏢 New York University
Optimization’s implicit bias in robust machine learning hurts generalization; this work reveals how algorithm and architecture choices impact robustness, suggesting that better optimization strategies are needed.
The Power of Hard Attention Transformers on Data Sequences: A formal language theoretic perspective
·284 words·2 mins·
AI Generated
AI Theory
Generalization
🏢 RPTU Kaiserslautern-Landau
Hard attention transformers are surprisingly more powerful on numerical data sequences than on strings; this gain is analyzed theoretically via circuit complexity.
The motion planning neural circuit in goal-directed navigation as Lie group operator search
·1385 words·7 mins·
AI Theory
Representation Learning
🏢 UT Southwestern Medical Center
Neural circuits for goal-directed navigation are modeled as Lie group operator searches, implemented by a two-layer feedforward circuit mimicking Drosophila’s navigation system.
The Minimax Rate of HSIC Estimation for Translation-Invariant Kernels
·215 words·2 mins·
AI Theory
Optimization
🏢 Karlsruhe Institute of Technology
Researchers found the minimax optimal rate of HSIC estimation for translation-invariant kernels is O(n⁻¹/²), settling a two-decade-old open question and validating many existing HSIC estimators.
The Limits of Differential Privacy in Online Learning
·440 words·3 mins·
AI Theory
Privacy
🏢 Hong Kong University of Science and Technology
This paper reveals fundamental limits of differential privacy in online learning, demonstrating a clear separation between pure, approximate, and non-private settings.