
Generalization

When are dynamical systems learned from time series data statistically accurate?
·2869 words·14 mins
AI Theory Generalization 🏢 University of Chicago
Learned dynamical systems often fail to capture true physical behavior; this work introduces an ergodic theoretic approach to improve statistical accuracy by incorporating Jacobian information during …
Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling
·1911 words·9 mins
AI Generated AI Theory Generalization 🏢 Peking University
This work systematically investigates the approximation properties of Transformer networks for sequence modeling, revealing the distinct roles of key components (self-attention, positional encoding, f…
Transformation-Invariant Learning and Theoretical Guarantees for OOD Generalization
·541 words·3 mins
AI Theory Generalization 🏢 Yale University
This paper introduces a novel theoretical framework for robust machine learning under distribution shifts, offering learning rules and guarantees, highlighting the game-theoretic viewpoint of distribu…
Transcendence: Generative Models Can Outperform The Experts That Train Them
·2384 words·12 mins
AI Theory Generalization 🏢 OpenAI
Generative models can outperform their human trainers: A groundbreaking study shows how autoregressive transformers, trained on chess game data, can achieve higher game ratings than any of the human …
Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms
·8286 words·39 mins
AI Generated AI Theory Generalization 🏢 University of Edinburgh
New topology-based complexity measures reliably predict deep learning model generalization, outperforming existing methods and offering practical computational efficiency.
Theoretical guarantees in KL for Diffusion Flow Matching
·242 words·2 mins
AI Generated AI Theory Generalization 🏢 École Polytechnique
Novel theoretical guarantees for Diffusion Flow Matching (DFM) models are established, bounding the KL divergence under mild assumptions on data and base distributions.
Theoretical Foundations of Deep Selective State-Space Models
·379 words·2 mins
AI Theory Generalization 🏢 Imperial College London
Deep learning’s sequence modeling is revolutionized by selective state-space models (SSMs)! This paper provides theoretical grounding for their superior performance, revealing the crucial role of gati…
Theoretical Analysis of Weak-to-Strong Generalization
·1703 words·8 mins
AI Theory Generalization 🏢 MIT CSAIL
Strong student models can learn from weaker teachers, even correcting errors and generalizing beyond the teacher’s expertise. This paper provides new theoretical bounds explaining this ‘weak-to-strong…
The Power of Hard Attention Transformers on Data Sequences: A Formal Language Theoretic Perspective
·284 words·2 mins
AI Generated AI Theory Generalization 🏢 RPTU Kaiserslautern-Landau
Hard-attention transformers are surprisingly more powerful on numerical data sequences than on strings; this gain is analyzed theoretically via circuit comp…
Testably Learning Polynomial Threshold Functions
·248 words·2 mins
AI Generated AI Theory Generalization 🏢 ETH Zurich
Polynomial threshold functions can be testably learned efficiently, matching the best guarantees of agnostic learning and settling a key problem in robust machine learning.
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
·2167 words·11 mins
AI Theory Generalization 🏢 University of California, San Diego
Deep ReLU networks trained with large, constant learning rates avoid overfitting in univariate regression due to minima stability, generalizing well even with noisy labels.
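The "minima stability" mechanism rests on a textbook linear-stability fact, sketched below in LaTeX; the inequality is standard gradient-descent background, not the paper's theorem, which refines it for univariate ReLU networks.

```latex
% Linear stability of gradient descent at a minimum \theta^* (textbook
% background; the paper's univariate ReLU results build on this idea).
% The update \theta_{t+1} = \theta_t - \eta \nabla L(\theta_t) linearizes
% near \theta^* to \theta_{t+1} - \theta^* \approx
% (I - \eta \nabla^2 L(\theta^*)) (\theta_t - \theta^*),
% which is stable only if
\lambda_{\max}\!\big(\nabla^2 L(\theta^*)\big) \;\le\; \frac{2}{\eta},
% so a large constant step size \eta can only settle at flat minima.
```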
Stability and Generalization of Asynchronous SGD: Sharper Bounds Beyond Lipschitz and Smoothness
·1414 words·7 mins
AI Theory Generalization 🏢 National University of Defense Technology
Sharper generalization bounds for asynchronous SGD are achieved by leveraging on-average model stability, even without Lipschitz and smoothness assumptions, and are validated across diverse machine learning models.
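For reference, one standard formalization of on-average model stability from the stability literature is sketched below; the paper's exact variant may differ in details.

```latex
% On-average model stability (a standard notion, cf. Lei & Ying, 2020;
% the paper's exact definition may differ). Let S = (z_1, \dots, z_n)
% and let S^{(i)} replace z_i by an independent copy. A (randomized)
% algorithm A is \epsilon-on-average model stable if
\frac{1}{n} \sum_{i=1}^{n}
  \mathbb{E}\Big[ \big\lVert A(S) - A(S^{(i)}) \big\rVert_2 \Big]
  \;\le\; \epsilon .
```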
Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning
·1518 words·8 mins
AI Theory Generalization 🏢 Courant Institute
Sketchy Moment Matching (SkMM) is a fast and theoretically sound data selection method for deep learning finetuning. By controlling variance-bias tradeoffs in high dimensions, SkMM drastically reduces…
Provable Tempered Overfitting of Minimal Nets and Typical Nets
·1386 words·7 mins
AI Theory Generalization 🏢 Technion
Deep learning’s generalization ability defies conventional wisdom; this paper proves that overfitting in deep neural networks is ‘tempered’, neither catastrophic nor perfectly benign, for both minimal…
Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure
·15685 words·74 mins
AI Generated AI Theory Generalization 🏢 Google Research
Position coupling, a novel method, enhances the length generalization ability of arithmetic Transformers by directly embedding task structures into positional encodings. This simple technique enables…
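To make the idea concrete, here is a minimal Python sketch in the spirit of position coupling, not the paper's exact encoding; the helper `couple_positions` and the `'a+b='` tokenization are assumptions for illustration. Digits of equal significance get the same position ID.

```python
def couple_positions(query: str) -> list[int]:
    """Assign coupled position IDs to an 'a+b=' addition query.

    Digits are numbered by significance from the right within each
    operand; separators ('+', '=') get a reserved ID 0. For '653+49='
    this yields [3, 2, 1, 0, 2, 1, 0], so the tens digits '5' and '4'
    (and the ones digits '3' and '9') share an ID.
    """
    assert query.endswith("=")
    a, b = query[:-1].split("+")
    ids: list[int] = []
    for operand in (a, b):
        ids += list(range(len(operand), 0, -1))  # significance, right-aligned
        ids.append(0)                            # '+' after a, '=' after b
    return ids

print(couple_positions("653+49="))  # [3, 2, 1, 0, 2, 1, 0]
```

Because the IDs depend on digit significance rather than absolute position, the same scheme applies unchanged to operand lengths never seen in training, which is the intuition behind the length-generalization gains.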
Partial Transportability for Domain Generalization
·2485 words·12 mins
AI Theory Generalization 🏢 Columbia University
This paper introduces a novel technique to bound prediction risks in new domains using causal diagrams, enabling reliable AI performance guarantees.
Partial observation can induce mechanistic mismatches in data-constrained models of neural dynamics
·1877 words·9 mins
AI Theory Generalization 🏢 Harvard University
Partially observing neural circuits during experiments can produce misleading models even when single-neuron activity matches; researchers need better validation methods.
PAC-Bayes-Chernoff bounds for unbounded losses
·358 words·2 mins
AI Theory Generalization 🏢 Basque Center for Applied Mathematics (BCAM)
A new PAC-Bayes oracle bound extends Cramér-Chernoff to unbounded losses, enabling exact parameter optimization and richer assumptions for tighter generalization bounds.
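For orientation, a representative bound of this family is the Alquier-style PAC-Bayes inequality below (standard background, not necessarily this paper's theorem); the Cramér-Chernoff view enters by controlling the log-moment term \(\Psi\) through the loss's cumulant generating function.

```latex
% Alquier-style PAC-Bayes bound (background; not the paper's theorem).
% Fix a prior \pi, \lambda > 0, and \delta \in (0,1). With probability
% at least 1 - \delta over an i.i.d. sample S of size n, simultaneously
% for all posteriors \rho:
\mathbb{E}_{h \sim \rho}\big[L(h)\big]
  \;\le\; \mathbb{E}_{h \sim \rho}\big[\hat{L}_S(h)\big]
  + \frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\tfrac{1}{\delta}
          + \Psi(\lambda, n)}{\lambda},
\qquad
\Psi(\lambda, n) := \ln \mathbb{E}_{h \sim \pi}\, \mathbb{E}_{S}\,
  e^{\lambda \big( L(h) - \hat{L}_S(h) \big)} .
```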
Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality
·1931 words·10 mins
AI Generated AI Theory Generalization 🏢 University of Chicago
Ridgeless regression, surprisingly, generalizes well even with noisy data if the dimension scales sub-polynomially with the sample size.
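A minimal NumPy sketch of what "ridgeless" means operationally, assuming the standard minimum-norm kernel interpolant; the bandwidth and dimension/sample-size scalings studied in the paper are not modeled here.

```python
# Gaussian-kernel "ridgeless" regression (ridge term -> 0): the
# interpolant f(x) = k(x, X) @ K^{-1} y fits the training data exactly.
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    # Pairwise squared distances, then the Gaussian (RBF) kernel matrix.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth**2))

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
y = X[:, 0] + 0.1 * rng.normal(size=n)           # noisy target

K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K, y)                    # ridgeless: no lambda * I

X_test = rng.normal(size=(200, d))
pred = gaussian_kernel(X_test, X) @ alpha        # out-of-sample predictions
print("train residual:", np.abs(K @ alpha - y).max())  # ~0: interpolation
```

With the ridge term set to zero the fit interpolates the noisy labels exactly; the regime in which such interpolation nonetheless generalizes is what the paper characterizes.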
On the Saturation Effects of Spectral Algorithms in Large Dimensions
·1464 words·7 mins
AI Theory Generalization 🏢 Tsinghua University
High-dimensional spectral algorithms show saturation effects: Kernel Ridge Regression underperforms optimal algorithms like gradient flow when regression functions are very smooth.