
Generalization

When are dynamical systems learned from time series data statistically accurate?
·2869 words·14 mins
AI Theory Generalization 🏢 University of Chicago
Learned dynamical systems often fail to capture true physical behavior; this work introduces an ergodic theoretic approach to improve statistical accuracy by incorporating Jacobian information during …
Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling
·1911 words·9 mins
AI Generated AI Theory Generalization 🏢 Peking University
This work systematically investigates the approximation properties of Transformer networks for sequence modeling, revealing the distinct roles of key components (self-attention, positional encoding, f…
Transformation-Invariant Learning and Theoretical Guarantees for OOD Generalization
·541 words·3 mins
AI Theory Generalization 🏢 Yale University
This paper introduces a novel theoretical framework for robust machine learning under distribution shifts, offering learning rules and guarantees, highlighting the game-theoretic viewpoint of distribu…
Transcendence: Generative Models Can Outperform The Experts That Train Them
·2384 words·12 mins
AI Theory Generalization 🏢 OpenAI
Generative models can outperform their human trainers: A groundbreaking study shows how autoregressive transformers, trained on chess game data, can achieve higher game ratings than any of the human …
Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms
·8286 words·39 mins
AI Generated AI Theory Generalization 🏢 University of Edinburgh
New topology-based complexity measures reliably predict deep learning model generalization, outperforming existing methods and offering practical computational efficiency.
Theoretical guarantees in KL for Diffusion Flow Matching
·242 words·2 mins
AI Generated AI Theory Generalization 🏢 École Polytechnique
Novel theoretical guarantees for Diffusion Flow Matching (DFM) models are established, bounding the KL divergence under mild assumptions on data and base distributions.
Theoretical Foundations of Deep Selective State-Space Models
·379 words·2 mins
AI Theory Generalization 🏢 Imperial College London
Deep learning’s sequence modeling is revolutionized by selective state-space models (SSMs)! This paper provides theoretical grounding for their superior performance, revealing the crucial role of gati…
Theoretical Analysis of Weak-to-Strong Generalization
·1703 words·8 mins
AI Theory Generalization 🏢 MIT CSAIL
Strong student models can learn from weaker teachers, even correcting errors and generalizing beyond the teacher’s expertise. This paper provides new theoretical bounds explaining this ‘weak-to-strong…
The Power of Hard Attention Transformers on Data Sequences: A Formal Language Theoretic Perspective
·284 words·2 mins
AI Generated AI Theory Generalization 🏢 RPTU Kaiserslautern-Landau
Hard-attention transformers are surprisingly more powerful on numerical data sequences than on strings; this gain is analyzed theoretically via circuit comp…
Testably Learning Polynomial Threshold Functions
·248 words·2 mins
AI Generated AI Theory Generalization 🏢 ETH Zurich
Polynomial threshold functions can be testably learned efficiently, matching the best guarantees of agnostic learning and settling a key problem in robust machine learning.
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
·2167 words·11 mins
AI Theory Generalization 🏢 University of California, San Diego
Deep ReLU networks trained with large, constant learning rates avoid overfitting in univariate regression due to minima stability, generalizing well even with noisy labels.
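The "minima stability" mechanism rests on a textbook linear-stability fact, sketched below in LaTeX; the inequality is standard gradient-descent background, not the paper's theorem, which refines it for univariate ReLU networks.

```latex
% Linear stability of gradient descent at a minimum \theta^* (textbook
% background; the paper's univariate ReLU results build on this idea).
% The update \theta_{t+1} = \theta_t - \eta \nabla L(\theta_t) linearizes
% near \theta^* to \theta_{t+1} - \theta^* \approx
% (I - \eta \nabla^2 L(\theta^*)) (\theta_t - \theta^*),
% which is stable only if
\lambda_{\max}\!\big(\nabla^2 L(\theta^*)\big) \;\le\; \frac{2}{\eta},
% so a large constant step size \eta can only settle at flat minima.
```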
Stability and Generalization of Asynchronous SGD: Sharper Bounds Beyond Lipschitz and Smoothness
·1414 words·7 mins
AI Theory Generalization 🏢 National University of Defense Technology
Sharper generalization bounds for asynchronous SGD are achieved by leveraging on-average model stability, even without Lipschitz and smoothness assumptions, and are validated across diverse machine learning models.
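For reference, one standard formalization of on-average model stability from the stability literature is sketched below; the paper's exact variant may differ in details.

```latex
% On-average model stability (a standard notion, cf. Lei & Ying, 2020;
% the paper's exact definition may differ). Let S = (z_1, \dots, z_n)
% and let S^{(i)} replace z_i by an independent copy. A (randomized)
% algorithm A is \epsilon-on-average model stable if
\frac{1}{n} \sum_{i=1}^{n}
  \mathbb{E}\Big[ \big\lVert A(S) - A(S^{(i)}) \big\rVert_2 \Big]
  \;\le\; \epsilon .
```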
Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning
·1518 words·8 mins
AI Theory Generalization 🏢 Courant Institute
Sketchy Moment Matching (SkMM) is a fast and theoretically sound data selection method for deep learning finetuning. By controlling variance-bias tradeoffs in high dimensions, SkMM drastically reduces…
Provable Tempered Overfitting of Minimal Nets and Typical Nets
·1386 words·7 mins
AI Theory Generalization 🏢 Technion
Deep learning’s generalization ability defies conventional wisdom; this paper proves that overfitting in deep neural networks is ‘tempered’, neither catastrophic nor perfectly benign, for both minimal…
Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure
·15685 words·74 mins
AI Generated AI Theory Generalization 🏢 Google Research
Position coupling, a novel method, enhances the length generalization ability of arithmetic Transformers by directly embedding task structures into positional encodings. This simple technique enables…
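To make the idea concrete, here is a minimal Python sketch in the spirit of position coupling, not the paper's exact encoding; the helper `couple_positions` and the `'a+b='` tokenization are assumptions for illustration. Digits of equal significance get the same position ID.

```python
def couple_positions(query: str) -> list[int]:
    """Assign coupled position IDs to an 'a+b=' addition query.

    Digits are numbered by significance from the right within each
    operand; separators ('+', '=') get a reserved ID 0. For '653+49='
    this yields [3, 2, 1, 0, 2, 1, 0], so the tens digits '5' and '4'
    (and the ones digits '3' and '9') share an ID.
    """
    assert query.endswith("=")
    a, b = query[:-1].split("+")
    ids: list[int] = []
    for operand in (a, b):
        ids += list(range(len(operand), 0, -1))  # significance, right-aligned
        ids.append(0)                            # '+' after a, '=' after b
    return ids

print(couple_positions("653+49="))  # [3, 2, 1, 0, 2, 1, 0]
```

Because the IDs depend on digit significance rather than absolute position, the same scheme applies unchanged to operand lengths never seen in training, which is the intuition behind the length-generalization gains.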
Partial Transportability for Domain Generalization
·2485 words·12 mins
AI Theory Generalization 🏢 Columbia University
This paper introduces a novel technique to bound prediction risks in new domains using causal diagrams, enabling reliable AI performance guarantees.
Partial observation can induce mechanistic mismatches in data-constrained models of neural dynamics
·1877 words·9 mins
AI Theory Generalization 🏢 Harvard University
Partially observing neural circuits during experiments can produce misleading models even when single-neuron activity matches; researchers need better validation methods.
PAC-Bayes-Chernoff bounds for unbounded losses
·358 words·2 mins
AI Theory Generalization 🏢 Basque Center for Applied Mathematics (BCAM)
A new PAC-Bayes oracle bound extends Cramér-Chernoff to unbounded losses, enabling exact parameter optimization and richer assumptions for tighter generalization bounds.
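For orientation, a representative bound of this family is the Alquier-style PAC-Bayes inequality below (standard background, not necessarily this paper's theorem); the Cramér-Chernoff view enters by controlling the log-moment term \(\Psi\) through the loss's cumulant generating function.

```latex
% Alquier-style PAC-Bayes bound (background; not the paper's theorem).
% Fix a prior \pi, \lambda > 0, and \delta \in (0,1). With probability
% at least 1 - \delta over an i.i.d. sample S of size n, simultaneously
% for all posteriors \rho:
\mathbb{E}_{h \sim \rho}\big[L(h)\big]
  \;\le\; \mathbb{E}_{h \sim \rho}\big[\hat{L}_S(h)\big]
  + \frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\tfrac{1}{\delta}
          + \Psi(\lambda, n)}{\lambda},
\qquad
\Psi(\lambda, n) := \ln \mathbb{E}_{h \sim \pi}\, \mathbb{E}_{S}\,
  e^{\lambda \big( L(h) - \hat{L}_S(h) \big)} .
```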
Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality
·1931 words·10 mins
AI Generated AI Theory Generalization 🏢 University of Chicago
Ridgeless regression, surprisingly, generalizes well even with noisy data if the dimension scales sub-polynomially with the sample size.
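A minimal NumPy sketch of what "ridgeless" means operationally, assuming the standard minimum-norm kernel interpolant; the bandwidth and dimension/sample-size scalings studied in the paper are not modeled here.

```python
# Gaussian-kernel "ridgeless" regression (ridge term -> 0): the
# interpolant f(x) = k(x, X) @ K^{-1} y fits the training data exactly.
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    # Pairwise squared distances, then the Gaussian (RBF) kernel matrix.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth**2))

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
y = X[:, 0] + 0.1 * rng.normal(size=n)           # noisy target

K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K, y)                    # ridgeless: no lambda * I

X_test = rng.normal(size=(200, d))
pred = gaussian_kernel(X_test, X) @ alpha        # out-of-sample predictions
print("train residual:", np.abs(K @ alpha - y).max())  # ~0: interpolation
```

With the ridge term set to zero the fit interpolates the noisy labels exactly; the regime in which such interpolation nonetheless generalizes is what the paper characterizes.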
On the Saturation Effects of Spectral Algorithms in Large Dimensions
·1464 words·7 mins
AI Theory Generalization 🏢 Tsinghua University
High-dimensional spectral algorithms show saturation effects: Kernel Ridge Regression underperforms optimal algorithms like gradient flow when regression functions are very smooth.