Optimization

µP²: Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling
·9260 words·44 mins
AI Generated AI Theory Optimization 🏒 University of Tübingen
µP²: Layerwise perturbation scaling in SAM enables hyperparameter transfer and improved generalization in large models.
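As a reminder of the mechanism SAM builds on, here is a minimal sketch of one SAM update with a per-layer perturbation scale; the scaling factors and per-tensor normalization are illustrative assumptions, not the paper's exact µP² rule.

```python
# Illustrative SAM step with layerwise perturbation scaling (hypothetical scales,
# per-tensor normalization; not the exact µP² prescription).
import torch

def sam_step(model, loss_fn, data, target, base_rho=0.05, layer_scales=None, lr=0.1):
    params = [p for p in model.parameters() if p.requires_grad]
    scales = layer_scales or [1.0] * len(params)  # assumed per-layer factors

    # 1) Gradient at the current weights.
    loss_fn(model(data), target).backward()

    # 2) Ascend to the layerwise-scaled adversarial point w + e.
    eps = []
    with torch.no_grad():
        for p, s in zip(params, scales):
            e = base_rho * s * p.grad / (p.grad.norm() + 1e-12)
            p.add_(e)
            eps.append(e)
    model.zero_grad()

    # 3) Gradient at the perturbed point, then undo the perturbation and descend.
    loss_fn(model(data), target).backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)            # restore the original weights
            p.sub_(lr * p.grad)  # SGD step using the SAM gradient
    model.zero_grad()
```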
Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
·7149 words·34 mins
AI Generated AI Theory Optimization 🏒 University of Maryland
Learning rate warmup improves deep-learning performance by allowing the use of larger learning rates, pushing networks into better-conditioned regions of the loss landscape.
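For context, the schedule being analyzed typically looks like a linear ramp; the horizon and peak value below are placeholder numbers, not the paper's settings.

```python
def warmup_lr(step, warmup_steps=1000, peak_lr=3e-4):
    """Linear warmup: ramp the learning rate from ~0 to peak_lr over warmup_steps."""
    return peak_lr * min(1.0, (step + 1) / warmup_steps)
```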
Why Transformers Need Adam: A Hessian Perspective
·2407 words·12 mins
AI Theory Optimization 🏒 Chinese University of Hong Kong, Shenzhen, China
Adam’s superiority over SGD in Transformer training is explained by the ‘block heterogeneity’ of the Hessian matrix, highlighting the need for adaptive learning rates.
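As background for why adaptivity helps with such heterogeneity, Adam rescales every coordinate's step by a running estimate of gradient magnitude, so parameter blocks with very different curvature still receive comparably sized steps; a minimal single-step sketch (standard Adam, not code from the paper):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: each coordinate's step is normalized by its own
    gradient-magnitude estimate, unlike SGD's uniform step size."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)   # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```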
Why the Metric Backbone Preserves Community Structure
·2073 words·10 mins
AI Theory Optimization 🏒 EPFL
Metric backbone graph sparsification surprisingly preserves community structure, offering an efficient and robust method for analyzing large networks.
Why Do We Need Weight Decay in Modern Deep Learning?
·3285 words·16 mins
AI Theory Optimization 🏒 EPFL
Weight decay’s role in modern deep learning is surprisingly multifaceted: it shapes optimization dynamics rather than acting solely as a regularizer, improving both generalization and training stability.
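The optimization-side effect alluded to here is easiest to see in the plain update rule, where the decay multiplicatively shrinks the weight norm every step and thereby changes the effective step size; a minimal sketch (generic SGD with decoupled decay, not the paper's experiments):

```python
def sgd_weight_decay(w, grad, lr=0.1, wd=5e-4):
    """SGD with weight decay: shrink the weights by (1 - lr*wd), then take a gradient step."""
    return (1.0 - lr * wd) * w - lr * grad
```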
Where Do Large Learning Rates Lead Us?
·5231 words·25 mins
AI Generated AI Theory Optimization 🏒 Constructor University
Unlocking optimal neural network training: A narrow range of initially high learning rates, slightly above the convergence threshold, consistently yields superior generalization after fine-tuning.
When Is Inductive Inference Possible?
·1470 words·7 mins
AI Theory Optimization 🏒 Princeton University
This paper provides a tight characterization of inductive inference, proving it’s possible if and only if the hypothesis class is a countable union of online learnable classes, resolving a long-standi…
What type of inference is planning?
·1424 words·7 mins
AI Theory Optimization 🏒 Google DeepMind
Planning is redefined as a distinct inference type within a variational framework, enabling efficient approximate planning in complex environments.
What does guidance do? A fine-grained analysis in a simple setting
·3498 words·17 mins
AI Theory Optimization 🏒 Duke University
Diffusion guidance, a common generative modeling technique, is shown to not sample from its intended distribution; instead, it heavily biases samples towards the boundary of the conditional distributi…
Warm-starting Push-Relabel
·1936 words·10 mins
AI Theory Optimization 🏒 UC Berkeley
This research introduces the first theoretical guarantees for warm-starting the celebrated Push-Relabel network flow algorithm, improving its speed using a predicted flow, while maintaining worst-case…
Variance estimation in compound decision theory under boundedness
·323 words·2 mins
AI Theory Optimization 🏒 University of Chicago
Unlocking the optimal variance estimation rate in compound decision theory under bounded means, this paper reveals a surprising (log log n / log n)² rate and introduces a rate-optimal cumulant-based est…
Validating Climate Models with Spherical Convolutional Wasserstein Distance
·2133 words·11 mins
AI Theory Optimization 🏒 University of Illinois Urbana-Champaign
Researchers developed Spherical Convolutional Wasserstein Distance (SCWD) to more accurately validate climate models by considering spatial variability and local distributional differences.
User-Creator Feature Polarization in Recommender Systems with Dual Influence
·2172 words·11 mins
AI Theory Optimization 🏒 Harvard University
Recommender systems, when influenced by both users and creators, inevitably polarize; however, prioritizing efficiency through methods like top-k truncation can surprisingly enhance diversity.
Unveiling User Satisfaction and Creator Productivity Trade-Offs in Recommendation Platforms
·1440 words·7 mins
AI Theory Optimization 🏒 University of Virginia
Recommendation algorithms on UGC platforms face a critical trade-off: prioritizing user satisfaction reduces creator engagement, jeopardizing long-term content diversity. This research introduces a ga…
Unrolled denoising networks provably learn to perform optimal Bayesian inference
·2411 words·12 mins
AI Generated AI Theory Optimization 🏒 Harvard University
Unrolled neural networks, trained via gradient descent, provably achieve optimal Bayesian inference for compressed sensing, surpassing prior-aware counterparts.
Unraveling the Gradient Descent Dynamics of Transformers
·1273 words·6 mins
AI Theory Optimization 🏒 University of Minnesota, Twin Cities
This paper reveals how large embedding dimensions and appropriate initialization guarantee convergence in Transformer training, highlighting Gaussian attention’s superior landscape over Softmax.
Universality of AdaGrad Stepsizes for Stochastic Optimization: Inexact Oracle, Acceleration and Variance Reduction
·1717 words·9 mins
AI Theory Optimization 🏒 CISPA
Adaptive gradient methods using AdaGrad stepsizes achieve optimal convergence rates for convex composite optimization problems, handling inexact oracles, acceleration, and variance reduction without n…
Universal Online Convex Optimization with $1$ Projection per Round
·373 words·2 mins
Machine Learning Optimization 🏒 Nanjing University
This paper introduces a novel universal online convex optimization algorithm needing only one projection per round, achieving optimal regret bounds for various function types, including general convex…
Understanding the Gains from Repeated Self-Distillation
·2009 words·10 mins
Machine Learning Optimization 🏒 University of Washington
Repeated self-distillation significantly reduces excess risk in linear regression, achieving up to a factor-of-d improvement over single-step methods.
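A small, hypothetical sketch of what repeated self-distillation means in the linear/ridge setting: each round refits on the previous model's predictions instead of the original labels (the setup and hyperparameters are illustrative, not the paper's construction):

```python
import numpy as np

def ridge(X, y, lam=1.0):
    """Closed-form ridge regression estimator."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def repeated_self_distillation(X, y, rounds=3, lam=1.0):
    """Refit ridge regression on the previous round's predictions (soft labels)."""
    targets, w = y, None
    for _ in range(rounds):
        w = ridge(X, targets, lam)
        targets = X @ w  # soft labels for the next round
    return w
```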
Ultrafast classical phylogenetic method beats large protein language models on variant effect prediction
·2536 words·12 mins
AI Generated AI Theory Optimization 🏒 UC Berkeley
A revolutionary ultrafast phylogenetic method outperforms protein language models in variant effect prediction by efficiently estimating amino acid substitution rates from massive datasets.