🏢 University of Tokyo
Wide Two-Layer Networks can Learn from Adversarial Perturbations
·2045 words·10 mins·
AI Theory
Robustness
🏢 University of Tokyo
Wide two-layer neural networks can generalize well from mislabeled adversarial examples because adversarial perturbations surprisingly contain sufficient class-specific features.
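To make the claim concrete, here is a minimal, hypothetical sketch of how such mislabeled adversarial examples are typically constructed (an FGSM-style targeted perturbation, assuming a PyTorch classifier; the paper's exact attack and architecture may differ):

```python
import torch
import torch.nn.functional as F

def targeted_perturbation(model, x, y_target, eps=0.03):
    """FGSM-style targeted perturbation (illustrative sketch only;
    not necessarily the attack used in the paper)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y_target)
    loss.backward()
    # Step against the gradient so x moves toward the target class;
    # the resulting delta carries features of y_target.
    return -eps * x.grad.sign()

# Training a fresh network on (x + delta, y_target) pairs -- inputs that
# look mislabeled to a human -- can still yield nontrivial clean accuracy,
# which is the phenomenon analyzed here for wide two-layer networks.
```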
Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization
·1822 words·9 mins·
AI Generated
Computer Vision
Vision Transformers
🏢 University of Tokyo
Vision Transformers (ViTs) generalize surprisingly well, even when overfitting training data; this work provides the first theoretical explanation by characterizing the optimization dynamics of ViTs a…
Understanding the Expressivity and Trainability of Fourier Neural Operator: A Mean-Field Perspective
·2537 words·12 mins·
Machine Learning
Deep Learning
🏢 University of Tokyo
A mean-field theory explains Fourier Neural Operator (FNO) behavior, linking expressivity to trainability by identifying ordered and chaotic phases that correspond to vanishing or exploding gradients,…
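For reference, a single FNO layer applies a pointwise linear map plus a spectral convolution, with Fourier transform $\mathcal{F}$, learned spectral weights $R_\phi$, and nonlinearity $\sigma$; the mean-field analysis concerns how signals and gradients propagate through stacks of such layers:

$$
v_{l+1}(x) = \sigma\Big( W v_l(x) + \mathcal{F}^{-1}\big( R_\phi \cdot \mathcal{F} v_l \big)(x) \Big).
$$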
Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective
·3071 words·15 mins·
Natural Language Processing
Large Language Models
🏢 University of Tokyo
Linear probing then fine-tuning (LP-FT) significantly improves language model fine-tuning; this paper uses Neural Tangent Kernel (NTK) theory to explain why.
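For context, the NTK of a network $f(\cdot;\theta)$ governs its training dynamics near initialization:

$$
\Theta(x, x') = \nabla_\theta f(x;\theta)^\top \nabla_\theta f(x';\theta).
$$

LP-FT first fits only the linear head on frozen features, then fine-tunes all parameters from that initialization; roughly, starting from a well-fit head keeps fine-tuning from distorting the pretrained features, and the NTK view makes this precise for language models.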
Transformers are Minimax Optimal Nonparametric In-Context Learners
·1461 words·7 mins·
AI Generated
Machine Learning
Meta Learning
🏢 University of Tokyo
Transformers excel at in-context learning by leveraging minimax-optimal nonparametric learning, achieving near-optimal risk with sufficient pretraining data diversity.
Taming the Long Tail in Human Mobility Prediction
·2047 words·10 mins·
AI Applications
Smart Cities
🏢 University of Tokyo
The LoTNext framework tackles the long-tail problem in human mobility prediction, using graph and loss adjustments to improve accuracy on rarely visited locations.
Risk-sensitive control as inference with Rényi divergence
·1494 words·8 mins·
Machine Learning
Reinforcement Learning
🏢 University of Tokyo
Risk-sensitive control is recast as inference using Rényi divergence, yielding new algorithms and revealing equivalences between seemingly disparate methods.
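The key object is the Rényi divergence of order $\alpha$, which recovers the KL divergence used in standard control-as-inference as $\alpha \to 1$:

$$
D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1} \log \int p(x)^\alpha \, q(x)^{1-\alpha} \, dx.
$$

Roughly, $\alpha$ plays the role of the risk-sensitivity parameter, interpolating between risk-seeking and risk-averse behavior around the risk-neutral KL case.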
On the Minimax Regret for Contextual Linear Bandits and Multi-Armed Bandits with Expert Advice
·360 words·2 mins·
Machine Learning
Reinforcement Learning
🏢 University of Tokyo
This paper provides novel algorithms and matching lower bounds for multi-armed bandits with expert advice and contextual linear bandits, resolving open questions and advancing theoretical understanding.
Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation
·2032 words·10 mins·
AI Applications
Smart Cities
🏢 University of Tokyo
LLM agents effectively generate realistic personal mobility patterns using semantically rich data.
Integrating GNN and Neural ODEs for Estimating Non-Reciprocal Two-Body Interactions in Mixed-Species Collective Motion
·1573 words·8 mins·
Machine Learning
Deep Learning
🏢 University of Tokyo
Deep learning framework integrating GNNs and neural ODEs precisely estimates non-reciprocal two-body interactions in mixed-species collective motion, accurately replicating both individual and collective motion.
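In rough notation (mine, not necessarily the paper's), the modeled dynamics for agent $i$ of species $s_i$ take the form

$$
\dot{x}_i = f_{s_i}(x_i) + \sum_{j \ne i} g_{s_i s_j}(x_i, x_j), \qquad g_{ab} \ne g_{ba} \ \text{in general},
$$

where the pairwise kernels $g_{ab}$ are parameterized by a GNN and fit end-to-end by integrating trajectories with a neural ODE; non-reciprocity means the force species $a$ exerts on $b$ need not mirror the reverse.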
Geometric-Averaged Preference Optimization for Soft Preference Labels
·2987 words·15 mins·
Natural Language Processing
Large Language Models
🏢 University of Tokyo
To improve LLM alignment, this paper introduces soft preference labels and geometric averaging into Direct Preference Optimization, consistently improving performance on standard benchmarks.
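As a hedged sketch of how this plays out (my reading; consult the paper for the exact objective): DPO scores a pair via its implicit reward margin, and replacing the hard winner/loser likelihoods with a $\hat{p}$-weighted geometric mean rescales that margin by $2\hat{p}-1$:

$$
\mathcal{L}(\theta) = -\log \sigma\!\Big( \beta\,(2\hat{p} - 1)\big( r_\theta(x, y_1) - r_\theta(x, y_2) \big) \Big), \qquad r_\theta(x, y) = \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
$$

so near-tied pairs ($\hat{p} \approx 0.5$) contribute almost no gradient instead of a full hard-label push.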
Generalization Bound and Learning Methods for Data-Driven Projections in Linear Programming
·1748 words·9 mins·
AI Generated
AI Theory
Optimization
🏢 University of Tokyo
Learn to project, solve faster! This paper introduces data-driven projections for solving high-dimensional linear programs, proving theoretical guarantees and demonstrating significant improvements in…
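The idea in symbols (the projection matrix $P$ is my notation): learn $P \in \mathbb{R}^{n \times k}$ with $k \ll n$ from past instances, solve the reduced LP, and map back:

$$
\max_{x \in \mathbb{R}^n} c^\top x \ \text{ s.t. } Ax \le b
\quad\longrightarrow\quad
\max_{y \in \mathbb{R}^k} c^\top P y \ \text{ s.t. } APy \le b, \qquad \hat{x} = P y^\star .
$$

Solving in $k$ variables instead of $n$ is where the speedup comes from; the generalization bound controls how much objective value the learned projection sacrifices on future instances.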
Generalizable and Animatable Gaussian Head Avatar
·3445 words·17 mins·
AI Generated
Computer Vision
Image Generation
🏢 University of Tokyo
One-shot animatable head avatar reconstruction is achieved using a novel dual-lifting method that generates 3D Gaussians from a single image, enabling real-time expression control and rendering with s…
Fast Rates in Stochastic Online Convex Optimization by Exploiting the Curvature of Feasible Sets
·1343 words·7 mins·
loading
·
loading
AI Theory
Optimization
🏢 University of Tokyo
This paper introduces a novel approach for fast rates in online convex optimization by exploiting the curvature of feasible sets, achieving logarithmic regret bounds under specific conditions.
Enriching Disentanglement: From Logical Definitions to Quantitative Metrics
·3435 words·17 mins·
AI Theory
Representation Learning
🏢 University of Tokyo
This paper presents a novel approach to deriving theoretically grounded disentanglement metrics by linking logical definitions to quantitative measures, offering strong theoretical guarantees and easi…
Dealing with Synthetic Data Contamination in Online Continual Learning
·2977 words·14 mins·
Computer Vision
Image Generation
🏢 University of Tokyo
AI-generated images contaminate online continual learning datasets, hindering performance. A new method, ESRM, leverages entropy and real/synthetic similarity maximization to select high-quality data.
Continuous Temporal Domain Generalization
·2639 words·13 mins·
AI Generated
Machine Learning
Domain Generalization
🏢 University of Tokyo
Koodos: a novel Koopman operator-driven framework that tackles Continuous Temporal Domain Generalization (CTDG) by modeling continuous data dynamics and learning model evolution across irregular time …
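For context, the Koopman operator $\mathcal{K}$ turns a nonlinear dynamical system $\theta_{t+1} = F(\theta_t)$ into a linear one acting on observables $g$:

$$
(\mathcal{K} g)(\theta) = g(F(\theta)),
$$

which, as I read the summary, is what lets Koodos model how a predictor should drift continuously over time in a learned linear latent space.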
ADOPT: Modified Adam Can Converge with Any $\beta_2$ with the Optimal Rate
·1889 words·9 mins·
Machine Learning
Deep Learning
🏢 University of Tokyo
Unlike Adam, ADOPT, a novel adaptive gradient method, achieves the optimal convergence rate without restrictive assumptions, significantly improving deep learning optimization.
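A minimal sketch of the modified update, assuming the two changes the paper describes (normalize the gradient by the previous second-moment estimate, then apply momentum); hyperparameters are illustrative and the released optimizer may differ in details:

```python
import torch

def adopt_step(param, grad, m, v, step, lr=1e-3,
               beta1=0.9, beta2=0.9999, eps=1e-6):
    """One ADOPT-style step (hedged sketch, not the official code)."""
    if step == 0:
        v.copy_(grad * grad)                  # v_0 = g_0^2; no update yet
        return
    # Normalize by the *previous* v, decorrelating g_t from its own scale.
    normed = grad / torch.clamp(v.sqrt(), min=eps)
    # Momentum is applied *after* normalization, unlike Adam.
    m.mul_(beta1).add_(normed, alpha=1 - beta1)
    param.add_(m, alpha=-lr)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
```

The point of the reordering is that the second-moment estimate no longer depends on the current gradient, which is what removes the $\beta_2$-dependent non-convergence issue of Adam.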
A Simple and Adaptive Learning Rate for FTRL in Online Learning with Minimax Regret of Θ(T^{2/3}) and its Application to Best-of-Both-Worlds
·334 words·2 mins·
AI Theory
Optimization
🏢 University of Tokyo
A new adaptive learning rate for FTRL achieves the minimax regret of Θ(T^{2/3}) in online learning, improving existing best-of-both-worlds algorithms for various hard problems.
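For context, FTRL plays the point minimizing the cumulative (linearized) loss plus a regularizer $\psi$ scaled by a learning rate $\eta_t$; the contribution here is a simple adaptive choice of $\eta_t$ (the exact schedule is in the paper):

$$
x_{t+1} = \operatorname*{arg\,min}_{x \in \mathcal{X}} \left( \sum_{s=1}^{t} \langle g_s, x \rangle + \frac{\psi(x)}{\eta_t} \right).
$$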
A provable control of sensitivity of neural networks through a direct parameterization of the overall bi-Lipschitzness
·4589 words·22 mins·
AI Theory
Optimization
🏢 University of Tokyo
New framework directly controls neural network sensitivity by precisely parameterizing overall bi-Lipschitzness, offering improved robustness and generalization.
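Bi-Lipschitzness bounds a network's sensitivity from both sides: the upper constant caps how much outputs can move (robustness), while the lower constant prevents distinct inputs from collapsing together (invertibility). The framework parameterizes both constants directly:

$$
\ell\,\|x - y\| \;\le\; \|f(x) - f(y)\| \;\le\; L\,\|x - y\| \qquad \text{for all } x, y .
$$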