🏢 Princeton University
When Is Inductive Inference Possible?
·1470 words·7 mins·
AI Theory
Optimization
🏢 Princeton University
This paper provides a tight characterization of inductive inference, proving it’s possible if and only if the hypothesis class is a countable union of online learnable classes, resolving a long-standing open question.
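As a rough formalization of the headline result (my notation, not the paper's):

```latex
% Sketch of the characterization, in notation of my own choosing:
% inductive inference over a hypothesis class H is possible exactly
% when H splits into countably many online learnable pieces.
\[
  \mathcal{H} \text{ is inductively inferable}
  \;\iff\;
  \mathcal{H} = \bigcup_{i \in \mathbb{N}} \mathcal{H}_i
  \quad \text{with each } \mathcal{H}_i \text{ online learnable.}
\]
```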
Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem
·2181 words·11 mins·
Multimodal Learning
Vision-Language Models
🏢 Princeton University
Vision-language models struggle with multi-object reasoning due to the binding problem; this paper reveals human-like capacity limits in VLMs and proposes solutions.
Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis
·426 words·2 mins·
Natural Language Processing
Large Language Models
🏢 Princeton University
Researchers reveal how transformers learn word co-occurrence using a novel gradient flow analysis, uncovering a two-phase training process that leads to near-minimum loss and improved model performance.
Tight Rates for Bandit Control Beyond Quadratics
·406 words·2 mins·
AI Generated
AI Theory
Optimization
🏢 Princeton University
This paper presents an algorithm achieving Õ(√T) optimal regret for bandit non-stochastic control with strongly convex and smooth cost functions, overcoming prior limitations of suboptimal bounds.
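For reference, the quantity being bounded is the usual control regret; a sketch in my notation (the paper's benchmark policy class and cost model are more specific):

```latex
% Regret against a benchmark policy class Pi under bandit feedback
% (my notation): cumulative cost of the learner's trajectory minus
% that of the best fixed policy in hindsight.
\[
  \mathrm{Regret}_T
  = \sum_{t=1}^{T} c_t(x_t, u_t)
  - \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\bigl(x_t^{\pi}, u_t^{\pi}\bigr)
  = \tilde{O}\bigl(\sqrt{T}\bigr).
\]
```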
The Road Less Scheduled
·2275 words·11 mins·
Optimization
🏢 Princeton University
Schedule-Free optimization achieves state-of-the-art results without learning rate schedules, simplifying training and improving efficiency.
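The core update is simple enough to sketch; here is a minimal NumPy version of Schedule-Free SGD with illustrative hyperparameters (the paper also covers an AdamW variant and details this omits):

```python
import numpy as np

def schedule_free_sgd(grad, z0, lr=0.1, beta=0.9, steps=1000):
    """Minimal sketch of Schedule-Free SGD (Defazio et al.).

    z: base SGD iterate; x: running average, returned at the end;
    y: interpolation point where gradients are evaluated. No learning
    rate schedule is needed; the averaged iterate x plays that role.
    (Simplified from the paper.)"""
    z, x = z0.copy(), z0.copy()
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x       # gradient evaluation point
        z = z - lr * grad(y)                # plain SGD step on z
        x = (1 - 1 / t) * x + (1 / t) * z   # online (Polyak-style) average
    return x

# Usage: minimize f(w) = ||w||^2 / 2, whose gradient is w.
w = schedule_free_sgd(lambda w: w, z0=np.ones(3))
```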
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
·10845 words·51 mins·
AI Applications
Security
🏢 Princeton University
SWE-agent achieves state-of-the-art performance on software engineering benchmarks by creating a custom agent-computer interface that enhances LM agents’ ability to use computers.
SureMap: Simultaneous mean estimation for single-task and multi-task disaggregated evaluation
·2443 words·12 mins·
AI Theory
Fairness
🏢 Princeton University
SureMap, a new method, significantly boosts accuracy in single-task and multi-task disaggregated evaluations of AI models with limited data by transforming the problem into Gaussian mean estimation.
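The Gaussian mean estimation at SureMap's core can be illustrated with classical James-Stein shrinkage; this is a stand-in sketch only, since SureMap's actual estimator uses a richer structured prior:

```python
import numpy as np

def james_stein(y, sigma2):
    # Classical shrinkage for a Gaussian mean: pull the raw per-group
    # estimates toward zero, with the amount of shrinkage chosen so the
    # risk beats the naive estimate whenever dim >= 3. A stand-in for
    # the structured estimators SureMap builds on, not the paper's method.
    d = y.size
    shrink = max(0.0, 1.0 - (d - 2) * sigma2 / np.dot(y, y))
    return shrink * y

# Usage: noisy per-subgroup accuracy estimates around true effects.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 0.5, size=20)       # true subgroup effects
y = theta + rng.normal(0.0, 1.0, size=20)   # noisy observations
print(james_stein(y, sigma2=1.0))
```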
SimPO: Simple Preference Optimization with a Reference-Free Reward
·3091 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Princeton University
SimPO: a simpler, reference-free reward algorithm that significantly outperforms existing offline preference optimization methods, achieving higher accuracy and efficiency in aligning LLMs with human preferences.
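The loss itself is compact; a minimal PyTorch sketch, with `beta` and `gamma` values that are illustrative rather than the paper's tuned settings:

```python
import torch.nn.functional as F

def simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
               beta=2.0, gamma=0.5):
    """SimPO objective (sketch): the implicit reward is the length-
    normalized sequence log-probability, with no reference model.
    logp_* are tensors of summed token log-probs per response; gamma
    is the target reward margin."""
    r_chosen = beta * logp_chosen / len_chosen        # avg log-prob reward
    r_rejected = beta * logp_rejected / len_rejected
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()
```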
Probabilistic Federated Prompt-Tuning with Non-IID and Imbalanced Data
·2066 words·10 mins·
Machine Learning
Federated Learning
🏢 Princeton University
Probabilistic Federated Prompt Tuning (PFPT) significantly improves federated learning accuracy on heterogeneous and imbalanced data by using a probabilistic model for prompt aggregation, outperforming existing baselines.
Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift
·1968 words·10 mins·
AI Generated
Machine Learning
Transfer Learning
🏢 Princeton University
This paper introduces a novel method for creating highly accurate and narrow prediction intervals even when data distribution shifts unexpectedly, significantly improving machine learning model reliability.
One-Layer Transformer Provably Learns One-Nearest Neighbor In Context
·1344 words·7 mins·
AI Theory
Optimization
🏢 Princeton University
One-layer transformers provably learn the one-nearest neighbor prediction rule, offering theoretical insights into their in-context learning capabilities.
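The intuition is easy to reproduce: softmax attention with a large inverse temperature concentrates on the closest key. A NumPy sketch of this mechanism (illustrative, not the paper's exact one-layer construction):

```python
import numpy as np

def attention_1nn(queries, keys, values, beta=50.0):
    # Softmax attention over negative squared distances: as beta grows,
    # the weights concentrate on each query's nearest key, so the output
    # approaches the one-nearest-neighbor prediction rule.
    d2 = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(-1)  # (q, n)
    scores = -beta * d2
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)             # softmax weights
    return w @ values                             # ~ nearest neighbor's value
```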
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
·455 words·3 mins·
AI Generated
AI Theory
Generalization
🏢 Princeton University
SGD can train neural networks to learn low-dimensional polynomials near the information-theoretic limit, surpassing previous correlational statistical query lower bounds.
Low-Rank Optimal Transport through Factor Relaxation with Latent Coupling
·2606 words·13 mins·
Machine Learning
Optimization
🏢 Princeton University
FRLC: a novel algorithm for low-rank optimal transport using latent coupling, enabling faster computation and better interpretability for diverse applications.
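Roughly, the transport plan is kept in factored form rather than materialized; a hedged sketch of the parameterization's shape (my notation; see the paper for the exact marginal constraints on each factor):

```latex
% Shape of the low-rank plan with a latent coupling Lambda between
% source-side and target-side latent clusters (my notation):
\[
  P = Q \,\Lambda\, R^{\top},
  \qquad Q \in \mathbb{R}_{+}^{n \times r},\;
  \Lambda \in \mathbb{R}_{+}^{r \times r},\;
  R \in \mathbb{R}_{+}^{m \times r},
\]
% so P is never materialized: applying it to a vector costs
% O((n+m)r + r^2) rather than O(nm), and the latent coupling is
% interpretable as cluster-to-cluster mass transfer.
```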
Learning Human-like Representations to Enable Learning Human Values
·2442 words·12 mins·
AI Theory
Representation Learning
🏢 Princeton University
Aligning AI’s world representation with humans enables faster, safer learning of human values, improving both exploration and generalization.
Learning and Transferring Sparse Contextual Bigrams with Linear Transformers
·1445 words·7 mins·
Natural Language Processing
Text Generation
🏢 Princeton University
Linear transformers efficiently learn sparse contextual bigrams by leveraging both in-context and global information, achieving polynomial sample complexity.
Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference
·2061 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Princeton University
Kraken: A new Transformer architecture boosts multi-device inference speed by 35.6% by cleverly overlapping communication with computation.
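The overlap idea can be sketched with an asynchronous collective; a toy sketch only (not Kraken's exact architecture), assuming an already initialized `torch.distributed` process group:

```python
import torch.distributed as dist

def parallel_block(x, attn, mlp):
    """Sketch of the overlap idea: with parallel sub-layers, the
    all-reduce for the attention output can run while the MLP computes
    on local data. `attn` and `mlp` are ordinary modules."""
    y = attn(x)
    work = dist.all_reduce(y, async_op=True)  # start collective, don't block
    z = mlp(x)                                # local compute overlaps comms
    work.wait()                               # ensure y is fully reduced
    return x + y + z                          # residual combine (illustrative)
```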
Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity
·2643 words·13 mins·
AI Applications
Robotics
🏢 Princeton University
Robots using LLMs for task planning often make unsafe or wrong decisions due to LLM hallucination and ambiguity in instructions. This paper introduces ‘introspective planning,’ a novel method that aligns the robot’s uncertainty with the task’s inherent ambiguity.
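A sketch of a calibration step in the same spirit, using split conformal prediction over candidate plans (the paper's exact scores and guarantees differ):

```python
import numpy as np

def conformal_action_set(cal_probs_true, option_probs, alpha=0.1):
    """Split conformal prediction over candidate plans (sketch).
    cal_probs_true: model confidence assigned to the *correct* option
    on held-out calibration tasks; option_probs: confidences for the
    current task's candidate actions."""
    scores = 1.0 - np.asarray(cal_probs_true)          # nonconformity scores
    n = scores.size
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    qhat = np.quantile(scores, level)                  # calibrated threshold
    # Keep every option plausible at level 1 - alpha; a singleton set
    # means "act", a larger set means "ask for clarification".
    return [i for i, p in enumerate(option_probs) if 1.0 - p <= qhat]
```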
Inference via Interpolation: Contrastive Representations Provably Enable Planning and Inference
·1693 words·8 mins·
AI Theory
Representation Learning
🏢 Princeton University
Contrastive learning enables efficient probabilistic inference in high-dimensional time series by creating Gaussian representations that form a Gauss-Markov chain, allowing closed-form solutions to planning and inference queries.
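The gist, in my notation (the paper derives the exact coefficients):

```latex
% If learned representations psi form a Gauss--Markov chain, the
% waypoint given both endpoints is Gaussian in closed form, with a
% mean that linearly interpolates the endpoint representations:
\[
  p\bigl(\psi(s_t) \mid \psi(s_0), \psi(s_T)\bigr)
  = \mathcal{N}\bigl(A_t\,\psi(s_0) + B_t\,\psi(s_T),\; \Sigma_t\bigr).
\]
```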
GREATS: Online Selection of High-Quality Data for LLM Training in Every Iteration
·1719 words·9 mins·
Large Language Models
🏢 Princeton University
GREATS: a novel online batch selection method significantly speeds up LLM training by greedily selecting high-quality data batches in every iteration, improving both convergence and generalization performance.
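A toy version of the selection rule based on first-order influence scoring (GREATS's contribution includes computing such scores efficiently without materializing per-sample gradients, which this sketch ignores):

```python
import numpy as np

def greedy_batch(candidate_grads, val_grad, k):
    # Rank each candidate example by the first-order estimate of how
    # much a step on it reduces validation loss: the inner product of
    # its gradient with the validation gradient. Keep the top-k.
    scores = candidate_grads @ val_grad      # <g_i, g_val> per example
    return np.argsort(scores)[-k:][::-1]     # indices of the best k
```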
Gradient Guidance for Diffusion Models: An Optimization Perspective
·2233 words·11 mins·
AI Theory
Optimization
🏢 Princeton University
This paper provides a novel optimization framework for guided diffusion models, proving Õ(1/K) convergence for concave objective functions and demonstrating structure-preserving guidance.
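A hedged sketch of one guided update, where `score_model` and `objective` are hypothetical callables standing in for the learned score network and the external objective (not the paper's exact sampler):

```python
import torch

def guided_step(x_t, t, score_model, objective, eta=0.1):
    """One guided update (sketch): add the objective's gradient, taken
    through the model's denoised estimate x0_hat, to the learned score
    before the reverse step. score_model is assumed to return both the
    score and the denoised estimate."""
    x_t = x_t.detach().requires_grad_(True)
    score, x0_hat = score_model(x_t, t)      # learned score + denoised guess
    g = torch.autograd.grad(objective(x0_hat).sum(), x_t)[0]
    guided_score = score + eta * g           # steer toward higher objective
    return x_t + guided_score                # placeholder reverse update
```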