🏒 New York University

Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
·2088 words·10 mins
Large Language Models 🏒 New York University
Unlocking tight generalization bounds for massive LLMs using a novel token-level approach.
TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks
·4869 words·23 mins
AI Generated Machine Learning Deep Learning 🏒 New York University
TuneTables optimizes PFNs for scalability via context optimization, achieving state-of-the-art performance on large tabular datasets while using fewer parameters and reducing inference time.
The Price of Implicit Bias in Adversarially Robust Generalization
·3000 words·15 mins
AI Generated AI Theory Robustness 🏒 New York University
Optimization’s implicit bias in robust machine learning hurts generalization; this work reveals how algorithm and architecture choices impact robustness, suggesting that better optimization strategies are needed.
Taming 'data-hungry' reinforcement learning? Stability in continuous state-action spaces
·358 words·2 mins
Machine Learning Reinforcement Learning 🏒 New York University
Reinforcement learning achieves unprecedented fast convergence rates in continuous state-action spaces by leveraging novel stability properties of Markov Decision Processes.
Stochastic contextual bandits with graph feedback: from independence number to MAS number
·289 words·2 mins
Machine Learning Reinforcement Learning 🏒 New York University
Contextual bandits with graph feedback achieve near-optimal regret by leveraging a novel graph-theoretic quantity that interpolates between the independence number and the maximum acyclic subgraph number.
Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices
·2763 words·13 mins
AI Generated Machine Learning Deep Learning 🏒 New York University
This paper introduces a continuous parameterization of structured matrices for large neural networks, discovering that full-rank structures without parameter sharing achieve optimal scaling.
Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
·2387 words·12 mins
Natural Language Processing Large Language Models 🏒 New York University
Overparameterized neural networks surprisingly recover from catastrophic interference when trained cyclically on repeated data sequences, exhibiting anticipatory knowledge reactivation.
Provable Posterior Sampling with Denoising Oracles via Tilted Transport
·1594 words·8 mins
Machine Learning Deep Learning 🏒 New York University
Boosting posterior sampling in challenging high-dimensional inverse problems, this paper introduces ‘tilted transport’, a novel technique leveraging denoising oracles for provably easier sampling.
Preference Learning Algorithms Do Not Learn Preference Rankings
·2930 words·14 mins
Natural Language Processing Large Language Models 🏒 New York University
Despite common belief, state-of-the-art preference learning algorithms for LLMs achieve surprisingly low ranking accuracy, highlighting significant flaws in current alignment techniques.
Parametric model reduction of mean-field and stochastic systems via higher-order action matching
·2431 words·12 mins
AI Generated Machine Learning Deep Learning 🏒 New York University
HOAM learns reduced models of population dynamics for complex systems, enabling fast predictions across various physics parameters, outperforming state-of-the-art techniques.
Non-convolutional graph neural networks
·2234 words·11 mins
Graph Neural Networks 🏒 New York University
The RUM neural network, a novel non-convolutional GNN, overcomes limitations of conventional convolution-based models by using RNNs to merge topological and semantic features along random walks.
Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits
·495 words·3 mins
AI Generated AI Theory Optimization 🏒 New York University
Sparse navigable graphs enable efficient nearest neighbor search, but their construction and limits in high dimensions remain unclear. This paper presents an efficient method to construct navigable graphs.
Multiview Scene Graph
·2365 words·12 mins
Computer Vision Scene Understanding 🏒 New York University
AI models struggle to understand 3D space like humans do. This paper introduces Multiview Scene Graphs (MSGs), a new topological scene representation using interconnected place and object nodes.
Log-concave Sampling from a Convex Body with a Barrier: a Robust and Unified Dikin Walk
·1308 words·7 mins
AI Theory Optimization 🏒 New York University
This paper introduces robust Dikin walks for log-concave sampling, achieving faster mixing times and lower iteration costs than existing methods, particularly for high-dimensional settings.
Large Language Models Must Be Taught to Know What They Don’t Know
·3020 words·15 mins
Natural Language Processing Large Language Models 🏒 New York University
Teach LLMs uncertainty for reliable high-stakes predictions: fine-tuning with graded examples significantly improves LLMs’ uncertainty calibration and generalizes well.
Explanations that reveal all through the definition of encoding
·1891 words·9 mins
AI Theory Interpretability 🏒 New York University
New method, STRIPE-X, powerfully detects ‘encoding’ in AI explanations, a sneaky phenomenon where explanations predict outcomes better than their constituent parts alone would suggest.
Equivariant spatio-hemispherical networks for diffusion MRI deconvolution
·2838 words·14 mins
AI Generated AI Applications Healthcare 🏒 New York University
Faster, more efficient deep learning for diffusion MRI deconvolution is achieved using spatio-hemispherical networks, improving fiber tractography.
Enhancing Domain Adaptation through Prompt Gradient Alignment
·2283 words·11 mins
Machine Learning Transfer Learning 🏒 New York University
Prompt Gradient Alignment (PGA) enhances unsupervised domain adaptation by aligning per-objective gradients in a multi-objective optimization framework, achieving state-of-the-art results.
DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
·2471 words·12 mins
AI Applications Robotics 🏒 New York University
DynaMo: a novel self-supervised method significantly boosts visuo-motor control by learning in-domain dynamics from limited expert demonstrations, improving policy performance across various environments.
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
·1891 words·9 mins
AI Generated Multimodal Learning Vision-Language Models 🏒 New York University
Symile: A simple model-agnostic approach for learning representations from unlimited modalities, outperforming pairwise CLIP by capturing higher-order information.