🏒 New York University

Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
·2088 words·10 mins
Large Language Models 🏒 New York University
Unlocking tight generalization bounds for massive LLMs using a novel token-level approach.
TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks
·4869 words·23 mins
AI Generated Machine Learning Deep Learning 🏒 New York University
TuneTables optimizes PFNs for scalability via context optimization, achieving state-of-the-art performance on large tabular datasets while using fewer parameters and reducing inference time.
The Price of Implicit Bias in Adversarially Robust Generalization
·3000 words·15 mins
AI Generated AI Theory Robustness 🏒 New York University
Optimization’s implicit bias in robust machine learning hurts generalization; this work reveals how algorithm and architecture choices impact robustness, suggesting that better optimization strategies are needed.
Taming 'data-hungry' reinforcement learning? Stability in continuous state-action spaces
·358 words·2 mins
Machine Learning Reinforcement Learning 🏒 New York University
Reinforcement learning achieves unprecedented fast convergence rates in continuous state-action spaces by leveraging novel stability properties of Markov Decision Processes.
Stochastic contextual bandits with graph feedback: from independence number to MAS number
·289 words·2 mins
Machine Learning Reinforcement Learning 🏒 New York University
Contextual bandits with graph feedback achieve near-optimal regret by leveraging a novel graph-theoretic quantity that interpolates between the independence number and the maximum acyclic subgraph number.
Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices
·2763 words·13 mins
AI Generated Machine Learning Deep Learning 🏒 New York University
This paper introduces a continuous parameterization of structured matrices for large neural networks, discovering that full-rank structures without parameter sharing achieve optimal scaling.
Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
·2387 words·12 mins
Natural Language Processing Large Language Models 🏒 New York University
Overparameterized neural networks surprisingly recover from catastrophic interference when trained cyclically on repeated data sequences, exhibiting anticipatory knowledge reactivation.
Provable Posterior Sampling with Denoising Oracles via Tilted Transport
·1594 words·8 mins
Machine Learning Deep Learning 🏒 New York University
Boosting posterior sampling in challenging high-dimensional inverse problems, this paper introduces ‘tilted transport’, a novel technique leveraging denoising oracles for provably easier sampling.
Preference Learning Algorithms Do Not Learn Preference Rankings
·2930 words·14 mins
Natural Language Processing Large Language Models 🏒 New York University
Despite common belief, state-of-the-art preference learning algorithms for LLMs achieve surprisingly low ranking accuracy, highlighting significant flaws in current alignment techniques.
Parametric model reduction of mean-field and stochastic systems via higher-order action matching
·2431 words·12 mins
AI Generated Machine Learning Deep Learning 🏒 New York University
HOAM learns reduced models of population dynamics for complex systems, enabling fast predictions across various physics parameters, outperforming state-of-the-art techniques.
Non-convolutional graph neural networks
·2234 words·11 mins
Graph Neural Networks 🏒 New York University
The RUM neural network, a novel non-convolutional GNN, overcomes limitations of conventional convolution-based models by using RNNs to merge topological and semantic features along random walks.
Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits
·495 words·3 mins
AI Generated AI Theory Optimization 🏒 New York University
Sparse navigable graphs enable efficient nearest neighbor search, but their construction and limits in high dimensions remain unclear. This paper presents an efficient method to construct navigable graphs.
Multiview Scene Graph
·2365 words·12 mins
Computer Vision Scene Understanding 🏒 New York University
AI models struggle to understand 3D space like humans do. This paper introduces Multiview Scene Graphs (MSGs), a new topological scene representation using interconnected place and object nodes.
Log-concave Sampling from a Convex Body with a Barrier: a Robust and Unified Dikin Walk
·1308 words·7 mins
AI Theory Optimization 🏒 New York University
This paper introduces robust Dikin walks for log-concave sampling, achieving faster mixing times and lower iteration costs than existing methods, particularly for high-dimensional settings.
Large Language Models Must Be Taught to Know What They Don’t Know
·3020 words·15 mins
Natural Language Processing Large Language Models 🏒 New York University
Teach LLMs uncertainty for reliable high-stakes predictions: fine-tuning with graded examples significantly improves LLMs’ uncertainty calibration and generalizes well.
Explanations that reveal all through the definition of encoding
·1891 words·9 mins
AI Theory Interpretability 🏒 New York University
New method, STRIPE-X, powerfully detects ‘encoding’ in AI explanations, a sneaky phenomenon where explanations predict outcomes better than their constituent parts alone would suggest.
Equivariant spatio-hemispherical networks for diffusion MRI deconvolution
·2838 words·14 mins
AI Generated AI Applications Healthcare 🏒 New York University
Faster, more efficient deep learning for diffusion MRI deconvolution is achieved using spatio-hemispherical networks, improving fiber tractography.
Enhancing Domain Adaptation through Prompt Gradient Alignment
·2283 words·11 mins
Machine Learning Transfer Learning 🏒 New York University
Prompt Gradient Alignment (PGA) enhances unsupervised domain adaptation by aligning per-objective gradients in a multi-objective optimization framework, achieving state-of-the-art results.
DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
·2471 words·12 mins
AI Applications Robotics 🏒 New York University
DynaMo: a novel self-supervised method significantly boosts visuo-motor control by learning in-domain dynamics from limited expert demonstrations, improving policy performance across various environments.
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
·1891 words·9 mins
AI Generated Multimodal Learning Vision-Language Models 🏒 New York University
Symile: A simple model-agnostic approach for learning representations from unlimited modalities, outperforming pairwise CLIP by capturing higher-order information.