🏢 MIT
When does perceptual alignment benefit vision representations?
·4058 words·20 mins·
AI Generated
Computer Vision
Representation Learning
🏢 MIT
Aligning vision models to human perceptual similarity judgments significantly boosts performance in diverse vision tasks like counting and segmentation, but surprisingly reduces performance in natural…
Understanding the Role of Equivariance in Self-supervised Learning
·2016 words·10 mins·
AI Generated
Machine Learning
Self-Supervised Learning
🏢 MIT
E-SSL’s generalization ability is rigorously analyzed via an information-theoretic lens, revealing key design principles for improved performance.
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
·3501 words·17 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 MIT
Large language models (LLMs) struggle with factual inconsistencies (‘hallucinations’) and the ‘reversal curse,’ where information recall depends heavily on the input order. This work reframes the cur…
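For reference, the left-to-right factorization that standard next-token training optimizes; a model fit only under this ordering learns the conditionals $p(x_t \mid x_{<t})$, and nothing forces it to recover reversed conditionals such as $p(x_{<t} \mid x_t)$, which is the sense in which the choice of predicted tokens matters:

```latex
p(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} p\left(x_t \mid x_{<t}\right)
```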
The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof
·2500 words·12 mins·
AI Theory
Optimization
🏢 MIT
Breaking neural network parameter symmetries leads to faster training, better generalization, and improved loss landscape behavior, as demonstrated by novel asymmetric network architectures.
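One illustrative way to break the permutation symmetry of hidden units is to freeze a random subset of each weight matrix at fixed random values, sketched below in PyTorch. This is a hedged stand-in in the spirit of asymmetric architectures, not necessarily the paper's exact construction; the class name `FixedMaskLinear` and the `frac_fixed` parameter are invented here.

```python
import torch
import torch.nn as nn

class FixedMaskLinear(nn.Linear):
    """Linear layer with a random subset of weights frozen at random values.

    Swapping two hidden units no longer leaves the parameterization
    equivalent, so the usual permutation symmetry is broken. Illustrative
    sketch only, not necessarily the paper's asymmetric architecture.
    """
    def __init__(self, d_in, d_out, frac_fixed=0.1):
        super().__init__(d_in, d_out)
        mask = torch.rand(d_out, d_in) < frac_fixed
        self.register_buffer("fixed_mask", mask)
        self.register_buffer("fixed_vals", torch.randn(d_out, d_in) * mask)

    def forward(self, x):
        # Frozen entries come from the buffer; the rest remain trainable.
        w = torch.where(self.fixed_mask, self.fixed_vals, self.weight)
        return nn.functional.linear(x, w, self.bias)
```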
STL: Still Tricky Logic (for System Validation, Even When Showing Your Work)
·1760 words·9 mins·
AI Applications
Robotics
🏢 MIT
Human understanding of formal specifications for robot validation is surprisingly poor; active learning, while improving engagement, doesn’t significantly boost accuracy.
Statistical-Computational Trade-offs for Density Estimation
·433 words·3 mins·
AI Theory
Optimization
🏢 MIT
Density estimation algorithms face inherent trade-offs: reducing sample needs often increases query time. This paper proves these trade-offs are fundamental, showing limits to how much improvement is…
Solving Minimum-Cost Reach Avoid using Reinforcement Learning
·2253 words·11 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 MIT
RC-PPO: Reinforcement learning solves minimum-cost reach-avoid problems with up to 57% lower costs!
Semi-Random Matrix Completion via Flow-Based Adaptive Reweighting
·349 words·2 mins·
AI Theory
Optimization
🏢 MIT
New nearly-linear time algorithm achieves high-accuracy semi-random matrix completion, overcoming previous limitations on accuracy and noise tolerance.
Score Distillation via Reparametrized DDIM
·4128 words·20 mins·
Computer Vision
Image Generation
🏢 MIT
Researchers improved 3D shape generation from 2D diffusion models by showing that existing Score Distillation Sampling is a reparameterized version of DDIM and fixing its high-variance noise issue via…
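For context, a minimal sketch of the standard Score Distillation Sampling gradient that the paper reinterprets as a reparameterized DDIM step. Here `noise_pred` is a stand-in for a pretrained diffusion model's noise predictor, and this is the baseline high-variance form, not the paper's corrected variant:

```python
import torch

def sds_grad(x, noise_pred, alphas_cumprod, t, weight=1.0):
    """Sketch of the standard SDS gradient w.r.t. a rendered image x.

    x: rendered image, shape (B, C, H, W)
    noise_pred: callable (x_t, t) -> predicted noise (assumed interface)
    alphas_cumprod: 1-D tensor of cumulative alphas from the noise schedule
    """
    eps = torch.randn_like(x)                        # freshly resampled noise
    a_t = alphas_cumprod[t]
    x_t = a_t.sqrt() * x + (1.0 - a_t).sqrt() * eps  # forward-diffuse render
    with torch.no_grad():
        eps_hat = noise_pred(x_t, t)
    # SDS uses (eps_hat - eps) as the update direction; resampling eps at
    # every step is the high-variance noise issue the paper targets.
    return weight * (eps_hat - eps)
```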
Scalable Optimization in the Modular Norm
·2001 words·10 mins·
Machine Learning
Deep Learning
🏢 MIT
Deep learning optimization gets a major upgrade with Modula, a new method that uses the modular norm to normalize weight updates, enabling learning rate transfer across network widths and depths, thus…
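A crude sketch of the underlying idea: rescale each layer's update to unit norm so the learning rate controls step size uniformly across layers. The actual modular norm is defined recursively over the module tree, and this per-tensor Frobenius normalization is only a stand-in, not the Modula API:

```python
import torch

def normalized_step(params, grads, lr=0.1):
    """Per-tensor normalized update, a stand-in for modular-norm scaling.

    Each gradient is rescaled to unit Frobenius norm before stepping, so
    the same lr produces comparably sized updates in every layer. Modula
    instead normalizes in a norm composed recursively over the network.
    """
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(lr * g / (g.norm() + 1e-12))
```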
Rethinking the Capacity of Graph Neural Networks for Branching Strategy
·1678 words·8 mins·
AI Generated
AI Theory
Optimization
🏢 MIT
This paper proves that higher-order GNNs can universally approximate strong branching in MILP solvers, whereas simpler GNNs can do so accurately only for a restricted class of problems.
QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation
·3333 words·16 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 MIT
QuanTA: Quantum-inspired Tensor Adaptation efficiently fine-tunes LLMs with high-rank updates, surpassing low-rank methods like LoRA for complex tasks while minimizing additional parameters.
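To see why structured factorizations can escape the low-rank cap, compare a Kronecker-structured update with a LoRA-style one (illustration only; this is a simple stand-in, not QuanTA's quantum-circuit-inspired tensor parameterization):

```python
import numpy as np

# A Kronecker-structured update uses 2*d^2 parameters for a (d^2 x d^2)
# matrix yet is typically full rank, since rank(A kron B) = rank(A)*rank(B);
# a LoRA-style update's rank is capped by its inner dimension r.
rng = np.random.default_rng(0)
d, r = 8, 4

A = rng.standard_normal((d, d))
B = rng.standard_normal((d, d))
print(np.linalg.matrix_rank(np.kron(A, B)))  # 64: full rank from 128 params

U = rng.standard_normal((d * d, r))
V = rng.standard_normal((r, d * d))
print(np.linalg.matrix_rank(U @ V))          # 4: capped at r
```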
Parallelizing Linear Transformers with the Delta Rule over Sequence Length
·1639 words·8 mins·
Natural Language Processing
Large Language Models
🏢 MIT
DeltaNet, a linear transformer boosting associative recall, now trains efficiently via a novel algorithm, scaling to large language models and outperforming existing linear baselines.
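A sequential reference for the delta-rule recurrence DeltaNet computes; the paper's contribution is a hardware-efficient algorithm that parallelizes this over sequence length, which the O(T) loop below deliberately does not attempt:

```python
import torch

def deltanet_sequential(q, k, v, beta):
    """Delta-rule linear attention, sequential form:

        S_t = S_{t-1} + beta_t * (v_t - S_{t-1} k_t) k_t^T,   o_t = S_t q_t

    Shapes: q, k: (T, d_k); v: (T, d_v); beta: (T,).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = torch.zeros(d_v, d_k)
    outs = []
    for t in range(T):
        err = v[t] - S @ k[t]                  # delta-rule prediction error
        S = S + beta[t] * torch.outer(err, k[t])
        outs.append(S @ q[t])
    return torch.stack(outs)
```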
Oracle-Efficient Differentially Private Learning with Public Data
·293 words·2 mins·
AI Theory
Privacy
🏢 MIT
This paper introduces computationally efficient algorithms for differentially private learning by leveraging public data, overcoming previous computational limitations and enabling broader practical a…
Online Control in Population Dynamics
·1672 words·8 mins·
AI Applications
Healthcare
🏢 MIT
This paper introduces a novel, robust online control framework for managing evolving populations, achieving near-optimal control even in complex, noisy systems.
On the Role of Attention Masks and LayerNorm in Transformers
·2522 words·12 mins·
AI Generated
AI Theory
Representation Learning
🏢 MIT
Transformers’ self-attention mechanism, while powerful, suffers from rank collapse with increasing depth. This paper reveals that while masked attention still leads to exponential collapse, sparse att…
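A toy experiment in the spirit of the rank-collapse analyses: iterating pure softmax self-attention (no MLPs, residuals, or LayerNorm, and no mask) drives token representations toward a rank-one matrix. The setup is deliberately simplified:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d, depth = 16, 32, 12
X = rng.standard_normal((n, d))
for layer in range(depth):
    A = softmax(X @ X.T / np.sqrt(d))  # full (unmasked) attention weights
    X = A @ X                          # pure attention, values = tokens
    residual = X - X.mean(axis=0)      # deviation from a rank-one matrix
    print(layer, np.linalg.norm(residual))  # shrinks with depth
```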
On the Optimality of Dilated Entropy and Lower Bounds for Online Learning in Extensive-Form Games
·1661 words·8 mins·
AI Generated
AI Theory
Optimization
🏢 MIT
Researchers discover Dilated Entropy is the optimal distance-generating function for solving extensive-form games using first-order methods, achieving near-optimal regret bounds.
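For reference, the standard dilated entropy distance-generating function over the sequence-form strategy polytope, with per-infoset weights $\beta_j$; the specific weights that achieve the paper's near-optimal bounds are not reproduced here:

```latex
% For each infoset j with parent sequence p_j and action set A_j:
\psi(x) \;=\; \sum_{j \in \mathcal{J}} \beta_j \sum_{a \in A_j}
  x_{ja} \, \log \frac{x_{ja}}{x_{p_j}}
```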
Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff
·592 words·3 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 MIT
LOLIPOP: A novel algorithm achieving near-optimal regret for offline contextual Markov Decision Processes (CMDPs) using only O(H log T) offline density estimation oracle calls.
OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step
·2170 words·11 mins·
Natural Language Processing
Large Language Models
🏢 MIT
OccamLLM: LLMs now perform accurate arithmetic in a single step!
Nuclear Norm Regularization for Deep Learning
·1763 words·9 mins·
Machine Learning
Deep Learning
🏢 MIT
This paper presents a novel, efficient method for Jacobian nuclear norm regularization in deep learning, replacing computationally expensive SVDs with equivalent Frobenius norm computations, thereby e…
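The standard variational identity behind SVD-free nuclear-norm penalties, checked numerically below; how the paper applies it to network Jacobians is not reproduced here:

```python
import numpy as np

# ||J||_* = min over factorizations J = A @ B of 0.5*(||A||_F^2 + ||B||_F^2),
# attained at the balanced factorization A = U sqrt(S), B = sqrt(S) V^T.
rng = np.random.default_rng(0)
J = rng.standard_normal((20, 15))

U, s, Vt = np.linalg.svd(J, full_matrices=False)
nuclear = s.sum()

A = U * np.sqrt(s)            # scale columns of U by sqrt(singular values)
B = np.sqrt(s)[:, None] * Vt  # scale rows of V^T likewise
frob_bound = 0.5 * (np.linalg.norm(A) ** 2 + np.linalg.norm(B) ** 2)

print(nuclear, frob_bound)    # equal up to floating-point error
```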