🏢 MIT
When does perceptual alignment benefit vision representations?
·4058 words·20 mins·
AI Generated
Computer Vision
Representation Learning
🏢 MIT
Aligning vision models to human perceptual similarity judgments significantly boosts performance in diverse vision tasks like counting and segmentation, but surprisingly reduces performance in natural…
Understanding the Role of Equivariance in Self-supervised Learning
·2016 words·10 mins·
AI Generated
Machine Learning
Self-Supervised Learning
🏢 MIT
E-SSL’s generalization ability is rigorously analyzed via an information-theoretic lens, revealing key design principles for improved performance.
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
·3501 words·17 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 MIT
Large language models (LLMs) struggle with factual inconsistencies (‘hallucinations’) and the ‘reversal curse,’ where information recall depends heavily on the input order. This work reframes the cur…
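For reference, the left-to-right factorization that standard next-token training optimizes; a model fit only under this ordering learns the conditionals $p(x_t \mid x_{<t})$, and nothing forces it to recover reversed conditionals such as $p(x_{<t} \mid x_t)$, which is the sense in which the choice of predicted tokens matters:

```latex
p(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} p\left(x_t \mid x_{<t}\right)
```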
The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof
·2500 words·12 mins·
AI Theory
Optimization
🏢 MIT
Breaking neural network parameter symmetries leads to faster training, better generalization, and improved loss landscape behavior, as demonstrated by novel asymmetric network architectures.
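One illustrative way to break the permutation symmetry of hidden units is to freeze a random subset of each weight matrix at fixed random values, sketched below in PyTorch. This is a hedged stand-in in the spirit of asymmetric architectures, not necessarily the paper's exact construction; the class name `FixedMaskLinear` and the `frac_fixed` parameter are invented here.

```python
import torch
import torch.nn as nn

class FixedMaskLinear(nn.Linear):
    """Linear layer with a random subset of weights frozen at random values.

    Swapping two hidden units no longer leaves the parameterization
    equivalent, so the usual permutation symmetry is broken. Illustrative
    sketch only, not necessarily the paper's asymmetric architecture.
    """
    def __init__(self, d_in, d_out, frac_fixed=0.1):
        super().__init__(d_in, d_out)
        mask = torch.rand(d_out, d_in) < frac_fixed
        self.register_buffer("fixed_mask", mask)
        self.register_buffer("fixed_vals", torch.randn(d_out, d_in) * mask)

    def forward(self, x):
        # Frozen entries come from the buffer; the rest remain trainable.
        w = torch.where(self.fixed_mask, self.fixed_vals, self.weight)
        return nn.functional.linear(x, w, self.bias)
```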
STL: Still Tricky Logic (for System Validation, Even When Showing Your Work)
·1760 words·9 mins·
AI Applications
Robotics
🏢 MIT
Human understanding of formal specifications for robot validation is surprisingly poor; active learning, while improving engagement, doesn’t significantly boost accuracy.
Statistical-Computational Trade-offs for Density Estimation
·433 words·3 mins·
AI Theory
Optimization
🏢 MIT
Density estimation algorithms face inherent trade-offs: reducing sample needs often increases query time. This paper proves these trade-offs are fundamental, showing limits to how much improvement is…
Solving Minimum-Cost Reach Avoid using Reinforcement Learning
·2253 words·11 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 MIT
RC-PPO: Reinforcement learning solves minimum-cost reach-avoid problems with up to 57% lower costs!
Semi-Random Matrix Completion via Flow-Based Adaptive Reweighting
·349 words·2 mins·
AI Theory
Optimization
🏢 MIT
New nearly-linear time algorithm achieves high-accuracy semi-random matrix completion, overcoming previous limitations on accuracy and noise tolerance.
Score Distillation via Reparametrized DDIM
·4128 words·20 mins·
Computer Vision
Image Generation
🏢 MIT
Researchers improved 3D shape generation from 2D diffusion models by showing that existing Score Distillation Sampling is a reparameterized version of DDIM and fixing its high-variance noise issue via…
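For context, a minimal sketch of the standard Score Distillation Sampling gradient that the paper reinterprets as a reparameterized DDIM step. Here `noise_pred` is a stand-in for a pretrained diffusion model's noise predictor, and this is the baseline high-variance form, not the paper's corrected variant:

```python
import torch

def sds_grad(x, noise_pred, alphas_cumprod, t, weight=1.0):
    """Sketch of the standard SDS gradient w.r.t. a rendered image x.

    x: rendered image, shape (B, C, H, W)
    noise_pred: callable (x_t, t) -> predicted noise (assumed interface)
    alphas_cumprod: 1-D tensor of cumulative alphas from the noise schedule
    """
    eps = torch.randn_like(x)                        # freshly resampled noise
    a_t = alphas_cumprod[t]
    x_t = a_t.sqrt() * x + (1.0 - a_t).sqrt() * eps  # forward-diffuse render
    with torch.no_grad():
        eps_hat = noise_pred(x_t, t)
    # SDS uses (eps_hat - eps) as the update direction; resampling eps at
    # every step is the high-variance noise issue the paper targets.
    return weight * (eps_hat - eps)
```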
Scalable Optimization in the Modular Norm
·2001 words·10 mins·
Machine Learning
Deep Learning
🏢 MIT
Deep learning optimization gets a major upgrade with Modula, a new method that uses the modular norm to normalize weight updates, enabling learning rate transfer across network widths and depths, thus…
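A crude sketch of the underlying idea: rescale each layer's update to unit norm so the learning rate controls step size uniformly across layers. The actual modular norm is defined recursively over the module tree, and this per-tensor Frobenius normalization is only a stand-in, not the Modula API:

```python
import torch

def normalized_step(params, grads, lr=0.1):
    """Per-tensor normalized update, a stand-in for modular-norm scaling.

    Each gradient is rescaled to unit Frobenius norm before stepping, so
    the same lr produces comparably sized updates in every layer. Modula
    instead normalizes in a norm composed recursively over the network.
    """
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(lr * g / (g.norm() + 1e-12))
```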
Rethinking the Capacity of Graph Neural Networks for Branching Strategy
·1678 words·8 mins·
AI Generated
AI Theory
Optimization
🏢 MIT
This paper proves that higher-order GNNs can universally approximate strong branching in MILP solvers, whereas simpler GNNs can do so accurately only for a restricted class of problems.
QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation
·3333 words·16 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 MIT
QuanTA: Quantum-inspired Tensor Adaptation efficiently fine-tunes LLMs with high-rank updates, surpassing low-rank methods like LoRA for complex tasks while minimizing additional parameters.
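To see why structured factorizations can escape the low-rank cap, compare a Kronecker-structured update with a LoRA-style one (illustration only; this is a simple stand-in, not QuanTA's quantum-circuit-inspired tensor parameterization):

```python
import numpy as np

# A Kronecker-structured update uses 2*d^2 parameters for a (d^2 x d^2)
# matrix yet is typically full rank, since rank(A kron B) = rank(A)*rank(B);
# a LoRA-style update's rank is capped by its inner dimension r.
rng = np.random.default_rng(0)
d, r = 8, 4

A = rng.standard_normal((d, d))
B = rng.standard_normal((d, d))
print(np.linalg.matrix_rank(np.kron(A, B)))  # 64: full rank from 128 params

U = rng.standard_normal((d * d, r))
V = rng.standard_normal((r, d * d))
print(np.linalg.matrix_rank(U @ V))          # 4: capped at r
```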
Parallelizing Linear Transformers with the Delta Rule over Sequence Length
·1639 words·8 mins·
Natural Language Processing
Large Language Models
🏢 MIT
DeltaNet, a linear transformer boosting associative recall, now trains efficiently via a novel algorithm, scaling to large language models and outperforming existing linear baselines.
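A sequential reference for the delta-rule recurrence DeltaNet computes; the paper's contribution is a hardware-efficient algorithm that parallelizes this over sequence length, which the O(T) loop below deliberately does not attempt:

```python
import torch

def deltanet_sequential(q, k, v, beta):
    """Delta-rule linear attention, sequential form:

        S_t = S_{t-1} + beta_t * (v_t - S_{t-1} k_t) k_t^T,   o_t = S_t q_t

    Shapes: q, k: (T, d_k); v: (T, d_v); beta: (T,).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = torch.zeros(d_v, d_k)
    outs = []
    for t in range(T):
        err = v[t] - S @ k[t]                  # delta-rule prediction error
        S = S + beta[t] * torch.outer(err, k[t])
        outs.append(S @ q[t])
    return torch.stack(outs)
```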
Oracle-Efficient Differentially Private Learning with Public Data
·293 words·2 mins·
AI Theory
Privacy
🏢 MIT
This paper introduces computationally efficient algorithms for differentially private learning by leveraging public data, overcoming previous computational limitations and enabling broader practical a…
Online Control in Population Dynamics
·1672 words·8 mins·
AI Applications
Healthcare
🏢 MIT
This paper introduces a novel, robust online control framework for managing evolving populations, achieving near-optimal control even in complex, noisy systems.
On the Role of Attention Masks and LayerNorm in Transformers
·2522 words·12 mins·
AI Generated
AI Theory
Representation Learning
🏢 MIT
Transformers’ self-attention mechanism, while powerful, suffers from rank collapse with increasing depth. This paper reveals that while masked attention still leads to exponential collapse, sparse att…
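A toy experiment in the spirit of the rank-collapse analyses: iterating pure softmax self-attention (no MLPs, residuals, or LayerNorm, and no mask) drives token representations toward a rank-one matrix. The setup is deliberately simplified:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d, depth = 16, 32, 12
X = rng.standard_normal((n, d))
for layer in range(depth):
    A = softmax(X @ X.T / np.sqrt(d))  # full (unmasked) attention weights
    X = A @ X                          # pure attention, values = tokens
    residual = X - X.mean(axis=0)      # deviation from a rank-one matrix
    print(layer, np.linalg.norm(residual))  # shrinks with depth
```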
On the Optimality of Dilated Entropy and Lower Bounds for Online Learning in Extensive-Form Games
·1661 words·8 mins·
AI Generated
AI Theory
Optimization
🏢 MIT
Researchers discover Dilated Entropy is the optimal distance-generating function for solving extensive-form games using first-order methods, achieving near-optimal regret bounds.
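For reference, the standard dilated entropy distance-generating function over the sequence-form strategy polytope, with per-infoset weights $\beta_j$; the specific weights that achieve the paper's near-optimal bounds are not reproduced here:

```latex
% For each infoset j with parent sequence p_j and action set A_j:
\psi(x) \;=\; \sum_{j \in \mathcal{J}} \beta_j \sum_{a \in A_j}
  x_{ja} \, \log \frac{x_{ja}}{x_{p_j}}
```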
Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff
·592 words·3 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 MIT
LOLIPOP: A novel algorithm achieving near-optimal regret for offline contextual Markov Decision Processes (CMDPs) using only O(H log T) offline density estimation oracle calls.
OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step
·2170 words·11 mins·
Natural Language Processing
Large Language Models
🏢 MIT
OccamLLM: LLMs now perform accurate arithmetic in a single step!
Nuclear Norm Regularization for Deep Learning
·1763 words·9 mins·
Machine Learning
Deep Learning
🏢 MIT
This paper presents a novel, efficient method for Jacobian nuclear norm regularization in deep learning, replacing computationally expensive SVDs with equivalent Frobenius norm computations, thereby e…
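The standard variational identity behind SVD-free nuclear-norm penalties, checked numerically below; how the paper applies it to network Jacobians is not reproduced here:

```python
import numpy as np

# ||J||_* = min over factorizations J = A @ B of 0.5*(||A||_F^2 + ||B||_F^2),
# attained at the balanced factorization A = U sqrt(S), B = sqrt(S) V^T.
rng = np.random.default_rng(0)
J = rng.standard_normal((20, 15))

U, s, Vt = np.linalg.svd(J, full_matrices=False)
nuclear = s.sum()

A = U * np.sqrt(s)            # scale columns of U by sqrt(singular values)
B = np.sqrt(s)[:, None] * Vt  # scale rows of V^T likewise
frob_bound = 0.5 * (np.linalg.norm(A) ** 2 + np.linalg.norm(B) ** 2)

print(nuclear, frob_bound)    # equal up to floating-point error
```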