Global Convergence in Training Large-Scale Transformers
·398 words·2 mins·
AI Generated
AI Theory
Optimization
🏢 Princeton University
Global convergence of large-scale Transformer training is proven using weight decay regularization and a refined mean-field analysis, bridging theory and practice.
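The regularizer in question is standard decoupled weight decay. As a minimal sketch (the toy linear model and data below are illustrative assumptions, not the paper's setup), an AdamW-style update applies the decay directly to the weights rather than folding it into the gradient:

```python
import torch

# Minimal sketch: decoupled weight decay (AdamW-style), the regularizer the
# convergence analysis leans on. The model and data here are hypothetical.
model = torch.nn.Linear(16, 16)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)

x = torch.randn(32, 16)
loss = model(x).pow(2).mean()
loss.backward()
opt.step()       # Adam step plus w <- w - lr * weight_decay * w, decoupled
opt.zero_grad()
```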
FlexSBDD: Structure-Based Drug Design with Flexible Protein Modeling
·2072 words·10 mins·
Machine Learning
Deep Learning
🏢 Princeton University
FlexSBDD, a novel deep generative model, accurately predicts flexible protein-ligand complex structures, generating high-affinity drug molecules while overcoming the limitations of rigid protein models.
Finding Transformer Circuits With Edge Pruning
·2284 words·11 mins·
Interpretability
🏢 Princeton University
Edge Pruning efficiently discovers sparse yet accurate computational subgraphs (circuits) in large language models by casting circuit discovery as gradient-based pruning of edges between components, advancing mechanistic interpretability research.
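As a rough illustration (not the paper's implementation), gradient-based edge pruning can be sketched as learning a soft mask over the edges of a frozen computation graph under a sparsity penalty; the tiny two-layer "model" and all names below are assumptions for the sketch:

```python
import torch

# Toy circuit-discovery sketch: learn a near-binary mask over the edges of a
# frozen two-layer computation so that only edges needed to reproduce the
# full model's output survive an L1-style sparsity penalty.
torch.manual_seed(0)
W1, W2 = torch.randn(8, 8), torch.randn(8, 8)        # frozen "model" weights
x = torch.randn(32, 8)                                # probe inputs
with torch.no_grad():
    target = torch.relu(x @ W1) @ W2                  # full-model behaviour

edge_logits = torch.zeros(8, 8, requires_grad=True)   # one logit per W2 edge
opt = torch.optim.Adam([edge_logits], lr=0.1)
for step in range(200):
    mask = torch.sigmoid(edge_logits)                 # soft edge mask in (0, 1)
    pred = torch.relu(x @ W1) @ (W2 * mask)           # run with masked edges
    loss = ((pred - target) ** 2).mean() + 0.05 * mask.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("edges kept:", (torch.sigmoid(edge_logits) > 0.5).float().mean().item())
```

Thresholding the learned mask yields a discrete circuit; the sparsity weight trades off faithfulness to the full model against circuit size.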
Disentangling the Roles of Distinct Cell Classes with Cell-Type Dynamical Systems
·2009 words·10 mins·
🏢 Princeton University
The new Cell-Type Dynamical Systems (CTDS) model disentangles neural population dynamics by incorporating distinct cell types, improving prediction accuracy and biological interpretability.
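As a hedged sketch of the core idea (cell-type structure constraining latent dynamics), the toy simulation below imposes Dale's-law-style sign constraints on a linear dynamical system; the block sizes and sign convention are illustrative assumptions, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy cell-type dynamical system: latents split into excitatory (E) and
# inhibitory (I) blocks, with Dale's-law-style sign constraints on the
# dynamics matrix (E columns non-negative, I columns non-positive).
n_e, n_i, T = 3, 2, 200
A = np.abs(rng.normal(0, 0.2, (n_e + n_i, n_e + n_i)))
A[:, n_e:] *= -1.0                                   # inhibitory columns
A = 0.9 * A / np.abs(np.linalg.eigvals(A)).max()     # keep dynamics stable

z = np.zeros((T, n_e + n_i))
z[0] = rng.normal(size=n_e + n_i)
for t in range(1, T):
    z[t] = A @ z[t - 1] + 0.1 * rng.normal(size=n_e + n_i)
```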
Can Models Learn Skill Composition from Examples?
·3161 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Princeton University
Smaller language models can learn skill composition from limited examples, substantially improving their ability to combine skills in novel ways through fine-tuning.
Achieving Optimal Clustering in Gaussian Mixture Models with Anisotropic Covariance Structures
·1417 words·7 mins·
Clustering
🏢 Princeton University
This research develops rate-optimal clustering algorithms for Gaussian Mixture Models with anisotropic covariance structures, bridging the gap between theoretical guarantees and practical efficiency.
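To make the setting concrete, here is a minimal Lloyd-style iteration that replaces Euclidean assignments with Mahalanobis distance under a pooled covariance estimate. This is an illustrative sketch of clustering under anisotropic covariance, not the paper's rate-optimal algorithm; all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def anisotropic_lloyd(X, k, n_iter=20):
    """Lloyd-style clustering with a shared anisotropic covariance:
    assignments use Mahalanobis distance under the current pooled
    covariance estimate instead of plain Euclidean distance."""
    n, d = X.shape
    centers = X[rng.choice(n, k, replace=False)]
    cov_inv = np.eye(d)
    for _ in range(n_iter):
        # Assignment step: nearest center in Mahalanobis distance.
        diff = X[:, None, :] - centers[None, :, :]            # (n, k, d)
        dist = np.einsum('nkd,de,nke->nk', diff, cov_inv, diff)
        labels = dist.argmin(axis=1)
        # Update step: refit centers and the pooled covariance.
        centers = np.stack([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
        resid = X - centers[labels]
        cov_inv = np.linalg.inv(resid.T @ resid / n + 1e-6 * np.eye(d))
    return labels, centers

X = np.vstack([
    rng.normal([0, 0], [3.0, 0.3], size=(150, 2)),   # elongated clusters
    rng.normal([4, 0], [3.0, 0.3], size=(150, 2)),
])
labels, _ = anisotropic_lloyd(X, k=2)
```

The sketch only shows how anisotropy changes the assignment rule; the paper's contribution is proving rate-optimality guarantees for procedures of this kind.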
A Theoretical Perspective for Speculative Decoding Algorithm
·1873 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Princeton University
This paper theoretically analyzes speculative decoding, revealing its optimality and providing formulas for expected rejections, paving the way for more efficient large language model inference.
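The accept/reject rule at the heart of speculative decoding is simple enough to state in a few lines. The sketch below implements standard speculative sampling for a single proposed token; the array-based toy distributions `p` and `q` are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p, q):
    """One accept/reject step of standard speculative sampling.

    p: target-model distribution over the vocabulary (1-D array).
    q: draft-model distribution over the vocabulary (1-D array).
    Returns (token, accepted)."""
    vocab = np.arange(len(p))
    x = rng.choice(vocab, p=q)                  # draft model proposes x ~ q
    if rng.random() < min(1.0, p[x] / q[x]):    # accept with prob min(1, p/q)
        return x, True
    # On rejection, resample from the residual (p - q)_+ renormalized,
    # which keeps the final output distributed exactly as p.
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return rng.choice(vocab, p=residual), False

p = np.array([0.5, 0.3, 0.2])   # target model's next-token distribution
q = np.array([0.3, 0.4, 0.3])   # draft model's next-token distribution
print(speculative_step(p, q))
```

Iterating this step over a block of draft tokens gives the usual multi-token verification loop whose expected number of rejections the paper characterizes.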
A Metalearned Neural Circuit for Nonparametric Bayesian Inference
·2042 words·10 mins·
Machine Learning
Meta Learning
🏢 Princeton University
A metalearned neural circuit mimics nonparametric Bayesian inference, enabling fast, accurate, open-set classification.
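For context on the nonparametric Bayesian side, the canonical prior that lets the number of classes grow with the data (the open-set property) is the Chinese restaurant process. A minimal sampler, purely illustrative and not the paper's metalearned circuit, looks like this:

```python
import numpy as np

rng = np.random.default_rng(3)

def crp_assignments(n, alpha=1.0):
    """Sample cluster assignments from a Chinese restaurant process:
    each point joins an existing class with probability proportional
    to its size, or opens a new class with weight alpha."""
    counts, labels = [], []
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)      # open a new table / class
        else:
            counts[k] += 1
        labels.append(k)
    return labels

print(crp_assignments(20))
```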