Global Convergence in Training Large-Scale Transformers
·398 words·2 mins·
AI Generated
AI Theory
Optimization
🏢 Princeton University
Global convergence of large-scale Transformer training is proven using weight decay regularization and a refined mean-field analysis, bridging theory and practice.
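The regularizer in question is standard decoupled weight decay. As a minimal sketch (the toy linear model and data below are illustrative assumptions, not the paper's setup), an AdamW-style update applies the decay directly to the weights rather than folding it into the gradient:

```python
import torch

# Minimal sketch: decoupled weight decay (AdamW-style), the regularizer the
# convergence analysis leans on. The model and data here are hypothetical.
model = torch.nn.Linear(16, 16)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)

x = torch.randn(32, 16)
loss = model(x).pow(2).mean()
loss.backward()
opt.step()       # Adam step plus w <- w - lr * weight_decay * w, decoupled
opt.zero_grad()
```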
FlexSBDD: Structure-Based Drug Design with Flexible Protein Modeling
·2072 words·10 mins·
Machine Learning
Deep Learning
🏢 Princeton University
FlexSBDD, a novel deep generative model, accurately predicts flexible protein-ligand complex structures, generating high-affinity drug molecules while overcoming the limitations of rigid protein models.
Finding Transformer Circuits With Edge Pruning
·2284 words·11 mins·
Interpretability
🏢 Princeton University
Edge Pruning efficiently discovers sparse yet accurate computational subgraphs (circuits) in large language models by casting circuit discovery as gradient-based pruning of edges between components, advancing mechanistic interpretability research.
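As a rough illustration (not the paper's implementation), gradient-based edge pruning can be sketched as learning a soft mask over the edges of a frozen computation graph under a sparsity penalty; the tiny two-layer "model" and all names below are assumptions for the sketch:

```python
import torch

# Toy circuit-discovery sketch: learn a near-binary mask over the edges of a
# frozen two-layer computation so that only edges needed to reproduce the
# full model's output survive an L1-style sparsity penalty.
torch.manual_seed(0)
W1, W2 = torch.randn(8, 8), torch.randn(8, 8)        # frozen "model" weights
x = torch.randn(32, 8)                                # probe inputs
with torch.no_grad():
    target = torch.relu(x @ W1) @ W2                  # full-model behaviour

edge_logits = torch.zeros(8, 8, requires_grad=True)   # one logit per W2 edge
opt = torch.optim.Adam([edge_logits], lr=0.1)
for step in range(200):
    mask = torch.sigmoid(edge_logits)                 # soft edge mask in (0, 1)
    pred = torch.relu(x @ W1) @ (W2 * mask)           # run with masked edges
    loss = ((pred - target) ** 2).mean() + 0.05 * mask.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("edges kept:", (torch.sigmoid(edge_logits) > 0.5).float().mean().item())
```

Thresholding the learned mask yields a discrete circuit; the sparsity weight trades off faithfulness to the full model against circuit size.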
Disentangling the Roles of Distinct Cell Classes with Cell-Type Dynamical Systems
·2009 words·10 mins·
🏢 Princeton University
The new Cell-Type Dynamical Systems (CTDS) model disentangles neural population dynamics by incorporating distinct cell types, improving prediction accuracy and biological interpretability.
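As a hedged sketch of the core idea (cell-type structure constraining latent dynamics), the toy simulation below imposes Dale's-law-style sign constraints on a linear dynamical system; the block sizes and sign convention are illustrative assumptions, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy cell-type dynamical system: latents split into excitatory (E) and
# inhibitory (I) blocks, with Dale's-law-style sign constraints on the
# dynamics matrix (E columns non-negative, I columns non-positive).
n_e, n_i, T = 3, 2, 200
A = np.abs(rng.normal(0, 0.2, (n_e + n_i, n_e + n_i)))
A[:, n_e:] *= -1.0                                   # inhibitory columns
A = 0.9 * A / np.abs(np.linalg.eigvals(A)).max()     # keep dynamics stable

z = np.zeros((T, n_e + n_i))
z[0] = rng.normal(size=n_e + n_i)
for t in range(1, T):
    z[t] = A @ z[t - 1] + 0.1 * rng.normal(size=n_e + n_i)
```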
Can Models Learn Skill Composition from Examples?
·3161 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Princeton University
Smaller language models can learn skill composition from limited examples, substantially improving their ability to combine skills in novel ways through fine-tuning.
Achieving Optimal Clustering in Gaussian Mixture Models with Anisotropic Covariance Structures
·1417 words·7 mins·
Clustering
🏢 Princeton University
This research develops rate-optimal clustering algorithms for Gaussian Mixture Models with anisotropic covariance structures, bridging the gap between theoretical guarantees and practical efficiency.
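To make the setting concrete, here is a minimal Lloyd-style iteration that replaces Euclidean assignments with Mahalanobis distance under a pooled covariance estimate. This is an illustrative sketch of clustering under anisotropic covariance, not the paper's rate-optimal algorithm; all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def anisotropic_lloyd(X, k, n_iter=20):
    """Lloyd-style clustering with a shared anisotropic covariance:
    assignments use Mahalanobis distance under the current pooled
    covariance estimate instead of plain Euclidean distance."""
    n, d = X.shape
    centers = X[rng.choice(n, k, replace=False)]
    cov_inv = np.eye(d)
    for _ in range(n_iter):
        # Assignment step: nearest center in Mahalanobis distance.
        diff = X[:, None, :] - centers[None, :, :]            # (n, k, d)
        dist = np.einsum('nkd,de,nke->nk', diff, cov_inv, diff)
        labels = dist.argmin(axis=1)
        # Update step: refit centers and the pooled covariance.
        centers = np.stack([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
        resid = X - centers[labels]
        cov_inv = np.linalg.inv(resid.T @ resid / n + 1e-6 * np.eye(d))
    return labels, centers

X = np.vstack([
    rng.normal([0, 0], [3.0, 0.3], size=(150, 2)),   # elongated clusters
    rng.normal([4, 0], [3.0, 0.3], size=(150, 2)),
])
labels, _ = anisotropic_lloyd(X, k=2)
```

The sketch only shows how anisotropy changes the assignment rule; the paper's contribution is proving rate-optimality guarantees for procedures of this kind.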
A Theoretical Perspective for Speculative Decoding Algorithm
·1873 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Princeton University
This paper theoretically analyzes speculative decoding, revealing its optimality and providing formulas for expected rejections, paving the way for more efficient large language model inference.
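The accept/reject rule at the heart of speculative decoding is simple enough to state in a few lines. The sketch below implements standard speculative sampling for a single proposed token; the array-based toy distributions `p` and `q` are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p, q):
    """One accept/reject step of standard speculative sampling.

    p: target-model distribution over the vocabulary (1-D array).
    q: draft-model distribution over the vocabulary (1-D array).
    Returns (token, accepted)."""
    vocab = np.arange(len(p))
    x = rng.choice(vocab, p=q)                  # draft model proposes x ~ q
    if rng.random() < min(1.0, p[x] / q[x]):    # accept with prob min(1, p/q)
        return x, True
    # On rejection, resample from the residual (p - q)_+ renormalized,
    # which keeps the final output distributed exactly as p.
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return rng.choice(vocab, p=residual), False

p = np.array([0.5, 0.3, 0.2])   # target model's next-token distribution
q = np.array([0.3, 0.4, 0.3])   # draft model's next-token distribution
print(speculative_step(p, q))
```

Iterating this step over a block of draft tokens gives the usual multi-token verification loop whose expected number of rejections the paper characterizes.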
A Metalearned Neural Circuit for Nonparametric Bayesian Inference
·2042 words·10 mins·
Machine Learning
Meta Learning
🏢 Princeton University
A metalearned neural circuit mimics nonparametric Bayesian inference, enabling fast, accurate, open-set classification.
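For context on the nonparametric Bayesian side, the canonical prior that lets the number of classes grow with the data (the open-set property) is the Chinese restaurant process. A minimal sampler, purely illustrative and not the paper's metalearned circuit, looks like this:

```python
import numpy as np

rng = np.random.default_rng(3)

def crp_assignments(n, alpha=1.0):
    """Sample cluster assignments from a Chinese restaurant process:
    each point joins an existing class with probability proportional
    to its size, or opens a new class with weight alpha."""
    counts, labels = [], []
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)      # open a new table / class
        else:
            counts[k] += 1
        labels.append(k)
    return labels

print(crp_assignments(20))
```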