Large Language Models

A teacher-teacher framework for clinical language representation learning
·1643 words·8 mins
Natural Language Processing Large Language Models 🏢 Harvard University
A lightweight knowledge alignment module enables two pre-trained LLMs to mutually learn and improve clinical language representation, exceeding individual model performance on various downstream tasks.
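A minimal sketch of the idea, under assumptions: the module shape and the symmetric loss below are illustrative stand-ins, not the paper's exact design. Each frozen LLM's representation is projected into a shared space, and only the small projections are trained.

```python
# Hedged sketch of a lightweight alignment module between two frozen LLMs.
# The projection heads and cosine-style loss are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentModule(nn.Module):
    def __init__(self, dim_a: int, dim_b: int, shared: int = 256):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, shared)  # head for teacher A
        self.proj_b = nn.Linear(dim_b, shared)  # head for teacher B

    def forward(self, h_a: torch.Tensor, h_b: torch.Tensor) -> torch.Tensor:
        za = F.normalize(self.proj_a(h_a), dim=-1)
        zb = F.normalize(self.proj_b(h_b), dim=-1)
        # symmetric "mutual learning" objective: each teacher's view
        # should agree with the other's in the shared space
        return (1 - (za * zb).sum(-1)).mean()

# Both backbone LLMs stay frozen; only the alignment module is updated.
loss = AlignmentModule(768, 1024)(torch.randn(8, 768), torch.randn(8, 1024))
```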
A polar coordinate system represents syntax in large language models
·1633 words·8 mins
Natural Language Processing Large Language Models 🏢 Meta AI
LLMs spontaneously encode syntax using a polar coordinate system, representing syntactic relations via relative direction and distance of word embeddings.
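A minimal probing sketch of the claim, assuming access to per-token hidden states (the function name and the probe are illustrative, not the paper's code): a syntactic relation is read off from the difference of two word embeddings as a direction (unit vector) plus a distance.

```python
# Sketch: describe a head/dependent pair in "polar" form.
import torch

def polar_relation(head_vec: torch.Tensor, dep_vec: torch.Tensor):
    """Return (direction, distance) for a word pair.

    head_vec / dep_vec are hypothetical per-token hidden states from one
    LLM layer. The paper's finding is that pairs in the same syntactic
    relation share a direction and a characteristic distance.
    """
    diff = dep_vec - head_vec
    distance = diff.norm()                        # radial coordinate
    direction = diff / distance.clamp_min(1e-8)   # angular coordinate
    return direction, distance

# Pairs in the same relation (e.g., determiner -> noun) should yield
# high cosine similarity between their direction vectors.
h1, d1 = torch.randn(768), torch.randn(768)
h2, d2 = torch.randn(768), torch.randn(768)
dir1, _ = polar_relation(h1, d1)
dir2, _ = polar_relation(h2, d2)
print(torch.dot(dir1, dir2).item())  # cosine similarity of unit directions
```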
A Phase Transition between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention
·2455 words·12 mins
Large Language Models 🏢 EPFL, Lausanne, Switzerland
A solvable model reveals a phase transition in dot-product attention, showing how semantic attention emerges from positional attention with increased data, explaining the qualitative improvements in l…
A distributional simplicity bias in the learning dynamics of transformers
·2474 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 International School for Advanced Studies
Transformers learn increasingly complex language patterns sequentially, starting with simpler interactions before mastering higher-order ones.
A Critical Evaluation of AI Feedback for Aligning Large Language Models
·2724 words·13 mins
AI Generated Natural Language Processing Large Language Models 🏢 Stanford University
Contrary to popular belief, simple supervised fine-tuning with strong language models outperforms complex reinforcement learning in aligning large language models, significantly improving efficiency.
3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability
·2315 words·11 mins
Natural Language Processing Large Language Models 🏢 Language Technology Lab, University of Amsterdam
RoAd: a novel parameter-efficient finetuning method uses 2D rotation to adapt LLMs, enabling efficient batching, composability, and improved interpretability.
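A hedged sketch of the core mechanism (class and parameter names are illustrative, not the authors' API): the output of a frozen layer is adapted by rotating each pair of hidden dimensions with a learned 2×2 rotation, similar in spirit to RoPE but with trainable angles.

```python
# Sketch: trainable 2D rotations over coordinate pairs of a hidden state.
import torch
import torch.nn as nn

class Road2DRotation(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        assert hidden_size % 2 == 0
        # one trainable angle per 2D subspace; zero init => identity
        self.theta = nn.Parameter(torch.zeros(hidden_size // 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., hidden_size); rotate each (even, odd) coordinate pair
        x1, x2 = x[..., 0::2], x[..., 1::2]
        cos, sin = self.theta.cos(), self.theta.sin()
        out = torch.empty_like(x)
        out[..., 0::2] = cos * x1 - sin * x2
        out[..., 1::2] = sin * x1 + cos * x2
        return out

# Usage: wrap a frozen linear layer; only the angles train, and the adapter
# reduces to element-wise multiplies, which is what makes batching many
# adapters cheap.
layer = nn.Linear(768, 768)
layer.requires_grad_(False)
adapter = Road2DRotation(768)
h = adapter(layer(torch.randn(4, 768)))
```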
β-DPO: Direct Preference Optimization with Dynamic β
·2106 words·10 mins
Natural Language Processing Large Language Models 🏢 Alibaba Group
β-DPO dynamically adjusts a key parameter in Direct Preference Optimization, significantly improving LLM alignment with human preferences.
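A minimal sketch of a DPO loss where β is adjusted per batch. The standard DPO objective is shown faithfully; the calibration rule for β below (moving it with the observed reward margin) is an illustrative assumption, not the paper's exact update.

```python
# Sketch: DPO loss with a batch-level dynamic beta.
import torch
import torch.nn.functional as F

def dynamic_beta_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                          ref_chosen_logps, ref_rejected_logps,
                          beta0: float = 0.1, alpha: float = 0.5):
    # reference-normalised log-ratios (implicit rewards up to scale)
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    margin = (chosen_ratio - rejected_ratio).detach()

    # dynamic beta: scale the base beta with this batch's reward margin
    # (hypothetical stand-in for the paper's calibration rule)
    beta = beta0 * (1 + alpha * torch.tanh(margin.mean()))

    # standard DPO objective, with the per-batch beta
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()
```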
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning
·3529 words·17 mins
AI Generated Natural Language Processing Large Language Models 🏢 MIT-IBM Watson AI Lab
Trans-LoRA enables near data-free transfer of fine-tuned LLMs across models!
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
·2049 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Texas at Austin
Read-ME refactors pre-trained dense LLMs into efficient, router-decoupled Mixture-of-Experts (MoEs) via activation sparsity, achieving up to 10.1% improvement on MMLU and 6.1% reduction in latency.
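A hedged sketch of a router-decoupled MoE forward pass (module names and sizes are illustrative, not the Read-ME code): the router is a standalone module evaluated up front, so expert choices are known before the layer runs, which is what enables the systems-level batching and prefetching co-design.

```python
# Sketch: a router decoupled from the expert layer it serves.
import torch
import torch.nn as nn

class DecoupledRouter(nn.Module):
    def __init__(self, hidden: int, n_experts: int):
        super().__init__()
        self.gate = nn.Linear(hidden, n_experts, bias=False)

    def forward(self, x):                      # x: (tokens, hidden)
        return self.gate(x).argmax(dim=-1)     # top-1 expert id per token

class MoELayer(nn.Module):
    def __init__(self, hidden: int, n_experts: int):
        super().__init__()
        # in Read-ME's setting, experts are carved out of the pre-trained
        # dense FFN via activation sparsity; random FFNs stand in here
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(),
                          nn.Linear(4 * hidden, hidden))
            for _ in range(n_experts))

    def forward(self, x, expert_ids):
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = expert_ids == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out

# Routing decisions are computed before the layer executes, so the runtime
# can schedule and prefetch expert weights ahead of time.
x = torch.randn(16, 512)
router, layer = DecoupledRouter(512, 4), MoELayer(512, 4)
y = layer(x, router(x))
```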