🏢 MIT CSAIL
Theoretical Analysis of Weak-to-Strong Generalization
·1703 words·8 mins·
AI Theory
Generalization
🏢 MIT CSAIL
Strong student models can learn from weaker teachers, even correcting errors and generalizing beyond the teacher’s expertise. This paper provides new theoretical bounds explaining this ‘weak-to-strong generalization’ phenomenon.
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
·2727 words·13 mins·
Natural Language Processing
Large Language Models
🏢 MIT CSAIL
Cross-Layer Attention (CLA) shrinks the Transformer key-value cache by 2×, improving LLMs’ memory efficiency without accuracy loss.
In-Context Symmetries: Self-Supervised Learning through Contextual World Models
·3570 words·17 mins·
Computer Vision
Self-Supervised Learning
🏢 MIT CSAIL
CONTEXTSSL: A novel self-supervised learning algorithm that adapts to task-specific symmetries by using context, achieving significant performance gains over existing methods.
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
·2643 words·13 mins·
AI Applications
Robotics
🏢 MIT CSAIL
Diffusion Forcing merges next-token prediction and full-sequence diffusion for superior sequence generation.
A Theoretical Understanding of Self-Correction through In-context Alignment
·1997 words·10 mins·
Natural Language Processing
Large Language Models
🏢 MIT CSAIL
LLMs improve through self-correction, but the mechanisms are unclear. This paper provides a theoretical framework and empirical evidence demonstrating that self-correction arises from in-context alignment.