🏢 Apple

Cut Your Losses in Large-Vocabulary Language Models
·2958 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Apple
Cut Cross-Entropy (CCE) dramatically reduces the memory footprint of training large language models by cleverly computing the cross-entropy loss without materializing the full logit matrix.
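
To make the idea concrete, here is a minimal sketch of chunked cross-entropy in PyTorch: only the correct-token logit and a running log-sum-exp over vocabulary chunks are kept, so the full [tokens × vocabulary] logit matrix never exists in memory. The function and argument names are illustrative assumptions, not the paper's API; the actual CCE method fuses this computation (including the backward pass) into custom kernels.

```python
import torch

def chunked_cross_entropy(hidden, classifier, targets, chunk_size=4096):
    """Cross-entropy over a large vocabulary without materializing the
    full [num_tokens, vocab_size] logit matrix.

    hidden:     [N, d] final hidden states
    classifier: [V, d] output embedding / classifier weights
    targets:    [N]    correct token ids
    """
    # Logit of the correct token: one dot product per position.
    correct_logit = (hidden * classifier[targets]).sum(dim=-1)        # [N]

    # Running log-sum-exp over the vocabulary, computed chunk by chunk,
    # so at most [N, chunk_size] logits exist at any one time.
    lse = torch.full_like(correct_logit, float("-inf"))               # [N]
    for start in range(0, classifier.size(0), chunk_size):
        block = classifier[start:start + chunk_size]                  # [C, d]
        logits = hidden @ block.T                                     # [N, C]
        lse = torch.logaddexp(lse, torch.logsumexp(logits, dim=-1))

    # Standard cross-entropy: log-sum-exp minus the correct-class logit.
    return (lse - correct_logit).mean()
```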
Controlling Language and Diffusion Models by Transporting Activations
·11502 words·54 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Apple
Steering large language and diffusion models is made easy and efficient via Activation Transport (ACT)! This novel framework uses optimal transport theory to precisely control model activations, leadi…
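
As a rough illustration of steering by transporting activation distributions, the sketch below fits a per-unit 1D optimal-transport (quantile-matching) map between activations collected under a source behavior and a target behavior, then applies it with an adjustable strength. The quantile parameterization and all names here are assumptions for the sake of the example, not ACT's exact formulation.

```python
import numpy as np

def fit_quantile_transport(src_acts, dst_acts, n_quantiles=256):
    """Fit a per-unit 1D optimal-transport (quantile-matching) map.

    src_acts, dst_acts: [num_samples, num_units] activations gathered from
    a source behavior and a desired target behavior.
    """
    qs = np.linspace(0.0, 1.0, n_quantiles)
    src_q = np.quantile(src_acts, qs, axis=0)   # [n_quantiles, num_units]
    dst_q = np.quantile(dst_acts, qs, axis=0)
    return src_q, dst_q

def transport(acts, src_q, dst_q, strength=1.0):
    """Push activations through the fitted quantile map, interpolating
    between the identity (strength=0) and full transport (strength=1)."""
    out = np.empty_like(acts)
    for u in range(acts.shape[-1]):
        moved = np.interp(acts[..., u], src_q[:, u], dst_q[:, u])
        out[..., u] = (1.0 - strength) * acts[..., u] + strength * moved
    return out
```

In use, the fitted map would be applied to a chosen layer's activations at inference time, with the strength parameter trading off between the model's original behavior and the transported one.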