🏢 Apple
Cut Your Losses in Large-Vocabulary Language Models
·2958 words·14 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Apple
Cut Cross-Entropy (CCE) dramatically reduces the memory footprint of training large language models by cleverly computing the cross-entropy loss without materializing the full logit matrix.
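Below is a minimal, hypothetical PyTorch sketch of the idea the summary describes: computing the target-token logit and a streaming log-sum-exp over vocabulary chunks so the full [num_tokens, vocab_size] logit matrix is never materialized. The function name, shapes, and the eager-mode chunking loop are illustrative assumptions; the paper's CCE relies on custom fused kernels rather than a Python loop.

```python
import torch

def chunked_cross_entropy(hidden, classifier, targets, chunk_size=8192):
    """Cross-entropy over a large vocabulary without building the full
    [num_tokens, vocab_size] logit matrix (illustrative sketch, not the
    paper's fused-kernel implementation).

    hidden:     [num_tokens, hidden_dim] final-layer activations
    classifier: [vocab_size, hidden_dim] output-projection weights
    targets:    [num_tokens] target token ids
    """
    num_tokens = hidden.shape[0]
    vocab_size = classifier.shape[0]

    # Logit of the correct token: one dot product per position.
    target_logits = (hidden * classifier[targets]).sum(dim=-1)

    # Streaming, numerically stable log-sum-exp over vocabulary chunks.
    running_max = torch.full((num_tokens,), float("-inf"), device=hidden.device)
    running_sum = torch.zeros(num_tokens, device=hidden.device)
    for start in range(0, vocab_size, chunk_size):
        chunk_logits = hidden @ classifier[start:start + chunk_size].T
        chunk_max = chunk_logits.max(dim=-1).values
        new_max = torch.maximum(running_max, chunk_max)
        running_sum = running_sum * torch.exp(running_max - new_max) \
            + torch.exp(chunk_logits - new_max[:, None]).sum(dim=-1)
        running_max = new_max

    log_z = running_max + torch.log(running_sum)
    return (log_z - target_logits).mean()
```

Peak memory per step is bounded by one [num_tokens, chunk_size] block instead of the full logit matrix, which is where the footprint reduction comes from.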
Controlling Language and Diffusion Models by Transporting Activations
·11502 words·54 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Apple
Steering large language and diffusion models is made easy and efficient via Activation Transport (ACT)! This novel framework uses optimal transport theory to precisely control model activations, leadi…
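A minimal, hypothetical PyTorch sketch of the transport idea described above: per-unit affine (Gaussian) optimal-transport maps are estimated from source and target activation samples and applied through a forward hook with an interpolation strength. The estimator, names, and module path are illustrative assumptions, not necessarily the paper's exact procedure.

```python
import torch

def fit_affine_transport(source_acts, target_acts, eps=1e-6):
    """Per-unit affine transport map between two activation distributions,
    assuming each unit is approximately Gaussian:
        t(a) = mu_tgt + (sigma_tgt / sigma_src) * (a - mu_src)

    source_acts, target_acts: [num_samples, num_units] activations collected
    from prompts exhibiting the source and target behaviours.
    """
    mu_s, sigma_s = source_acts.mean(0), source_acts.std(0)
    mu_t, sigma_t = target_acts.mean(0), target_acts.std(0)
    scale = sigma_t / (sigma_s + eps)
    shift = mu_t - scale * mu_s
    return scale, shift

def transport_hook(scale, shift, strength=1.0):
    """Forward hook that moves a layer's activations toward the target
    distribution; strength in [0, 1] interpolates between the original
    output (0) and the fully transported one (1)."""
    def hook(module, inputs, output):
        transported = scale * output + shift
        return (1.0 - strength) * output + strength * transported
    return hook

# Usage (hypothetical module path):
# layer = model.transformer.h[10].mlp
# scale, shift = fit_affine_transport(src_acts, tgt_acts)
# handle = layer.register_forward_hook(transport_hook(scale, shift, strength=0.5))
```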