🏢 Apple
STIV: Scalable Text and Image Conditioned Video Generation
·5285 words·25 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Apple
STIV: A novel, scalable method for text and image-conditioned video generation, systematically improving model architectures, training, and data curation for superior performance.
Cut Your Losses in Large-Vocabulary Language Models
·2958 words·14 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Apple
Cut Cross-Entropy (CCE) dramatically reduces the memory footprint of training large language models by cleverly computing the cross-entropy loss without materializing the full logit matrix.
Controlling Language and Diffusion Models by Transporting Activations
·11502 words·54 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Apple
Steering large language and diffusion models is made easy and efficient via Activation Transport (ACT)! This novel framework uses optimal transport theory to precisely control model activations, leadi…