🏢 NVIDIA
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
·4300 words·21 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 NVIDIA
VLsI: Verbalized Layers-to-Interactions efficiently transfers knowledge from large to small VLMs via layer-wise natural-language distillation, achieving significant performance gains without scaling…
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
·4724 words·23 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 NVIDIA
Puzzle: a novel framework that accelerates large language model inference through neural architecture search and knowledge distillation, achieving a 2.17x speedup on a single GPU while preserving 98.4% ac…
Star Attention: Efficient LLM Inference over Long Sequences
·5535 words·26 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 NVIDIA
Star Attention: 11x faster LLM inference on long sequences with 95-100% accuracy!
Hymba: A Hybrid-head Architecture for Small Language Models
·4219 words·20 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 NVIDIA
Hymba: a hybrid-head architecture that boosts small language models with an 11.67x cache size reduction and 3.49x higher throughput, surpassing existing models.