🏢 NVIDIA

VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
·4300 words·21 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 NVIDIA
VLsI: Verbalized Layers-to-Interactions efficiently transfers knowledge from large to small VLMs using layer-wise natural language distillation, achieving significant performance gains without scaling…
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
·4724 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA
Puzzle, a novel framework, accelerates large language model inference through neural architecture search and knowledge distillation, achieving a 2.17x speedup on a single GPU while preserving 98.4% accuracy.
Star Attention: Efficient LLM Inference over Long Sequences
·5535 words·26 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA
Star Attention delivers up to 11x faster LLM inference on long sequences while retaining 95-100% accuracy.
Hymba: A Hybrid-head Architecture for Small Language Models
·4219 words·20 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA
Hymba: a hybrid-head architecture reduces cache size by 11.67x and boosts throughput by 3.49x, surpassing existing small language models.