🏢 NVIDIA

VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
·4300 words·21 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 NVIDIA
VLsI: Verbalized Layers-to-Interactions efficiently transfers knowledge from large to small VLMs using layer-wise natural language distillation, achieving significant performance gains without scaling…
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
·4724 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA
Puzzle, a novel framework, accelerates large language model inference through neural architecture search and knowledge distillation, achieving a 2.17x speedup on a single GPU while preserving 98.4% accuracy.
Star Attention: Efficient LLM Inference over Long Sequences
·5535 words·26 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA
Star Attention delivers up to 11x faster LLM inference on long sequences while retaining 95-100% accuracy.
Hymba: A Hybrid-head Architecture for Small Language Models
·4219 words·20 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA
Hymba: a hybrid-head architecture reduces cache size by 11.67x and boosts throughput by 3.49x, surpassing existing small language models.