🏢 Colfax Research
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Large Language Models
FlashAttention-3 achieves 1.5-2x faster attention on H100 GPUs by exploiting asynchrony and low-precision arithmetic, reaching up to 1.3 PFLOPs/s.