🏢 Colfax Research

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
· 2517 words · 12 mins
Large Language Models
FlashAttention-3 achieves 1.5-2x faster attention on H100 GPUs using asynchrony and low precision, reaching 1.3 PFLOPs/s.