🏢 Colfax Research
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Large Language Models
FlashAttention-3 achieves 1.5-2x faster attention on H100 GPUs by exploiting asynchrony and low-precision arithmetic, reaching up to 1.3 PFLOPs/s.