Fast Attention Over Long Sequences With Dynamic Sparse Flash Attention
Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, F. Fleuret
2023 · DBLP: conf/nips/PagliardiniPJF23
Neural Information Processing Systems · 15 Citations
TLDR
This work extends FlashAttention to accommodate a large class of attention sparsity patterns, in particular key/query dropping and hashing-based attention. The resulting implementations add no computational-complexity overhead and achieve a multi-fold runtime speedup on top of FlashAttention.
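To make the hashing-based sparsity pattern mentioned in the TLDR concrete, below is a minimal, hypothetical sketch (not the paper's kernel) of bucketed attention: queries and keys are assigned to buckets via a random-projection sign hash, and softmax attention is computed only among queries and keys that share a bucket. All names (`hash_sparse_attention`, `n_hashes`) are illustrative assumptions; the actual method fuses such patterns into a FlashAttention-style GPU kernel.

```python
import numpy as np

def hash_sparse_attention(Q, K, V, n_hashes=4, seed=0):
    """Illustrative sketch: attend only within shared LSH buckets.

    This is a reference implementation in plain NumPy, not the paper's
    fused GPU kernel; names and signature are assumptions.
    """
    rng = np.random.default_rng(seed)
    d = Q.shape[-1]
    # Random-projection sign hash: each sign pattern is one bucket id.
    proj = rng.standard_normal((d, n_hashes))
    weights = 2 ** np.arange(n_hashes)
    q_buckets = (Q @ proj > 0).astype(int) @ weights
    k_buckets = (K @ proj > 0).astype(int) @ weights

    out = np.zeros_like(Q)
    for b in np.unique(q_buckets):
        qi = np.where(q_buckets == b)[0]
        ki = np.where(k_buckets == b)[0]
        if len(ki) == 0:
            continue  # no keys hashed to this bucket; output stays zero
        # Standard softmax attention restricted to this bucket.
        scores = Q[qi] @ K[ki].T / np.sqrt(d)
        scores -= scores.max(axis=-1, keepdims=True)
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)
        out[qi] = w @ V[ki]
    return out
```

Because work is done only inside buckets, the cost scales with the sum of per-bucket products rather than the full sequence-length square, which is the source of the sparsity speedup.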
