Fast Attention Over Long Sequences With Dynamic Sparse Flash Attention

Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, F. Fleuret

2023 · DBLP: conf/nips/PagliardiniPJF23
Neural Information Processing Systems · 15 Citations

TLDR

This work extends FlashAttention to accommodate a large class of attention sparsity patterns that, in particular, encompass key/query dropping and hashing-based attention, leading to implementations with no computational-complexity overhead and a multi-fold runtime speedup over FlashAttention.
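To make the class of sparsity patterns concrete, the following toy NumPy sketch illustrates one of the patterns the TLDR mentions, hashing-based attention: queries and keys are hashed into buckets (here via a hypothetical random-projection hash, not the paper's actual kernel), and attention scores are only kept for query/key pairs that fall in the same bucket. This is a dense-math illustration of the masking idea only; the paper's contribution is computing such patterns efficiently inside a FlashAttention-style kernel.

```python
import numpy as np

def hash_bucket_attention(Q, K, V, n_buckets=4, seed=0):
    """Toy hashing-based sparse attention (illustrative, not the paper's kernel):
    hash queries/keys into buckets and attend only within matching buckets."""
    rng = np.random.default_rng(seed)
    d = Q.shape[-1]
    proj = rng.standard_normal((d, n_buckets))   # shared random-projection hash
    q_bucket = (Q @ proj).argmax(-1)             # bucket id per query
    k_bucket = (K @ proj).argmax(-1)             # bucket id per key
    scores = Q @ K.T / np.sqrt(d)
    cross = q_bucket[:, None] != k_bucket[None, :]
    scores = np.where(cross, -np.inf, scores)    # drop cross-bucket pairs
    # A query whose bucket contains no key would have an all -inf row; guard it.
    row_ok = (~cross).any(-1)
    weights = np.zeros_like(scores)
    exp = np.exp(scores[row_ok] - scores[row_ok].max(-1, keepdims=True))
    weights[row_ok] = exp / exp.sum(-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
out = hash_bucket_attention(Q, K, V)
print(out.shape)  # (8, 16)
```

Because each query only attends within its bucket, the number of nonzero score entries shrinks as buckets multiply, which is the source of the runtime savings once the masked pairs are skipped rather than computed and discarded.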
