Sparsifying Transformer Models with Differentiable Representation Pooling

Michał Pietruszka, Łukasz Borchmann, Filip Graliński

2020 · DBLP: journals/corr/abs-2009-05169
arXiv.org · 3 Citations

TLDR

A novel method that sparsifies attention in the Transformer model by learning to select the most-informative token representations, thereby exploiting the model's information bottleneck with twofold strength.
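
As a rough illustration of the idea of learning to select the most-informative token representations, the following minimal PyTorch sketch scores tokens with a learned linear head, keeps the top-k, and weights them softly so gradients reach the scorer. The class name `SoftTokenPooler`, the linear scorer, and the soft-weighted hard top-k are assumptions made here for illustration; the paper's actual differentiable pooling operator differs in detail.

```python
import torch
import torch.nn as nn


class SoftTokenPooler(nn.Module):
    """Illustrative sketch (not the paper's exact operator): score each token,
    keep the k highest-scoring representations, and weight them by a softmax
    over their scores so the selection remains trainable end to end."""

    def __init__(self, hidden_dim: int, k: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1)  # learned per-token informativeness score
        self.k = k

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim)
        scores = self.scorer(hidden_states).squeeze(-1)        # (batch, seq_len)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)    # indices of the k most-informative tokens
        weights = torch.softmax(topk_scores, dim=-1).unsqueeze(-1)
        idx = topk_idx.unsqueeze(-1).expand(-1, -1, hidden_states.size(-1))
        selected = hidden_states.gather(1, idx)                # (batch, k, hidden_dim)
        # Rescale by k so the pooled representations keep a comparable magnitude.
        return selected * weights * self.k


# Usage: shrink a 512-token sequence to 128 pooled representations,
# so subsequent attention layers operate on a much shorter sequence.
pooler = SoftTokenPooler(hidden_dim=768, k=128)
x = torch.randn(2, 512, 768)
print(pooler(x).shape)  # torch.Size([2, 128, 768])
```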