transformers.zip: Compressing Transformers with Pruning and Quantization

Robin Cheong

2019
33 Citations

TLDR

This work is the first to apply quantization methods to the Transformer architecture and the first to compare quantization and pruning on that architecture. It finds that the proposed quantization method is both significantly faster and gives equal or better performance at the same compression level.