transformers.zip: Compressing Transformers with Pruning and Quantization
Robin Cheong
2019
33 Citations
TLDR
This work is the first to apply quantization methods to the Transformer architecture and the first to compare quantization and pruning on it. It finds that the proposed quantization method is both significantly faster and gives equal or better performance at the same compression level.
