
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Ji Lin, Jiaming Tang, +3 authors, Song Han

2023 · DOI: 10.48550/arXiv.2306.00978
arXiv.org · 521 Citations

TLDR

Activation-aware Weight Quantization (AWQ) is proposed: a hardware-friendly approach to low-bit, weight-only quantization of LLMs that preserves their generalization ability across different domains and modalities without overfitting to the calibration set.
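
To make the "activation-aware" idea concrete, here is a minimal PyTorch sketch of the core mechanism the paper describes: weights are scaled per input channel by activation magnitude (a grid search over s = s_x^α), quantized at low bit-width, and the inverse scale is folded into the activations so the layer's output is mathematically unchanged before quantization. This is an illustrative reconstruction from the abstract, not the authors' released implementation; function names, the group size, and defaults are assumptions.

```python
import torch

def pseudo_quantize(w: torch.Tensor, n_bits: int = 4, group_size: int = 128) -> torch.Tensor:
    """Simulate group-wise uniform quantization of a weight matrix.

    Assumes w.numel() is divisible by group_size (illustrative choice).
    """
    orig_shape = w.shape
    w = w.reshape(-1, group_size)
    w_max = w.amax(dim=1, keepdim=True)
    w_min = w.amin(dim=1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-5) / (2 ** n_bits - 1)
    zero = (-w_min / scale).round()
    # Quantize to integers, then dequantize back to floats for simulation.
    w_q = (torch.clamp((w / scale + zero).round(), 0, 2 ** n_bits - 1) - zero) * scale
    return w_q.reshape(orig_shape)

def awq_style_scale_search(w: torch.Tensor, x: torch.Tensor,
                           n_bits: int = 4, n_grid: int = 20):
    """Grid-search a per-input-channel scale s = s_x**alpha that minimizes
    the layer's output error after quantization.

    w: (out_features, in_features) weight; x: (tokens, in_features) calibration
    activations. Scaling w up by s and x down by s leaves x @ w.T unchanged,
    so only the quantization error is affected.
    """
    s_x = x.abs().mean(dim=0)            # per-channel activation magnitude
    ref = x @ w.t()                      # full-precision reference output
    best_err, best_s = float("inf"), None
    for i in range(n_grid):
        alpha = i / n_grid
        s = s_x.pow(alpha).clamp(min=1e-4)
        s = s / (s.max() * s.min()).sqrt()   # normalize the scale range
        w_q = pseudo_quantize(w * s, n_bits) # quantize the scaled weights
        out = (x / s) @ w_q.t()              # inverse scale folded into inputs
        err = (out - ref).pow(2).mean().item()
        if err < best_err:
            best_err, best_s = err, s
    return best_s, best_err

# Usage on random data (shapes chosen so in_features divides the group size):
torch.manual_seed(0)
w = torch.randn(256, 128)   # (out_features, in_features)
x = torch.randn(512, 128)   # calibration activations
s, err = awq_style_scale_search(w, x)
```

Because the search only measures average quantized-output error on a small calibration batch, and no weights are trained, this style of method has little opportunity to overfit the calibration set, which is the property the TLDR highlights.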