AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Ji Lin, Jiaming Tang, +3 authors, Song Han
2023 · DOI: 10.48550/arXiv.2306.00978
arXiv.org · 521 Citations
TLDR
Activation-aware Weight Quantization (AWQ) is proposed: a hardware-friendly approach to low-bit, weight-only quantization of LLMs that preserves their generalization ability across domains and modalities without overfitting to the calibration set.
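The core idea behind activation-aware quantization can be sketched as follows: scale the weight channels that see large activation magnitudes before rounding, then fold the inverse scale back so the floating-point mapping is unchanged while quantization error on salient channels shrinks. This is a minimal illustrative sketch, not the paper's implementation; the exponent `alpha`, the normalization of the scales, and the function names are all assumptions for illustration.

```python
import numpy as np

def quantize_per_channel(w, n_bits=4):
    # Symmetric round-to-nearest quantization, one scale per output row.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0  # guard against all-zero rows
    return np.round(w / scale) * scale

def awq_style_quantize(w, act_mag, alpha=0.5, n_bits=4):
    """Hypothetical sketch of activation-aware scaling.

    w:       weight matrix of shape (out_features, in_features)
    act_mag: per-input-channel activation magnitude, shape (in_features,)
    alpha:   strength of the activation-aware scaling (assumed knob)
    """
    s = act_mag ** alpha
    s = s / s.mean()  # normalize scales (illustrative choice)
    # Scale up salient input channels before quantization, then de-scale,
    # so the equivalent fp transform is unchanged but rounding error
    # concentrates on less important channels.
    w_q = quantize_per_channel(w * s, n_bits)
    return w_q / s
```

With `alpha=0` the scales collapse to ones and the function reduces to plain per-channel quantization, which makes the activation-aware behavior easy to toggle and compare.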
