AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Ji Lin, Jiaming Tang, +3 authors, Song Han
2023 · DOI: 10.48550/arXiv.2306.00978
arXiv.org · 521 Citations
TLDR
Activation-aware Weight Quantization (AWQ) is proposed: a hardware-friendly approach to low-bit, weight-only quantization of LLMs that preserves their generalization ability across domains and modalities without overfitting to the calibration set.
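The core idea behind activation-aware quantization can be sketched as follows: scale the weight channels that see large activation magnitudes before rounding, then fold the inverse scale back so the floating-point mapping is unchanged while quantization error on salient channels shrinks. This is a minimal illustrative sketch, not the paper's implementation; the exponent `alpha`, the normalization of the scales, and the function names are all assumptions for illustration.

```python
import numpy as np

def quantize_per_channel(w, n_bits=4):
    # Symmetric round-to-nearest quantization, one scale per output row.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0  # guard against all-zero rows
    return np.round(w / scale) * scale

def awq_style_quantize(w, act_mag, alpha=0.5, n_bits=4):
    """Hypothetical sketch of activation-aware scaling.

    w:       weight matrix of shape (out_features, in_features)
    act_mag: per-input-channel activation magnitude, shape (in_features,)
    alpha:   strength of the activation-aware scaling (assumed knob)
    """
    s = act_mag ** alpha
    s = s / s.mean()  # normalize scales (illustrative choice)
    # Scale up salient input channels before quantization, then de-scale,
    # so the equivalent fp transform is unchanged but rounding error
    # concentrates on less important channels.
    w_q = quantize_per_channel(w * s, n_bits)
    return w_q / s
```

With `alpha=0` the scales collapse to ones and the function reduces to plain per-channel quantization, which makes the activation-aware behavior easy to toggle and compare.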
