LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration
Zhiwen Mo, Lei Wang, 8 Authors, Mao Yang
2024 · DOI: 10.48550/arXiv.2408.06003
arXiv.org · 7 Citations
TLDR
LUT Tensor Core is a software-hardware co-design optimized for low-bit LLM inference. It introduces software-based operator fusion and table symmetrization to optimize table precompute and table storage, and it provides an end-to-end compilation stack with new instructions for LUT-based mpGEMM, enabling efficient LLM compilation and optimizations.
