This paper is included in the Proceedings of the 23rd USENIX Conference on File and Storage Technologies.
Weijian Chen, Shuibing He, 6 Authors, Gang Chen
TLDR
This paper proposes IMPRESS, an importance-informed multi-tier prefix KV storage system that reduces I/O delay for LLM inference by loading only the important prefix KVs. It also introduces an I/O-efficient algorithm for identifying important KVs, which reduces time to first token (TTFT) during model inference.
