This paper is included in the Proceedings of the 23rd USENIX Conference on File and Storage Technologies.

Weijian Chen, Shuibing He, 6 Authors, Gang Chen

TLDR

IMPRESS is an importance-informed, multi-tier prefix KV storage system that reduces I/O delay in LLM inference by loading only the important prefix KVs. It also introduces an I/O-efficient algorithm for identifying important KVs, which reduces time to first token (TTFT) during model inference.
