SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification
Xupeng Miao, G. Oliaro, +11 authors, Zhihao Jia
2023
9 citations
TLDR
This paper introduces SpecInfer, an LLM serving system that accelerates generative LLM inference with speculative inference and token tree verification, which significantly reduces the end-to-end latency and computational requirement for serving generative LLMs while provably preserving model quality.
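A minimal sketch of the greedy case of token tree verification may help make the summary concrete. It rests on several assumptions. `TreeNode`, `verify_token_tree`, and `toy_target` are hypothetical names, not SpecInfer's API. The "LLM" is a toy deterministic function. The candidate tree is written by hand rather than proposed by draft models. The verifier follows the tree along the path the target model would choose on its own and stops at the first mismatch or leaf, appending the target's own token there. Its output therefore matches plain greedy decoding token for token, which is the sense in which verification preserves model output. The paper also describes scoring all tree nodes in one parallel pass and a stochastic (sampling) variant; this sketch models neither.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class TreeNode:
    """One speculated token; children are alternative continuations after it."""
    token: int
    children: List["TreeNode"] = field(default_factory=list)


def verify_token_tree(
    prefix: Sequence[int],
    roots: List[TreeNode],
    target_next: Callable[[Sequence[int]], int],
) -> List[int]:
    """Greedy token tree verification (hypothetical helper, not SpecInfer's API).

    Returns every speculated token that matches the target model's greedy
    choice, plus one token produced by the target model itself, so the result
    always equals len(result) steps of plain greedy decoding.
    """
    context = list(prefix)
    candidates = roots
    accepted: List[int] = []
    while True:
        expected = target_next(context)  # greedy choice of the target model
        context.append(expected)
        accepted.append(expected)
        match = next((n for n in candidates if n.token == expected), None)
        if match is None:
            # Mismatch, or a leaf was reached (no candidates left):
            # the target's own token ends this verification step.
            return accepted
        candidates = match.children  # descend into the verified branch


def toy_target(context: Sequence[int]) -> int:
    """Hypothetical deterministic 'LLM': next token is (last token + 1) mod 10."""
    return (context[-1] + 1) % 10


# Hand-written speculated tree: 3 -> {4 -> {9}, 5}, plus an alternative root 7.
tree = [TreeNode(3, [TreeNode(4, [TreeNode(9)]), TreeNode(5)]), TreeNode(7)]
print(verify_token_tree([1, 2], tree, toy_target))  # [3, 4, 5]
```

Here the branch 3 → 4 is accepted, the speculated 9 is rejected, and the target's own 5 closes the step, yielding three tokens for one verification round. Because only tokens the target model would have emitted anyway are kept, a larger tree can change how many tokens each step yields but never which tokens appear.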
