SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification
Xupeng Miao, G. Oliaro, 11 authors, Zhihao Jia
2023
Cited 9 times
TLDR
This paper introduces SpecInfer, an LLM serving system that accelerates generative LLM inference with speculative inference and token tree verification, which significantly reduces the end-to-end latency and computational requirement for serving generative LLMs while provably preserving model quality.
