SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification

Xupeng Miao, G. Oliaro, 11 Authors, Zhihao Jia

2023
9 Citations

TLDR

This paper introduces SpecInfer, an LLM serving system that accelerates generative LLM inference using speculative inference and token tree verification. The approach significantly reduces the end-to-end latency and computational requirements of serving generative LLMs while provably preserving model quality.
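
To make the idea of token tree verification concrete, here is a minimal sketch of the greedy-decoding case. It is an illustration under stated assumptions, not SpecInfer's implementation: the names `verify_token_tree`, `toy_target`, and the nested-dict tree layout are hypothetical, the "model" is a toy function, and candidates are checked one at a time for readability, whereas a real system would score the whole tree in a single parallel pass of the large model.

```python
from typing import Callable, Dict, List

# Hypothetical layout: speculated token -> subtree of speculated continuations.
TokenTree = Dict[int, "TokenTree"]


def verify_token_tree(
    prefix: List[int],
    tree: TokenTree,
    target_next: Callable[[List[int]], int],
) -> List[int]:
    """Return the speculated tokens the target model agrees with,
    followed by one token chosen by the target model itself."""
    accepted: List[int] = []
    node = tree
    while True:
        # Token the target model would emit greedily at this position.
        token = target_next(prefix + accepted)
        accepted.append(token)
        if token not in node:
            # No speculated branch matches: keep the target's token and stop.
            return accepted
        # A speculated branch matches: descend and keep verifying.
        node = node[token]


def toy_target(tokens: List[int]) -> int:
    """Toy stand-in for an LLM (assumption): next token is (last + 1) mod 10."""
    return (tokens[-1] + 1) % 10


# Hypothetical tree speculated after prefix [1]: paths 2-3-4, 2-3-7, and 5.
tree: TokenTree = {2: {3: {4: {}, 7: {}}}, 5: {}}
result = verify_token_tree([1], tree, toy_target)
print(result)
```

Every emitted token is one the target model would have chosen on its own, so the output matches plain greedy decoding with the target model. That is the sense in which verification of this kind preserves model quality, while correct speculation lets several tokens be accepted per verification step.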