SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification
Xupeng Miao, G. Oliaro, 11 authors, Zhihao Jia
2023
9 citations
TLDR
This paper introduces SpecInfer, an LLM serving system that accelerates generative LLM inference using speculative inference and token tree verification. The approach significantly reduces the end-to-end latency and computational requirements of serving generative LLMs while provably preserving model quality.
