Distilling Multi-Step Reasoning Capabilities of Large Language Models into Smaller Models via Semantic Decompositions
Kumar Shridhar, Alessandro Stolfo, Mrinmaya Sachan
2022 · DOI: 10.48550/arXiv.2212.00193
arXiv.org · 47 citations
TLDR
A knowledge distillation approach that leverages the step-by-step chain-of-thought (CoT) reasoning capabilities of larger models and distils these reasoning abilities into smaller models. GPT-2 variants distilled with this approach improve by up to 35% compared to CoT.
