ZeRO: Memory Optimization Towards Training A Trillion Parameter Models
Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He
2019 · DBLP: journals/corr/abs-1910-02054
arXiv.org · 620 citations
TLDR
Presents the Zero Redundancy Optimizer (ZeRO), a novel solution that optimizes memory to achieve both memory efficiency and scaling efficiency, with the potential to scale beyond 1 trillion parameters using today's hardware.
