Mobius: Fine Tuning Large-Scale Models on Commodity GPU Servers

Yangyang Feng, Minhui Xie, 3 additional authors, J. Shu

2023 · DOI: 10.1145/3575693.3575703
International Conference on Architectural Support for Programming Languages and Operating Systems · 27 Citations

TLDR

The key idea is a novel pipeline parallelism scheme that enables heterogeneous memory for large-scale model training while incurring less communication than existing systems, and that minimizes communication contention.

Abstract

Fine-tuning on cheap commodity GPU servers makes large-scale deep learning models benefit more people. However, the low inter-GPU communication bandwidth and severe communication contention on commodity GPU servers obstruct training efficiency. In this paper, we present Mobius, a communication-efficient system for fine-tuning large-scale models on commodity GPU servers. The key idea is a novel pipeline parallelism scheme enabling heterogeneous memory for large-scale model training, while incurring less communication than existing systems. Mobius partitions the model into stages and carefully schedules them between GPU memory and DRAM to overlap communication with computation. It formulates pipeline execution as a mixed-integer programming problem to find the optimal pipeline partition. It also features a new stage-to-GPU mapping method, termed cross mapping, to minimize communication contention. Experiments on models of various scales and on different GPU topologies show that Mobius reduces training time by 3.8-5.1× compared with the prior art.
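To make the partitioning idea concrete, the sketch below is a toy illustration of finding an "optimal pipeline partition": it brute-forces contiguous layer splits to minimize the slowest stage (the pipeline bottleneck). This is not Mobius's actual formulation; the paper casts the problem as a mixed-integer program that also models GPU-DRAM transfer time and heterogeneous memory, and the function name and cost model here are hypothetical.

```python
from itertools import combinations

def best_pipeline_partition(layer_costs, num_stages):
    """Toy search: split layers into contiguous stages so that the
    most expensive stage (the pipeline bottleneck) is as cheap as
    possible. Only compute cost is modeled here; Mobius's real MIP
    additionally accounts for communication and memory constraints.
    """
    n = len(layer_costs)
    best_cuts, best_bottleneck = None, float("inf")
    # Choose num_stages - 1 cut points between layers and evaluate each split.
    for cuts in combinations(range(1, n), num_stages - 1):
        bounds = (0,) + cuts + (n,)
        bottleneck = max(sum(layer_costs[a:b])
                         for a, b in zip(bounds, bounds[1:]))
        if bottleneck < best_bottleneck:
            best_bottleneck, best_cuts = bottleneck, cuts
    return best_cuts, best_bottleneck

# Example: 6 layers with per-layer costs, split into 3 stages.
cuts, bottleneck = best_pipeline_partition([4, 2, 3, 1, 5, 3], 3)
# → cuts (1, 4): stages [4], [2, 3, 1], [5, 3]; bottleneck cost 8
```

An exhaustive search like this is exponential in the number of stages, which is why a solver-based formulation (as in Mobius) is needed at realistic model sizes.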
