An Efficient Technique for Large Mini-batch Challenge of DNNs Training on Large Scale Cluster

Akihiko Kasagi, Akihiro Tabuchi, +6 authors, Kohta Nakashima

2020 · DOI: 10.1145/3369583.3392687
ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC)

Abstract

Distributed deep learning with large mini-batches is a key strategy for training deep neural networks as fast as possible, but it poses a great challenge: it is difficult to achieve high scaling efficiency on large clusters without compromising accuracy. The particular difficulty is that enlarging the mini-batch reduces the number of model-update iterations over the whole training run, so a technique is needed that lets the validation accuracy converge within a small number of iterations. In this paper, we introduce a novel technique, Final Polishing, which adjusts the means and variances in batch normalization and mitigates the difference in normalization between the validation dataset and the augmented training dataset. By applying this technique, we achieved a top-1 validation accuracy of 75.08% with a mini-batch size of 81,920 on 2,048 GPUs, completing the training of ResNet-50 in 74.7 seconds. In addition, targeting a top-1 validation accuracy of 75.9% or more, we performed further parameter tuning: adjusting the number of GPUs and the hyperparameters of the DNN together with Final Polishing, we achieved a top-1 validation accuracy of 75.97% with a mini-batch size of 86,016 on 3,072 GPUs, completing the training of ResNet-50 in 62.1 seconds.
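The abstract describes Final Polishing as re-adjusting batch-normalization means and variances so that the statistics used at validation time match non-augmented data. The sketch below illustrates the general idea with a minimal NumPy batch-norm layer: after training on augmented data has skewed the running statistics, they are re-estimated from clean (non-augmented) batches. The names `BatchNorm1d` and `final_polish`, and the exact averaging scheme, are assumptions for illustration; the paper's actual procedure may differ in detail.

```python
import numpy as np

class BatchNorm1d:
    """Minimal batch normalization with running statistics (illustration only)."""
    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def forward(self, x, training):
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # Exponential moving average, as in common DL frameworks.
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            # Validation uses the running statistics, so a mismatch between
            # augmented-training and validation distributions hurts accuracy.
            mean, var = self.running_mean, self.running_var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta

def final_polish(bn, clean_batches):
    """Re-estimate BN statistics from batches WITHOUT training augmentation.

    Hypothetical helper sketching the idea behind Final Polishing: replace the
    running mean/variance accumulated over augmented training data with
    statistics averaged over clean batches, so validation-time normalization
    matches the validation data distribution.
    """
    bn.running_mean = np.mean([b.mean(axis=0) for b in clean_batches], axis=0)
    bn.running_var = np.mean([b.var(axis=0) for b in clean_batches], axis=0)
```

Usage sketch: train (here, a single forward pass) on augmented data, then polish with clean batches before validation, so that `forward(x, training=False)` normalizes with statistics that reflect the validation distribution.

```python
rng = np.random.default_rng(0)
bn = BatchNorm1d(4)
augmented = rng.normal(2.0, 3.0, size=(256, 4))  # shifted by augmentation
clean = rng.normal(0.0, 1.0, size=(256, 4))      # validation-like data
bn.forward(augmented, training=True)             # skews running statistics
final_polish(bn, [clean[:128], clean[128:]])     # re-estimate on clean data
out = bn.forward(clean, training=False)
```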
