NetPlacer+: Model Parallelism Based on Load Balance in Distributed Deep Learning

Yunqi Gao, Bing Hu*, Mahdi Boloursaz Mashhadi, A-Long Jin, Pei Xiao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The importance of Model Parallelism in Distributed Deep Learning continues to grow due to the increasing scale of Deep Neural Networks (DNNs) and the demand for higher training speed. Unlike existing works, we propose a model-parallel strategy called NetPlacer+ that is based on load balance. The key idea in NetPlacer+ is to partition the DNN model across multiple devices by balancing each device's computation and communication load. We formulate the mathematical model of NetPlacer+, transform it, and obtain an approximate optimal solution using the interior point method. Extensive experiments on two GPU clusters with eight modern DNNs are conducted to verify the effectiveness of NetPlacer+. Experimental results show that the model-parallel strategy of NetPlacer+ achieves up to 1.25x speedup compared to NVIDIA's DLPlacer.
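To make the load-balancing idea concrete, below is a minimal, hypothetical sketch: the layer-to-device assignment is relaxed to continuous fractions and the maximum per-device computation load is minimized with an interior-point LP solver. The cost values, function names, and the use of SciPy's "highs-ipm" solver are illustrative assumptions rather than the paper's actual formulation, which additionally balances communication load; a communication term could be added to the constraints in the same way.

```python
# Hypothetical sketch of balanced model partitioning via an interior-point solver.
# Not the paper's implementation: costs and names are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

def balance_placement(compute_cost, num_devices):
    """Relaxed placement: x[i, j] = fraction of layer i assigned to device j.

    Minimize t subject to  sum_i compute_cost[i] * x[i, j] <= t  for every device j,
                           sum_j x[i, j] = 1                      for every layer i,
                           x >= 0.
    """
    L = len(compute_cost)
    n_vars = L * num_devices + 1          # x flattened row-major, plus the bound t
    c = np.zeros(n_vars)
    c[-1] = 1.0                           # objective: minimize t (the max device load)

    # Inequality constraints: each device's load minus t must be <= 0.
    A_ub = np.zeros((num_devices, n_vars))
    for j in range(num_devices):
        for i in range(L):
            A_ub[j, i * num_devices + j] = compute_cost[i]
        A_ub[j, -1] = -1.0
    b_ub = np.zeros(num_devices)

    # Equality constraints: each layer's fractions sum to 1.
    A_eq = np.zeros((L, n_vars))
    for i in range(L):
        A_eq[i, i * num_devices:(i + 1) * num_devices] = 1.0
    b_eq = np.ones(L)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n_vars, method="highs-ipm")
    return res.x[:-1].reshape(L, num_devices), res.x[-1]

# Example: six layers with illustrative compute costs, split across two devices.
fractions, max_load = balance_placement([4.0, 2.0, 3.0, 5.0, 1.0, 2.0], num_devices=2)
print(np.round(fractions, 2), max_load)
```

In practice the continuous fractions would still need to be rounded back to a discrete layer-to-device assignment; this sketch only illustrates how balancing per-device load can be posed as a problem solvable with interior-point methods.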
