NetPlacer+: Model Parallelism Based on Load Balance in Distributed Deep Learning

Yunqi Gao, Bing Hu*, Mahdi Boloursaz Mashhadi, A-Long Jin, Pei Xiao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The importance of Model Parallelism in Distributed Deep Learning continues to grow due to the increasing scale of Deep Neural Networks (DNNs) and the demand for higher training speed. Unlike existing works, we propose a model-parallel strategy called NetPlacer+ that is based on load balance. The key idea in NetPlacer+ is to partition the DNN model across multiple devices by balancing each device's computation and communication load. We formulate the mathematical model of NetPlacer+, transform it, and obtain an approximate optimal solution using the interior point method. Extensive experiments on two GPU clusters with eight modern DNNs are conducted to verify the effectiveness of NetPlacer+. Experimental results show that the model-parallel strategy of NetPlacer+ achieves up to 1.25x speedup compared to NVIDIA's DLPlacer.
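To make the load-balancing idea concrete, below is a minimal, hypothetical sketch: the layer-to-device assignment is relaxed to continuous fractions and the maximum per-device computation load is minimized with an interior-point LP solver. The cost values, function names, and the use of SciPy's "highs-ipm" solver are illustrative assumptions rather than the paper's actual formulation, which additionally balances communication load; a communication term could be added to the constraints in the same way.

```python
# Hypothetical sketch of balanced model partitioning via an interior-point solver.
# Not the paper's implementation: costs and names are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

def balance_placement(compute_cost, num_devices):
    """Relaxed placement: x[i, j] = fraction of layer i assigned to device j.

    Minimize t subject to  sum_i compute_cost[i] * x[i, j] <= t  for every device j,
                           sum_j x[i, j] = 1                      for every layer i,
                           x >= 0.
    """
    L = len(compute_cost)
    n_vars = L * num_devices + 1          # x flattened row-major, plus the bound t
    c = np.zeros(n_vars)
    c[-1] = 1.0                           # objective: minimize t (the max device load)

    # Inequality constraints: each device's load minus t must be <= 0.
    A_ub = np.zeros((num_devices, n_vars))
    for j in range(num_devices):
        for i in range(L):
            A_ub[j, i * num_devices + j] = compute_cost[i]
        A_ub[j, -1] = -1.0
    b_ub = np.zeros(num_devices)

    # Equality constraints: each layer's fractions sum to 1.
    A_eq = np.zeros((L, n_vars))
    for i in range(L):
        A_eq[i, i * num_devices:(i + 1) * num_devices] = 1.0
    b_eq = np.ones(L)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n_vars, method="highs-ipm")
    return res.x[:-1].reshape(L, num_devices), res.x[-1]

# Example: six layers with illustrative compute costs, split across two devices.
fractions, max_load = balance_placement([4.0, 2.0, 3.0, 5.0, 1.0, 2.0], num_devices=2)
print(np.round(fractions, 2), max_load)
```

In practice the continuous fractions would still need to be rounded back to a discrete layer-to-device assignment; this sketch only illustrates how balancing per-device load can be posed as a problem solvable with interior-point methods.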
