OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning

Yunqi Gao, Zechao Zhang, Bing Hu*, A. Long Jin, Chunming Wu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The communication bottleneck severely restricts the scalability of distributed deep learning. Tensor fusion improves the scalability of data parallelism by overlapping computation and communication tasks. However, existing tensor fusion schemes yield suboptimal training performance. In this paper, we propose an efficient communication mechanism (OF-WFBP) to find the optimal tensor fusion scheme for synchronous data parallelism. We present a mathematical model of OF-WFBP, prove that the resulting problem is NP-hard, and solve it mathematically in two cases. For the remaining cases, we propose an improved sparrow search algorithm (GradSSA) that efficiently finds a near-optimal tensor fusion scheme. Experimental results on two different GPU clusters show that OF-WFBP achieves up to 1.43x speedup over state-of-the-art tensor fusion mechanisms.
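
To make the idea of choosing a tensor fusion scheme concrete, the following is a minimal sketch and not the model from the paper, which the abstract does not spell out. It assumes a simple alpha-beta communication cost (a fixed startup cost `alpha` per all-reduce plus `beta` per unit of data) and that a fused group can only be sent once its last gradient is ready. The names `finish_time` and `best_scheme` and all numbers are hypothetical.

```python
from itertools import product


def finish_time(ready, sizes, cuts, alpha, beta):
    """Time at which the last fused all-reduce completes.

    ready[i]: time gradient i becomes available during backpropagation
              (assumed non-decreasing).
    sizes[i]: size of gradient i.
    cuts[i]:  True if a fusion group ends right after gradient i
              (the final gradient always closes the last group).
    alpha, beta: assumed startup and per-unit costs of one all-reduce.
    """
    t = 0.0          # time the communication channel becomes free
    group = 0.0      # accumulated size of the current fusion group
    last = len(sizes) - 1
    for i, (r, s) in enumerate(zip(ready, sizes)):
        group += s
        if i == last or cuts[i]:
            start = max(t, r)               # wait for channel and for gradient i
            t = start + alpha + beta * group
            group = 0.0
    return t


def best_scheme(ready, sizes, alpha, beta):
    """Brute-force search over all 2^(n-1) fusion schemes (small n only)."""
    best = None
    for cuts in product([False, True], repeat=len(sizes) - 1):
        t = finish_time(ready, sizes, cuts, alpha, beta)
        if best is None or t < best[0]:
            best = (t, cuts)
    return best


# Hypothetical toy workload: three equally sized gradients.
ready = [1.0, 2.0, 3.0]
sizes = [1.0, 1.0, 1.0]
print(best_scheme(ready, sizes, alpha=1.0, beta=0.5))
```

In this toy instance, sending every gradient separately and fusing all three into one message both finish later than sending the first gradient alone and fusing the last two, which is the kind of trade-off a fusion scheme has to balance. The exhaustive search grows exponentially with the number of tensors, which is one reason a heuristic search is attractive for real models.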

Original language: English
Article number: 103053
Journal: Parallel Computing
Volume: 118
Publication status: Published - Nov 2023
Externally published: Yes

Keywords

  • Data parallelism
  • Distributed deep learning
  • Tensor fusion
