TY - JOUR
T1 - Toward Dynamic Resource Allocation and Client Scheduling in Hierarchical Federated Learning
T2 - A Two-Phase Deep Reinforcement Learning Approach
AU - Chen, Xiaojing
AU - Li, Zhenyuan
AU - Ni, Wei
AU - Wang, Xin
AU - Zhang, Shunqing
AU - Sun, Yanzan
AU - Xu, Shugong
AU - Pei, Qingqi
N1 - Publisher Copyright:
© 1972-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Federated learning (FL) is a viable technique for training a shared machine learning model without sharing data. Hierarchical FL (HFL) systems have yet to be studied with respect to their multiple levels of energy, computation, communication, and client scheduling, especially when clients rely on energy harvesting to power their operations. This paper presents a new two-phase deep deterministic policy gradient (DDPG) framework, referred to as "TP-DDPG", to balance, in an online manner, the learning delay and model accuracy of an FL process in an energy harvesting-powered HFL system. The key idea is to divide the optimization decisions into two groups and employ DDPG to learn one group in the first phase, while interpreting the other group as part of the environment that provides rewards for training the DDPG in the second phase. Specifically, the DDPG learns the selection of participating clients, as well as their CPU configurations and transmission powers. A new straggler-aware client association and bandwidth allocation (SCABA) algorithm efficiently optimizes the remaining decisions and evaluates the reward for the DDPG. Experiments demonstrate that, with a substantially reduced number of learnable parameters, TP-DDPG quickly converges to effective policies that shorten the training time of HFL by 39.4% compared to its benchmarks when the required test accuracy of HFL is 0.9.
AB - Federated learning (FL) is a viable technique for training a shared machine learning model without sharing data. Hierarchical FL (HFL) systems have yet to be studied with respect to their multiple levels of energy, computation, communication, and client scheduling, especially when clients rely on energy harvesting to power their operations. This paper presents a new two-phase deep deterministic policy gradient (DDPG) framework, referred to as "TP-DDPG", to balance, in an online manner, the learning delay and model accuracy of an FL process in an energy harvesting-powered HFL system. The key idea is to divide the optimization decisions into two groups and employ DDPG to learn one group in the first phase, while interpreting the other group as part of the environment that provides rewards for training the DDPG in the second phase. Specifically, the DDPG learns the selection of participating clients, as well as their CPU configurations and transmission powers. A new straggler-aware client association and bandwidth allocation (SCABA) algorithm efficiently optimizes the remaining decisions and evaluates the reward for the DDPG. Experiments demonstrate that, with a substantially reduced number of learnable parameters, TP-DDPG quickly converges to effective policies that shorten the training time of HFL by 39.4% compared to its benchmarks when the required test accuracy of HFL is 0.9.
KW - client scheduling
KW - deep deterministic policy gradient
KW - Hierarchical federated learning
KW - resource allocation
UR - http://www.scopus.com/inward/record.url?scp=85197494974&partnerID=8YFLogxK
U2 - 10.1109/TCOMM.2024.3420733
DO - 10.1109/TCOMM.2024.3420733
M3 - Article
AN - SCOPUS:85197494974
SN - 0090-6778
VL - 72
SP - 7798
EP - 7813
JO - IEEE Transactions on Communications
JF - IEEE Transactions on Communications
IS - 12
ER -