TY - JOUR
T1 - TBDB
T2 - Token Bucket-Based Dynamic Batching for Resource Scheduling Supporting Neural Network Inference in Intelligent Consumer Electronics
AU - Gao, Honghao
AU - Qiu, Binyang
AU - Wang, Ye
AU - Yu, Si
AU - Xu, Yueshen
AU - Wang, Xinheng
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2024/2/1
Y1 - 2024/2/1
N2 - Consumer electronics such as mobile phones, wearable devices, and vehicle electronics run many intelligent applications, including voice commands, machine translation, and face recognition. These applications generate large inference workloads, which are typically handled by deep neural network (DNN) models. Traditional approaches rely on pure cloud computing: consumer devices collect data, and cloud computing platforms complete the inference tasks. In practice, the workloads of these applications are not fixed and are likely to fluctuate or surge unexpectedly, increasing the load on cloud computing platforms. Simply adding server resources often leads to resource waste, so a dynamic resource scheduling method is needed. In this paper, we propose a token bucket-based dynamic batching (TBDB) algorithm that maintains throughput while reducing latency and increasing device utilization, especially for large volumes of requests. Our work makes the following contributions: 1) We employ the token bucket algorithm to determine the workload, considering the concurrency and frequency of the data, and dynamically vary the maximum batch size (MBS) that triggers the inference process for the next batch. 2) We design a low-coupling architecture that can be embedded into various consumer electronics in a plug-and-play manner. 3) We study the performance of the electronic devices and the maximum latency to provide guidance for setting hyperparameters. Finally, we evaluate the effectiveness of our method in three consumer electronics scenarios and present a theoretical analysis for setting hyperparameters in different scenarios.
KW - Consumer electronics
KW - dynamic batching
KW - inference task
KW - neural network
KW - token bucket
KW - workload balance
UR - http://www.scopus.com/inward/record.url?scp=85179789051&partnerID=8YFLogxK
U2 - 10.1109/TCE.2023.3339633
DO - 10.1109/TCE.2023.3339633
M3 - Article
AN - SCOPUS:85179789051
SN - 0098-3063
VL - 70
SP - 1134
EP - 1144
JO - IEEE Transactions on Consumer Electronics
JF - IEEE Transactions on Consumer Electronics
IS - 1
ER -