Multi-Model Running Latency Optimization in an Edge Computing Paradigm

Peisong Li; Xinheng Wang; Kaizhu Huang; Yi Huang; Shancang Li; Muddesar Iqbal

doi:10.3390/s22166097

Corresponding author: Xinheng Wang

Department of Mechatronics and Robotics

Research output: Contribution to journal › Article › peer-review

19 Citations (Scopus)

Abstract

Recent advances in both lightweight deep learning algorithms and edge computing increasingly enable multiple model inference tasks to be conducted concurrently on resource-constrained edge devices, allowing one goal to be achieved collaboratively rather than pursuing high quality in each standalone task. However, the high overall running latency of multi-model inference negatively affects real-time applications. To combat latency, the algorithms should be optimized to minimize the latency of multi-model deployment without compromising safety-critical situations. This work focuses on a real-time task scheduling strategy for multi-model deployment and investigates model inference using the Open Neural Network Exchange (ONNX) Runtime engine. An application deployment strategy based on container technology is then proposed, and inference tasks are scheduled to different containers according to the scheduling strategies. Experimental results show that the proposed solution significantly reduces the overall running latency in real-time applications.

Original language	English
Article number	6097
Journal	Sensors
Volume	22
Issue number	16
DOIs	https://doi.org/10.3390/s22166097
Publication status	Published - Aug 2022

Keywords

AI
autonomous driving
edge computing
latency optimization
multi-model
task scheduling

Cite this

@article{3ce3bc955f554ccfbe24b8f43eb3a9aa,
  title = "Multi-Model Running Latency Optimization in an Edge Computing Paradigm",
  abstract = "Recent advances in both lightweight deep learning algorithms and edge computing increasingly enable multiple model inference tasks to be conducted concurrently on resource-constrained edge devices, allowing us to achieve one goal collaboratively rather than getting high quality in each standalone task. However, the high overall running latency for performing multi-model inferences always negatively affects the real-time applications. To combat latency, the algorithms should be optimized to minimize the latency for multi-model deployment without compromising the safety-critical situation. This work focuses on the real-time task scheduling strategy for multi-model deployment and investigating the model inference using an open neural network exchange (ONNX) runtime engine. Then, an application deployment strategy is proposed based on the container technology and inference tasks are scheduled to different containers based on the scheduling strategies. Experimental results show that the proposed solution is able to significantly reduce the overall running latency in real-time applications.",
  keywords = "AI, autonomous driving, edge computing, latency optimization, multi-model, task scheduling",
  author = "Peisong Li and Xinheng Wang and Kaizhu Huang and Yi Huang and Shancang Li and Muddesar Iqbal",
  note = "Publisher Copyright: {\textcopyright} 2022 by the authors.",
  year = "2022",
  month = aug,
  doi = "10.3390/s22166097",
  language = "English",
  volume = "22",
  journal = "Sensors",
  issn = "1424-8220",
  publisher = "MDPI (Basel, Switzerland)",
  number = "16",
}

TY  - JOUR
T1  - Multi-Model Running Latency Optimization in an Edge Computing Paradigm
AU  - Li, Peisong
AU  - Wang, Xinheng
AU  - Huang, Kaizhu
AU  - Huang, Yi
AU  - Li, Shancang
AU  - Iqbal, Muddesar
PY  - 2022/8
Y1  - 2022/8
N2  - Recent advances in both lightweight deep learning algorithms and edge computing increasingly enable multiple model inference tasks to be conducted concurrently on resource-constrained edge devices, allowing us to achieve one goal collaboratively rather than getting high quality in each standalone task. However, the high overall running latency for performing multi-model inferences always negatively affects the real-time applications. To combat latency, the algorithms should be optimized to minimize the latency for multi-model deployment without compromising the safety-critical situation. This work focuses on the real-time task scheduling strategy for multi-model deployment and investigating the model inference using an open neural network exchange (ONNX) runtime engine. Then, an application deployment strategy is proposed based on the container technology and inference tasks are scheduled to different containers based on the scheduling strategies. Experimental results show that the proposed solution is able to significantly reduce the overall running latency in real-time applications.
AB  - Recent advances in both lightweight deep learning algorithms and edge computing increasingly enable multiple model inference tasks to be conducted concurrently on resource-constrained edge devices, allowing us to achieve one goal collaboratively rather than getting high quality in each standalone task. However, the high overall running latency for performing multi-model inferences always negatively affects the real-time applications. To combat latency, the algorithms should be optimized to minimize the latency for multi-model deployment without compromising the safety-critical situation. This work focuses on the real-time task scheduling strategy for multi-model deployment and investigating the model inference using an open neural network exchange (ONNX) runtime engine. Then, an application deployment strategy is proposed based on the container technology and inference tasks are scheduled to different containers based on the scheduling strategies. Experimental results show that the proposed solution is able to significantly reduce the overall running latency in real-time applications.
KW  - AI
KW  - autonomous driving
KW  - edge computing
KW  - latency optimization
KW  - multi-model
KW  - task scheduling
UR  - http://www.scopus.com/inward/record.url?scp=85136632055&partnerID=8YFLogxK
U2  - 10.3390/s22166097
DO  - 10.3390/s22166097
M3  - Article
C2  - 36015856
AN  - SCOPUS:85136632055
SN  - 1424-8220
VL  - 22
JO  - Sensors
JF  - Sensors
IS  - 16
M1  - 6097
ER  - 
