Mobile Robot Tracking with Deep Learning Models under the Specific Environments

Tongpo Zhang; Yunze Song; Zejian Kong; Tiantian Guo; Miguel Lopez-Benitez; Enggee Lim; Fei Ma; Limin Yu

doi:10.3390/app13010273

Tongpo Zhang, Yunze Song, Zejian Kong, Tiantian Guo, Miguel Lopez-Benitez, Enggee Lim, Fei Ma, Limin Yu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Visual-based target tracking is one of the critical methodologies for the control problem of multi-robot systems. In dynamic mobile environments, tracking targets are commonly lost due to partial visual occlusion. Technologies based on deep learning (DL) provide a natural solution to this problem. DL-based methods require less human intervention and fine-tuning. The framework is flexible enough to be retrained with customized data sets, and it can handle the massive amounts of video data available in a target tracking system. This paper discusses the challenges of robot tracking under partial occlusion and compares the system performance of recent DL models used for tracking, namely you-only-look-once (YOLOv5), faster region-based convolutional neural network (Faster R-CNN), and single shot multibox detector (SSD). A series of experiments is conducted to help solve specific industrial problems. Four data sets that cover various occlusion statuses are generated. Performance metrics of F1 score, precision, recall, and training time are analyzed under different application scenarios and parameter settings. Based on these metrics, a comparative metric P is devised to further compare the overall performance of the three DL models. On the designed testing data set 1, the SSD model obtained the highest P score, which was 13.34 times that of the Faster R-CNN model and 3.39 times that of the YOLOv5 model. On the designed testing data set 2, the SSD model again obtained the highest P score, which was 11.77 times that of the Faster R-CNN model and 2.43 times that of the YOLOv5 model. The analysis reveals different characteristics of the three DL models. Recommendations are made to help future researchers select the most suitable DL model and apply it properly in a system design.
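For readers unfamiliar with the detection metrics named in the abstract, the following is a minimal sketch of how precision, recall, and F1 score are conventionally computed from detection counts. The `DetectionCounts` class, the function names, and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper. The comparative metric P is not defined in the abstract, so it is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Hypothetical container for detection outcomes on one test set."""
    true_positives: int   # robots correctly detected
    false_positives: int  # detections that match no robot
    false_negatives: int  # robots that were missed (e.g. occluded)


def precision(c: DetectionCounts) -> float:
    """Fraction of detections that are correct; 0.0 if there are no detections."""
    denom = c.true_positives + c.false_positives
    return c.true_positives / denom if denom else 0.0


def recall(c: DetectionCounts) -> float:
    """Fraction of ground-truth targets that were detected; 0.0 if there are none."""
    denom = c.true_positives + c.false_negatives
    return c.true_positives / denom if denom else 0.0


def f1_score(c: DetectionCounts) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative counts only (an assumption, not results from the paper).
counts = DetectionCounts(true_positives=80, false_positives=20, false_negatives=10)

if __name__ == "__main__":
    print(f"precision={precision(counts):.3f}")
    print(f"recall={recall(counts):.3f}")
    print(f"F1={f1_score(counts):.3f}")
```

Under partial occlusion, missed targets raise the false-negative count, which lowers recall and therefore F1 even when precision stays high.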

Original language	English
Article number	273
Journal	Applied Sciences (Switzerland)
Volume	13
Issue number	1
DOIs	https://doi.org/10.3390/app13010273
Publication status	Published - Jan 2023

Keywords

computer vision
deep learning (DL)
robot tracking

Cite this

@article{c76c6f2a387a45a79715b4b29f13c5b4,

title = "Mobile Robot Tracking with Deep Learning Models under the Specific Environments",

abstract = "Visual-based target tracking is one of the critical methodologies for the control problem of multi-robot systems. In dynamic mobile environments, tracking targets are commonly lost due to partial visual occlusion. Technologies based on deep learning (DL) provide a natural solution to this problem. DL-based methods require less human intervention and fine-tuning. The framework is flexible enough to be retrained with customized data sets, and it can handle the massive amounts of video data available in a target tracking system. This paper discusses the challenges of robot tracking under partial occlusion and compares the system performance of recent DL models used for tracking, namely you-only-look-once (YOLOv5), faster region-based convolutional neural network (Faster R-CNN), and single shot multibox detector (SSD). A series of experiments is conducted to help solve specific industrial problems. Four data sets that cover various occlusion statuses are generated. Performance metrics of F1 score, precision, recall, and training time are analyzed under different application scenarios and parameter settings. Based on these metrics, a comparative metric P is devised to further compare the overall performance of the three DL models. On the designed testing data set 1, the SSD model obtained the highest P score, which was 13.34 times that of the Faster R-CNN model and 3.39 times that of the YOLOv5 model. On the designed testing data set 2, the SSD model again obtained the highest P score, which was 11.77 times that of the Faster R-CNN model and 2.43 times that of the YOLOv5 model. The analysis reveals different characteristics of the three DL models. Recommendations are made to help future researchers select the most suitable DL model and apply it properly in a system design.",

keywords = "computer vision, deep learning (DL), robot tracking",

author = "Tongpo Zhang and Yunze Song and Zejian Kong and Tiantian Guo and Miguel Lopez-Benitez and Enggee Lim and Fei Ma and Limin Yu",

note = "Funding Information: This research was partially funded by the Research Enhancement Fund of XJTLU (REF-19-01-04), the National Natural Science Foundation of China (NSFC) (Grant No. 61501380), and by AI University Research Center (AI-URC) and XJTLU Laboratory for Intelligent Computation and Financial Technology through XJTLU Key Programme Special Fund (KSF-P-02), Jiangsu Data Science and Cognitive Computational Engineering Research Centre, and ARIES Research Centre and Suzhou Key Lab of Broadband Wireless Access Technology (BWAT). Publisher Copyright: {\textcopyright} 2022 by the authors.",

year = "2023",

month = jan,

doi = "10.3390/app13010273",

language = "English",

volume = "13",

journal = "Applied Sciences (Switzerland)",

issn = "2076-3417",

number = "1",

}

TY - JOUR

T1 - Mobile Robot Tracking with Deep Learning Models under the Specific Environments

AU - Zhang, Tongpo

AU - Song, Yunze

AU - Kong, Zejian

AU - Guo, Tiantian

AU - Lopez-Benitez, Miguel

AU - Lim, Enggee

AU - Ma, Fei

AU - Yu, Limin

N1 - Funding Information: This research was partially funded by the Research Enhancement Fund of XJTLU (REF-19-01-04), the National Natural Science Foundation of China (NSFC) (Grant No. 61501380), and by AI University Research Center (AI-URC) and XJTLU Laboratory for Intelligent Computation and Financial Technology through XJTLU Key Programme Special Fund (KSF-P-02), Jiangsu Data Science and Cognitive Computational Engineering Research Centre, and ARIES Research Centre and Suzhou Key Lab of Broadband Wireless Access Technology (BWAT). Publisher Copyright: © 2022 by the authors.

PY - 2023/1

Y1 - 2023/1

N2 - Visual-based target tracking is one of the critical methodologies for the control problem of multi-robot systems. In dynamic mobile environments, tracking targets are commonly lost due to partial visual occlusion. Technologies based on deep learning (DL) provide a natural solution to this problem. DL-based methods require less human intervention and fine-tuning. The framework is flexible enough to be retrained with customized data sets, and it can handle the massive amounts of video data available in a target tracking system. This paper discusses the challenges of robot tracking under partial occlusion and compares the system performance of recent DL models used for tracking, namely you-only-look-once (YOLOv5), faster region-based convolutional neural network (Faster R-CNN), and single shot multibox detector (SSD). A series of experiments is conducted to help solve specific industrial problems. Four data sets that cover various occlusion statuses are generated. Performance metrics of F1 score, precision, recall, and training time are analyzed under different application scenarios and parameter settings. Based on these metrics, a comparative metric P is devised to further compare the overall performance of the three DL models. On the designed testing data set 1, the SSD model obtained the highest P score, which was 13.34 times that of the Faster R-CNN model and 3.39 times that of the YOLOv5 model. On the designed testing data set 2, the SSD model again obtained the highest P score, which was 11.77 times that of the Faster R-CNN model and 2.43 times that of the YOLOv5 model. The analysis reveals different characteristics of the three DL models. Recommendations are made to help future researchers select the most suitable DL model and apply it properly in a system design.

AB - Visual-based target tracking is one of the critical methodologies for the control problem of multi-robot systems. In dynamic mobile environments, tracking targets are commonly lost due to partial visual occlusion. Technologies based on deep learning (DL) provide a natural solution to this problem. DL-based methods require less human intervention and fine-tuning. The framework is flexible enough to be retrained with customized data sets, and it can handle the massive amounts of video data available in a target tracking system. This paper discusses the challenges of robot tracking under partial occlusion and compares the system performance of recent DL models used for tracking, namely you-only-look-once (YOLOv5), faster region-based convolutional neural network (Faster R-CNN), and single shot multibox detector (SSD). A series of experiments is conducted to help solve specific industrial problems. Four data sets that cover various occlusion statuses are generated. Performance metrics of F1 score, precision, recall, and training time are analyzed under different application scenarios and parameter settings. Based on these metrics, a comparative metric P is devised to further compare the overall performance of the three DL models. On the designed testing data set 1, the SSD model obtained the highest P score, which was 13.34 times that of the Faster R-CNN model and 3.39 times that of the YOLOv5 model. On the designed testing data set 2, the SSD model again obtained the highest P score, which was 11.77 times that of the Faster R-CNN model and 2.43 times that of the YOLOv5 model. The analysis reveals different characteristics of the three DL models. Recommendations are made to help future researchers select the most suitable DL model and apply it properly in a system design.

KW - computer vision

KW - deep learning (DL)

KW - robot tracking

UR - http://www.scopus.com/inward/record.url?scp=85146014449&partnerID=8YFLogxK

U2 - 10.3390/app13010273

DO - 10.3390/app13010273

M3 - Article

AN - SCOPUS:85146014449

SN - 2076-3417

VL - 13

JO - Applied Sciences (Switzerland)

JF - Applied Sciences (Switzerland)

IS - 1

M1 - 273

ER -
