Vision-based Human Detection by Fine-Tuned SSD Models

Tang Jin Cheng; Ahmad Fakhri Ab Nasir; Mohd Azraai Mohd Razman; Anwar P.P. Abdul Majeed; Thai Li Lim

doi:10.14569/IJACSA.2022.0131143

School of Robotics

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Human-robot interaction (HRI) and human-robot collaboration (HRC) have become more popular as industries take the initiative to realize the era of automation and digitalization. The introduction of robots is often considered a risk because robots do not possess human-level intelligence. However, the literature that uses deep learning technologies as the basis for improving HRI safety is limited, and work using the transfer learning approach is rarer still. Hence, this study empirically examines the efficacy of the transfer learning approach in the human detection task by fine-tuning SSD models. A custom image dataset was developed using the surveillance system at TT Vision Holdings Berhad and annotated accordingly. The dataset was then partitioned into training, validation, and test sets in a ratio of 70:20:10. The learning behaviour of the models was monitored throughout the fine-tuning process via the total loss graph. The results reveal that the fine-tuned SSD model with MobileNetV1 achieved 87.20% test AP, which is 6.1% higher than the fine-tuned SSD model with MobileNetV2. As a trade-off, the fine-tuned SSD model with MobileNetV1 attained an inference time of 46.2 ms on an RTX 3070, which is 9.6 ms slower than the fine-tuned SSD model with MobileNetV2. Taking test AP as the key metric, the fine-tuned SSD model with MobileNetV1 is considered the best fine-tuned model in this study. In conclusion, the study shows that the transfer learning approach within the deep learning domain can help protect humans from risk by detecting them in the first place.
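The 70:20:10 partition mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the split was performed, so everything below is an assumption for illustration only: the function name `split_dataset`, the fixed seed, the shuffle-then-slice strategy, and the placeholder file names are hypothetical and are not taken from the paper.

```python
import random


def split_dataset(items, ratios=(70, 20, 10), seed=0):
    """Shuffle items and partition them into train/validation/test lists.

    Hypothetical helper: ratios are integer weights (70:20:10 by default).
    Integer arithmetic avoids floating-point rounding surprises, and any
    remainder left over from the floor divisions goes to the test split.
    """
    total = sum(ratios)
    shuffled = list(items)
    # A seeded generator makes the partition reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Placeholder image names; the real dataset is not public in this record.
    images = [f"frame_{i:03d}.jpg" for i in range(100)]
    train, val, test = split_dataset(images)
    print(len(train), len(val), len(test))  # prints: 70 20 10
```

Splitting once with a fixed seed keeps the test set untouched during fine-tuning, which is what makes a reported test AP meaningful as a held-out measure.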

Original language	English
Pages (from-to)	386-390
Number of pages	5
Journal	International Journal of Advanced Computer Science and Applications
Volume	13
Issue number	11
DOIs	https://doi.org/10.14569/IJACSA.2022.0131143
Publication status	Published - 2022

Keywords

Deep learning
Fine-tuning
Human detection
Human-robot interactions
SSD
Transfer learning

Cite this

@article{79a91bed83cd463191867978d5357b12,

title = "Vision-based Human Detection by Fine-Tuned SSD Models",

abstract = "Human-robot interaction (HRI) and human-robot collaboration (HRC) have become more popular as industries take the initiative to realize the era of automation and digitalization. The introduction of robots is often considered a risk because robots do not possess human-level intelligence. However, the literature that uses deep learning technologies as the basis for improving HRI safety is limited, and work using the transfer learning approach is rarer still. Hence, this study empirically examines the efficacy of the transfer learning approach in the human detection task by fine-tuning SSD models. A custom image dataset was developed using the surveillance system at TT Vision Holdings Berhad and annotated accordingly. The dataset was then partitioned into training, validation, and test sets in a ratio of 70:20:10. The learning behaviour of the models was monitored throughout the fine-tuning process via the total loss graph. The results reveal that the fine-tuned SSD model with MobileNetV1 achieved 87.20% test AP, which is 6.1% higher than the fine-tuned SSD model with MobileNetV2. As a trade-off, the fine-tuned SSD model with MobileNetV1 attained an inference time of 46.2 ms on an RTX 3070, which is 9.6 ms slower than the fine-tuned SSD model with MobileNetV2. Taking test AP as the key metric, the fine-tuned SSD model with MobileNetV1 is considered the best fine-tuned model in this study. In conclusion, the study shows that the transfer learning approach within the deep learning domain can help protect humans from risk by detecting them in the first place.",

keywords = "Deep learning, Fine-tuning, Human detection, Human-robot interactions, SSD, Transfer learning",

author = "Cheng, {Tang Jin} and Nasir, {Ahmad Fakhri Ab} and Razman, {Mohd Azraai Mohd} and Majeed, {Anwar P.P. Abdul} and Lim, {Thai Li}",

year = "2022",

doi = "10.14569/IJACSA.2022.0131143",

language = "English",

volume = "13",

pages = "386--390",

journal = "International Journal of Advanced Computer Science and Applications",

issn = "2158-107X",

number = "11",

}

TY - JOUR

T1 - Vision-based Human Detection by Fine-Tuned SSD Models

AU - Cheng, Tang Jin

AU - Nasir, Ahmad Fakhri Ab

AU - Razman, Mohd Azraai Mohd

AU - Majeed, Anwar P.P. Abdul

AU - Lim, Thai Li

PY - 2022

Y1 - 2022

N2 - Human-robot interaction (HRI) and human-robot collaboration (HRC) have become more popular as industries take the initiative to realize the era of automation and digitalization. The introduction of robots is often considered a risk because robots do not possess human-level intelligence. However, the literature that uses deep learning technologies as the basis for improving HRI safety is limited, and work using the transfer learning approach is rarer still. Hence, this study empirically examines the efficacy of the transfer learning approach in the human detection task by fine-tuning SSD models. A custom image dataset was developed using the surveillance system at TT Vision Holdings Berhad and annotated accordingly. The dataset was then partitioned into training, validation, and test sets in a ratio of 70:20:10. The learning behaviour of the models was monitored throughout the fine-tuning process via the total loss graph. The results reveal that the fine-tuned SSD model with MobileNetV1 achieved 87.20% test AP, which is 6.1% higher than the fine-tuned SSD model with MobileNetV2. As a trade-off, the fine-tuned SSD model with MobileNetV1 attained an inference time of 46.2 ms on an RTX 3070, which is 9.6 ms slower than the fine-tuned SSD model with MobileNetV2. Taking test AP as the key metric, the fine-tuned SSD model with MobileNetV1 is considered the best fine-tuned model in this study. In conclusion, the study shows that the transfer learning approach within the deep learning domain can help protect humans from risk by detecting them in the first place.

KW - Deep learning

KW - Fine-tuning

KW - Human detection

KW - Human-robot interactions

KW - SSD

KW - Transfer learning

UR - http://www.scopus.com/inward/record.url?scp=85143868044&partnerID=8YFLogxK

U2 - 10.14569/IJACSA.2022.0131143

DO - 10.14569/IJACSA.2022.0131143

M3 - Article

AN - SCOPUS:85143868044

SN - 2158-107X

VL - 13

SP - 386

EP - 390

JO - International Journal of Advanced Computer Science and Applications

JF - International Journal of Advanced Computer Science and Applications

IS - 11

ER -
