TY - JOUR
T1 - Fingerspelling Recognition by 12-Layer CNN with Stochastic Pooling
AU - Zhang, Yu Dong
AU - Jiang, Xianwei
AU - Wang, Shui Hua
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2022
Y1 - 2022
AB - Fingerspelling is a method of spelling words via hand movements. This study aims to propose a novel fingerspelling recognition system. We use 1320 fingerspelling images in our dataset. Our method is based on the convolutional neural network (CNN) model. We propose a 12-layer CNN as the backbone. Particularly, stochastic pooling (SP) is used to help solve the problems caused by max pooling or average pooling. In addition, an improved 20-way data augmentation method is proposed to circumvent overfitting. Our method is dubbed CNNSP. The results show that our CNNSP method achieved a micro-averaged F1 (MAF) score of 90.04 ± 0.82%. In contrast, the MAFs of l2-pooling, average pooling, and max pooling are 86.21 ± 1.12%, 87.54 ± 1.39%, and 89.07 ± 0.78%, respectively. Our CNNSP attains better results than eight state-of-the-art fingerspelling recognition methods. Besides, the SP is better than l2-pooling, average pooling, and max pooling.
KW - Convolutional neural network
KW - Data augmentation
KW - Deep learning
KW - Fingerspelling recognition
KW - Stochastic pooling
UR - http://www.scopus.com/inward/record.url?scp=85124910845&partnerID=8YFLogxK
U2 - 10.1007/s11036-021-01900-8
DO - 10.1007/s11036-021-01900-8
M3 - Article
AN - SCOPUS:85124910845
SN - 1383-469X
JO - Mobile Networks and Applications
JF - Mobile Networks and Applications
ER -