KD-MSLRT: Lightweight Sign Language Recognition Model Based on Mediapipe and 3D to 1D Knowledge Distillation

Yulong Li, Bolin Ren, Ke Hu, Changyuan Liu, Zhengyong Jiang, Kang Dang^*, Jionglong Su^*

^*Corresponding author for this work

doi:10.1609/aaai.v39i27.35037

School of AI and Advanced Computing

Xi'an Jiaotong-Liverpool University

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

Artificial intelligence has achieved notable results in sign language recognition and translation. However, relatively few efforts have been made to significantly improve the quality of life of the 72 million hearing-impaired people worldwide. Sign language translation models that rely on video inputs have large parameter counts, which makes them time-consuming and computationally expensive to deploy. This directly contributes to the scarcity of human-centered technology in this field. In addition, the lack of sign language translation datasets hampers research progress in this area. To address these issues, we first propose a cross-modal multi-knowledge distillation technique from 3D to 1D and a novel end-to-end pre-training text correction framework. Compared with other pretrained models, our framework achieves significant improvements in correcting text output errors. Our model reduces the Word Error Rate (WER) by at least 1.4% on the PHOENIX14 and PHOENIX14T datasets compared with the state-of-the-art CorrNet. In addition, the quantized TensorFlow Lite (TFLite) model is only 12.93 MB, making it the smallest, fastest, and most accurate model to date. We have also collected and released extensive Chinese sign language datasets and developed a specialized training vocabulary. To address the lack of research on data augmentation for landmark data, we designed comparative experiments on various augmentation methods. Moreover, we performed a simulated deployment and prediction of our model on Intel CPUs and assessed the feasibility of deploying the model on other platforms.
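The abstract reports its main accuracy result as a reduction in Word Error Rate (WER), the usual metric for continuous sign language recognition on PHOENIX14 and PHOENIX14T. As a minimal sketch (not the authors' evaluation code), WER can be computed as the word-level edit distance between a predicted and a reference gloss sequence, divided by the number of reference words. The function name `word_error_rate` and the gloss sequences in the example are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference).

    Computed with a word-level Levenshtein (edit-distance) table.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum number of edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(m + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dist[n][m] / n


# Hypothetical gloss sequences: one substitution (REGEN -> SONNE) and one deletion (KOMMEN).
ref = ["MORGEN", "REGEN", "NORD", "KOMMEN"]
hyp = ["MORGEN", "SONNE", "NORD"]
print(word_error_rate(ref, hyp))  # 0.5
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference; lower is better.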

Original language	English
Title of host publication	Special Track on AI Alignment
Editors	Toby Walsh, Julie Shah, Zico Kolter
Publisher	Association for the Advancement of Artificial Intelligence
Pages	28177-28185
Number of pages	9
Edition	27
ISBN (Electronic)	157735897X, 9781577358978
DOIs	https://doi.org/10.1609/aaai.v39i27.35037
Publication status	Published - 11 Apr 2025
Event	39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States. Duration: 25 Feb 2025 → 4 Mar 2025

Publication series

Name	Proceedings of the AAAI Conference on Artificial Intelligence
Number	27
Volume	39
ISSN (Print)	2159-5399
ISSN (Electronic)	2374-3468

Conference

Conference	39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025
Country/Territory	United States
City	Philadelphia
Period	25/02/25 → 4/03/25

Access to Document

10.1609/aaai.v39i27.35037

Cite this

Li, Y., Ren, B., Hu, K., Liu, C., Jiang, Z., Dang, K., & Su, J. (2025). KD-MSLRT: Lightweight Sign Language Recognition Model Based on Mediapipe and 3D to 1D Knowledge Distillation. In T. Walsh, J. Shah, & Z. Kolter (Eds.), Special Track on AI Alignment (27 ed., pp. 28177-28185). (Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 39, No. 27). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v39i27.35037

Li, Yulong ; Ren, Bolin ; Hu, Ke et al. / KD-MSLRT : Lightweight Sign Language Recognition Model Based on Mediapipe and 3D to 1D Knowledge Distillation. Special Track on AI Alignment. editor / Toby Walsh ; Julie Shah ; Zico Kolter. 27. ed. Association for the Advancement of Artificial Intelligence, 2025. pp. 28177-28185 (Proceedings of the AAAI Conference on Artificial Intelligence; 27).

@inproceedings{7872b1077f0342a29abf346da702e153,

title = "KD-MSLRT: Lightweight Sign Language Recognition Model Based on Mediapipe and 3D to 1D Knowledge Distillation",

author = "Yulong Li and Bolin Ren and Ke Hu and Changyuan Liu and Zhengyong Jiang and Kang Dang and Jionglong Su",

note = "Publisher Copyright: Copyright {\textcopyright} 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 ; Conference date: 25-02-2025 Through 04-03-2025",

year = "2025",

month = apr,

day = "11",

doi = "10.1609/aaai.v39i27.35037",

language = "English",

series = "Proceedings of the AAAI Conference on Artificial Intelligence",

publisher = "Association for the Advancement of Artificial Intelligence",

number = "27",

pages = "28177--28185",

editor = "Toby Walsh and Julie Shah and Zico Kolter",

booktitle = "Special Track on AI Alignment",

edition = "27",

}

Li, Y, Ren, B, Hu, K, Liu, C, Jiang, Z, Dang, K & Su, J 2025, KD-MSLRT: Lightweight Sign Language Recognition Model Based on Mediapipe and 3D to 1D Knowledge Distillation. in T Walsh, J Shah & Z Kolter (eds), Special Track on AI Alignment. 27 edn, Proceedings of the AAAI Conference on Artificial Intelligence, no. 27, vol. 39, Association for the Advancement of Artificial Intelligence, pp. 28177-28185, 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025, Philadelphia, United States, 25/02/25. https://doi.org/10.1609/aaai.v39i27.35037

KD-MSLRT: Lightweight Sign Language Recognition Model Based on Mediapipe and 3D to 1D Knowledge Distillation. / Li, Yulong; Ren, Bolin; Hu, Ke et al.
Special Track on AI Alignment. ed. / Toby Walsh; Julie Shah; Zico Kolter. 27. ed. Association for the Advancement of Artificial Intelligence, 2025. p. 28177-28185 (Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 39, No. 27).

TY - GEN

T1 - KD-MSLRT

T2 - 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025

AU - Li, Yulong

AU - Ren, Bolin

AU - Hu, Ke

AU - Liu, Changyuan

AU - Jiang, Zhengyong

AU - Dang, Kang

AU - Su, Jionglong

PY - 2025/4/11

Y1 - 2025/4/11

UR - http://www.scopus.com/inward/record.url?scp=105003902515&partnerID=8YFLogxK

U2 - 10.1609/aaai.v39i27.35037

DO - 10.1609/aaai.v39i27.35037

M3 - Conference Proceeding

AN - SCOPUS:105003902515

T3 - Proceedings of the AAAI Conference on Artificial Intelligence

SP - 28177

EP - 28185

BT - Special Track on AI Alignment

A2 - Walsh, Toby

A2 - Shah, Julie

A2 - Kolter, Zico

PB - Association for the Advancement of Artificial Intelligence

Y2 - 25 February 2025 through 4 March 2025

ER -

Li Y, Ren B, Hu K, Liu C, Jiang Z, Dang K et al. KD-MSLRT: Lightweight Sign Language Recognition Model Based on Mediapipe and 3D to 1D Knowledge Distillation. In Walsh T, Shah J, Kolter Z, editors, Special Track on AI Alignment. 27 ed. Association for the Advancement of Artificial Intelligence. 2025. p. 28177-28185. (Proceedings of the AAAI Conference on Artificial Intelligence; 27). doi: 10.1609/aaai.v39i27.35037