TY - JOUR
T1 - SVD-KD
T2 - SVD-based hidden layer feature extraction for Knowledge distillation
AU - Zhang, Jianhua
AU - Gao, Yi
AU - Zhou, Mian
AU - Liu, Ruyu
AU - Cheng, Xu
AU - Nikolić, Saša V.
AU - Chen, Shengyong
N1 - Publisher Copyright:
© 2025
PY - 2025/11
Y1 - 2025/11
N2 - A recent advancement in knowledge distillation (KD) is to extract and transfer middle-layer knowledge from teacher models to student models, which outperforms original KD methods that transfer only last-layer knowledge. However, middle-layer knowledge commonly appears as a high-dimensional tensor, which is more difficult to transfer than the one-dimensional knowledge in the last layer. Moreover, when teachers and students differ significantly in parameter capacity and model structure, their middle layers differ in dimensionality and structure, which further increases the learning difficulty for student models. To solve these problems, we propose a novel knowledge extraction module that transforms the high-dimensional tensor-based knowledge in middle layers into one-dimensional knowledge based on singular value decomposition. Thus, the knowledge at the middle layers of teacher models can be effectively extracted and simplified, greatly facilitating the learning of student models even when the teacher and student networks differ considerably in structure and parameter capacity. To help students learn the knowledge from the middle layers of teachers as accurately as possible, we also propose a novel loss function that constrains the values of the one-dimensional knowledge learned by the student model to be as close as possible to those extracted from the teacher model, thus improving the learning efficiency of the student model. We have conducted extensive experiments on three major datasets (CIFAR-10, CIFAR-100, and ImageNet 1K), and the results demonstrate that our method achieves superior performance compared with state-of-the-art methods.
AB - A recent advancement in knowledge distillation (KD) is to extract and transfer middle-layer knowledge from teacher models to student models, which outperforms original KD methods that transfer only last-layer knowledge. However, middle-layer knowledge commonly appears as a high-dimensional tensor, which is more difficult to transfer than the one-dimensional knowledge in the last layer. Moreover, when teachers and students differ significantly in parameter capacity and model structure, their middle layers differ in dimensionality and structure, which further increases the learning difficulty for student models. To solve these problems, we propose a novel knowledge extraction module that transforms the high-dimensional tensor-based knowledge in middle layers into one-dimensional knowledge based on singular value decomposition. Thus, the knowledge at the middle layers of teacher models can be effectively extracted and simplified, greatly facilitating the learning of student models even when the teacher and student networks differ considerably in structure and parameter capacity. To help students learn the knowledge from the middle layers of teachers as accurately as possible, we also propose a novel loss function that constrains the values of the one-dimensional knowledge learned by the student model to be as close as possible to those extracted from the teacher model, thus improving the learning efficiency of the student model. We have conducted extensive experiments on three major datasets (CIFAR-10, CIFAR-100, and ImageNet 1K), and the results demonstrate that our method achieves superior performance compared with state-of-the-art methods.
KW - Computer vision
KW - Deep learning
KW - Knowledge distillation
KW - Model compression
KW - Singular value decomposition
UR - http://www.scopus.com/inward/record.url?scp=105004735793&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2025.111721
DO - 10.1016/j.patcog.2025.111721
M3 - Article
AN - SCOPUS:105004735793
SN - 0031-3203
VL - 167
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 111721
ER -