SVD-KD: SVD-based hidden layer feature extraction for Knowledge distillation

Jianhua Zhang, Yi Gao, Mian Zhou, Ruyu Liu*, Xu Cheng, Saša V. Nikolić, Shengyong Chen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

A recent advance in knowledge distillation (KD) is to extract and transfer middle-layer knowledge from teacher models to student models, which outperforms the original KD methods that transfer only last-layer knowledge. However, middle-layer knowledge typically takes the form of a high-dimensional tensor, which is harder to transfer than the one-dimensional knowledge of the last layer. Moreover, when the teacher and student differ substantially in parameter capacity and architecture, their middle layers also differ in dimensionality and structure, which further increases the learning difficulty for the student. To address these problems, we propose a novel knowledge extraction module that transforms the high-dimensional tensor-based knowledge in the middle layers into one-dimensional knowledge using singular value decomposition. The knowledge in the middle layers of the teacher can thus be effectively extracted and simplified, greatly facilitating the student's learning even when the teacher and student networks differ markedly in structure and parameter capacity. To help the student learn the teacher's middle-layer knowledge as accurately as possible, we also propose a novel loss function that constrains the one-dimensional knowledge learned by the student to stay as close as possible to that extracted from the teacher, improving the student's learning efficiency. Extensive experiments on three major datasets (CIFAR-10, CIFAR-100, and ImageNet-1K) demonstrate that our method outperforms state-of-the-art approaches.
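The core idea described above can be illustrated with a minimal sketch: flatten a middle-layer feature tensor into a matrix, take its singular values as a one-dimensional knowledge vector, and penalize the distance between teacher and student vectors. This is an assumption-laden illustration of the general technique, not the authors' exact formulation; the helper names (`svd_knowledge`, `distill_loss`), the top-`k` truncation, and the normalization are hypothetical choices for the sketch.

```python
import numpy as np

def svd_knowledge(feature_map: np.ndarray, k: int = 4) -> np.ndarray:
    """Compress a (C, H, W) feature tensor into a 1-D knowledge vector
    via SVD (hypothetical helper, not the paper's exact method)."""
    c, h, w = feature_map.shape
    # Reshape the 3-D tensor into a 2-D matrix so SVD applies.
    mat = feature_map.reshape(c, h * w)
    # Singular values summarize the matrix's principal energy and are
    # independent of the spatial layout of the feature map.
    s = np.linalg.svd(mat, compute_uv=False)
    # Keep the top-k singular values and normalize, so teacher and
    # student vectors of different scales remain comparable.
    s_k = s[:k]
    return s_k / (np.linalg.norm(s_k) + 1e-8)

def distill_loss(teacher_feat: np.ndarray,
                 student_feat: np.ndarray,
                 k: int = 4) -> float:
    """Squared L2 distance between the two 1-D knowledge vectors --
    one plausible form of a middle-layer matching loss."""
    t = svd_knowledge(teacher_feat, k)
    s = svd_knowledge(student_feat, k)
    return float(np.sum((t - s) ** 2))
```

Note that because both networks are reduced to fixed-length singular-value vectors, the loss is well defined even when the teacher and student feature maps differ in channel count and spatial size, which matches the dimension-mismatch problem the abstract describes.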

Original language: English
Article number: 111721
Journal: Pattern Recognition
Volume: 167
DOIs
Publication status: Published - Nov 2025

Keywords

  • Computer vision
  • Deep learning
  • Knowledge distillation
  • Model compression
  • Singular value decomposition

