TY - GEN
T1 - Privacy-Preserving Student Learning with Differentially Private Data-Free Distillation
AU - Liu, Bochao
AU - Lu, Jianghu
AU - Wang, Pengju
AU - Zhang, Junjie
AU - Zeng, Dan
AU - Qian, Zhenxing
AU - Ge, Shiming
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Deep learning models can achieve high inference accuracy by extracting rich knowledge from massive well-annotated data, but they may pose a risk of data privacy leakage in practical deployment. In this paper, we present an effective teacher-student learning approach to train privacy-preserving deep learning models via differentially private data-free distillation. The main idea is to generate synthetic data with which a student learns to mimic the ability of a teacher well trained on private data. In our approach, a generator is first pretrained in a data-free manner by incorporating the teacher as a fixed discriminator. With this generator, a large amount of synthetic data can be produced for model training without exposing data privacy. The synthetic data is then fed into the teacher to generate labels, and we propose a label differential privacy algorithm, termed selective randomized response, to protect the label information. Finally, a student is trained on the synthetic data under the supervision of the private labels. In this way, both data privacy and label privacy are protected within a unified framework, yielding privacy-preserving models. Extensive experiments and analysis demonstrate the effectiveness of our approach.
AB - Deep learning models can achieve high inference accuracy by extracting rich knowledge from massive well-annotated data, but they may pose a risk of data privacy leakage in practical deployment. In this paper, we present an effective teacher-student learning approach to train privacy-preserving deep learning models via differentially private data-free distillation. The main idea is to generate synthetic data with which a student learns to mimic the ability of a teacher well trained on private data. In our approach, a generator is first pretrained in a data-free manner by incorporating the teacher as a fixed discriminator. With this generator, a large amount of synthetic data can be produced for model training without exposing data privacy. The synthetic data is then fed into the teacher to generate labels, and we propose a label differential privacy algorithm, termed selective randomized response, to protect the label information. Finally, a student is trained on the synthetic data under the supervision of the private labels. In this way, both data privacy and label privacy are protected within a unified framework, yielding privacy-preserving models. Extensive experiments and analysis demonstrate the effectiveness of our approach.
KW - differential privacy
KW - knowledge distillation
KW - teacher-student learning
UR - http://www.scopus.com/inward/record.url?scp=85143585809&partnerID=8YFLogxK
U2 - 10.1109/MMSP55362.2022.9950001
DO - 10.1109/MMSP55362.2022.9950001
M3 - Conference Proceeding
AN - SCOPUS:85143585809
T3 - 2022 IEEE 24th International Workshop on Multimedia Signal Processing, MMSP 2022
BT - 2022 IEEE 24th International Workshop on Multimedia Signal Processing, MMSP 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th IEEE International Workshop on Multimedia Signal Processing, MMSP 2022
Y2 - 26 September 2022 through 28 September 2022
ER -