TY - JOUR
T1 - Developing a Semi-Supervised Approach Using a PU-Learning-Based Data Augmentation Strategy for Multitarget Drug Discovery
AU - Hao, Yang
AU - Li, Bo
AU - Huang, Daiyun
AU - Wu, Sijin
AU - Wang, Tianjun
AU - Fu, Lei
AU - Liu, Xin
N1 - Publisher Copyright:
© 2024 by the authors.
PY - 2024/8
Y1 - 2024/8
N2 - Multifactorial diseases demand therapeutics that can modulate multiple targets for enhanced safety and efficacy, yet the clinical approval of multitarget drugs remains rare. The integration of machine learning (ML) and deep learning (DL) in drug discovery has revolutionized virtual screening. This study investigates the synergy between ML/DL methodologies, molecular representations, and data augmentation strategies. Notably, we found that SVM can match or even surpass the performance of state-of-the-art DL methods. However, conventional data augmentation often involves a trade-off between the true positive rate and false positive rate. To address this, we introduce Negative-Augmented PU-bagging (NAPU-bagging) SVM, a novel semi-supervised learning framework. By leveraging ensemble SVM classifiers trained on resampled bags containing positive, negative, and unlabeled data, our approach is capable of managing false positive rates while maintaining high recall rates. We applied this method to the identification of multitarget-directed ligands (MTDLs), where high recall rates are critical for compiling a list of interaction candidate compounds. Case studies demonstrate that NAPU-bagging SVM can identify structurally novel MTDL hits for ALK-EGFR with favorable docking scores and binding modes, as well as pan-agonists for dopamine receptors. The NAPU-bagging SVM methodology should serve as a promising avenue to virtual screening, especially for the discovery of MTDLs.
AB - Multifactorial diseases demand therapeutics that can modulate multiple targets for enhanced safety and efficacy, yet the clinical approval of multitarget drugs remains rare. The integration of machine learning (ML) and deep learning (DL) in drug discovery has revolutionized virtual screening. This study investigates the synergy between ML/DL methodologies, molecular representations, and data augmentation strategies. Notably, we found that SVM can match or even surpass the performance of state-of-the-art DL methods. However, conventional data augmentation often involves a trade-off between the true positive rate and false positive rate. To address this, we introduce Negative-Augmented PU-bagging (NAPU-bagging) SVM, a novel semi-supervised learning framework. By leveraging ensemble SVM classifiers trained on resampled bags containing positive, negative, and unlabeled data, our approach is capable of managing false positive rates while maintaining high recall rates. We applied this method to the identification of multitarget-directed ligands (MTDLs), where high recall rates are critical for compiling a list of interaction candidate compounds. Case studies demonstrate that NAPU-bagging SVM can identify structurally novel MTDL hits for ALK-EGFR with favorable docking scores and binding modes, as well as pan-agonists for dopamine receptors. The NAPU-bagging SVM methodology should serve as a promising avenue to virtual screening, especially for the discovery of MTDLs.
KW - multitarget drug
KW - PU-learning
KW - Support Vector Machine (SVM)
KW - virtual screening
UR - http://www.scopus.com/inward/record.url?scp=85200970288&partnerID=8YFLogxK
U2 - 10.3390/ijms25158239
DO - 10.3390/ijms25158239
M3 - Article
C2 - 39125808
AN - SCOPUS:85200970288
SN - 1661-6596
VL - 25
JO - International Journal of Molecular Sciences
JF - International Journal of Molecular Sciences
IS - 15
M1 - 8239
ER -