Developing a Semi-Supervised Approach Using a PU-Learning-Based Data Augmentation Strategy for Multitarget Drug Discovery

Yang Hao; Bo Li; Daiyun Huang; Sijin Wu; Tianjun Wang; Lei Fu; Xin Liu

doi:10.3390/ijms25158239

Yang Hao, Bo Li, Daiyun Huang*, Sijin Wu, Tianjun Wang, Lei Fu, Xin Liu*

*Corresponding author for this work

AoPHA Faculty

Research output: Contribution to journal › Article › peer-review

Abstract

Multifactorial diseases demand therapeutics that can modulate multiple targets for enhanced safety and efficacy, yet the clinical approval of multitarget drugs remains rare. The integration of machine learning (ML) and deep learning (DL) in drug discovery has revolutionized virtual screening. This study investigates the synergy between ML/DL methodologies, molecular representations, and data augmentation strategies. Notably, we found that SVM can match or even surpass the performance of state-of-the-art DL methods. However, conventional data augmentation often involves a trade-off between the true positive rate and false positive rate. To address this, we introduce Negative-Augmented PU-bagging (NAPU-bagging) SVM, a novel semi-supervised learning framework. By leveraging ensemble SVM classifiers trained on resampled bags containing positive, negative, and unlabeled data, our approach is capable of managing false positive rates while maintaining high recall rates. We applied this method to the identification of multitarget-directed ligands (MTDLs), where high recall rates are critical for compiling a list of interaction candidate compounds. Case studies demonstrate that NAPU-bagging SVM can identify structurally novel MTDL hits for ALK-EGFR with favorable docking scores and binding modes, as well as pan-agonists for dopamine receptors. The NAPU-bagging SVM methodology should serve as a promising avenue to virtual screening, especially for the discovery of MTDLs.
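The abstract describes NAPU-bagging only at a high level, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes that each bag holds all known positives, all known negatives, and a random subsample of unlabeled compounds treated as provisional negatives, and that an unlabeled compound is scored by the fraction of out-of-bag models that vote positive. The linear SVM trained by Pegasos-style subgradient descent, the function names, the parameters, and the two-dimensional toy feature vectors are all hypothetical choices made to keep the example dependency-free.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(X, y, rng, epochs=30, lam=0.01):
    """Pegasos-style stochastic subgradient descent; labels are +1 / -1."""
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decaying step size
            violated = y[i] * (dot(w, X[i]) + b) < 1   # inside the margin?
            w = [(1 - eta * lam) * wj for wj in w]     # regularisation shrink
            if violated:                               # hinge-loss subgradient
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def napu_bagging_scores(pos, neg, unl, n_bags=25, bag_size=None, seed=0):
    """For each unlabeled point, return the fraction of out-of-bag models
    that classify it as positive (assumed aggregation rule)."""
    rng = random.Random(seed)
    bag_size = bag_size or max(1, min(len(pos), len(unl) // 2))
    votes = [0] * len(unl)
    counts = [0] * len(unl)
    for _ in range(n_bags):
        in_bag = sorted(rng.sample(range(len(unl)), bag_size))
        # Bag = positives (+1), known negatives (-1), sampled unlabeled (-1).
        X = pos + neg + [unl[i] for i in in_bag]
        y = [1] * len(pos) + [-1] * (len(neg) + len(in_bag))
        w, b = train_linear_svm(X, y, rng)
        for i in range(len(unl)):
            if i not in in_bag:                        # score out-of-bag only
                counts[i] += 1
                votes[i] += 1 if dot(w, unl[i]) + b > 0 else 0
    # A point that was never out-of-bag gets a neutral score of 0.5.
    return [v / c if c else 0.5 for v, c in zip(votes, counts)]

# Toy 2-D feature vectors standing in for molecular representations.
pos = [[2.0, 2.0], [2.5, 1.5], [1.8, 2.2]]
neg = [[-2.0, -2.0], [-1.5, -2.5]]
unl = [[2.1, 1.9], [-2.2, -1.8], [0.1, -0.1], [1.9, 2.4]]
scores = napu_bagging_scores(pos, neg, unl)
```

In a screening setting, the unlabeled compounds would then be ranked by `scores`, and a lower cut-off would favor recall at the cost of more false positives.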

Original language	English
Article number	8239
Journal	International Journal of Molecular Sciences
Volume	25
Issue number	15
DOIs	https://doi.org/10.3390/ijms25158239
Publication status	Published - Aug 2024

Keywords

multitarget drug
PU-learning
Support Vector Machine (SVM)
virtual screening

Cite this

@article{9c9138df46094e3aacb1a12c0e135769,

title = "Developing a Semi-Supervised Approach Using a PU-Learning-Based Data Augmentation Strategy for Multitarget Drug Discovery",

abstract = "Multifactorial diseases demand therapeutics that can modulate multiple targets for enhanced safety and efficacy, yet the clinical approval of multitarget drugs remains rare. The integration of machine learning (ML) and deep learning (DL) in drug discovery has revolutionized virtual screening. This study investigates the synergy between ML/DL methodologies, molecular representations, and data augmentation strategies. Notably, we found that SVM can match or even surpass the performance of state-of-the-art DL methods. However, conventional data augmentation often involves a trade-off between the true positive rate and false positive rate. To address this, we introduce Negative-Augmented PU-bagging (NAPU-bagging) SVM, a novel semi-supervised learning framework. By leveraging ensemble SVM classifiers trained on resampled bags containing positive, negative, and unlabeled data, our approach is capable of managing false positive rates while maintaining high recall rates. We applied this method to the identification of multitarget-directed ligands (MTDLs), where high recall rates are critical for compiling a list of interaction candidate compounds. Case studies demonstrate that NAPU-bagging SVM can identify structurally novel MTDL hits for ALK-EGFR with favorable docking scores and binding modes, as well as pan-agonists for dopamine receptors. The NAPU-bagging SVM methodology should serve as a promising avenue to virtual screening, especially for the discovery of MTDLs.",

keywords = "multitarget drug, PU-learning, Support Vector Machine (SVM), virtual screening",

author = "Yang Hao and Bo Li and Daiyun Huang and Sijin Wu and Tianjun Wang and Lei Fu and Xin Liu",

note = "Publisher Copyright: {\textcopyright} 2024 by the authors.",

year = "2024",

month = aug,

doi = "10.3390/ijms25158239",

language = "English",

volume = "25",

journal = "International Journal of Molecular Sciences",

issn = "1661-6596",

number = "15",

}

TY - JOUR

T1 - Developing a Semi-Supervised Approach Using a PU-Learning-Based Data Augmentation Strategy for Multitarget Drug Discovery

AU - Hao, Yang

AU - Li, Bo

AU - Huang, Daiyun

AU - Wu, Sijin

AU - Wang, Tianjun

AU - Fu, Lei

AU - Liu, Xin

PY - 2024/8

Y1 - 2024/8

N2 - Multifactorial diseases demand therapeutics that can modulate multiple targets for enhanced safety and efficacy, yet the clinical approval of multitarget drugs remains rare. The integration of machine learning (ML) and deep learning (DL) in drug discovery has revolutionized virtual screening. This study investigates the synergy between ML/DL methodologies, molecular representations, and data augmentation strategies. Notably, we found that SVM can match or even surpass the performance of state-of-the-art DL methods. However, conventional data augmentation often involves a trade-off between the true positive rate and false positive rate. To address this, we introduce Negative-Augmented PU-bagging (NAPU-bagging) SVM, a novel semi-supervised learning framework. By leveraging ensemble SVM classifiers trained on resampled bags containing positive, negative, and unlabeled data, our approach is capable of managing false positive rates while maintaining high recall rates. We applied this method to the identification of multitarget-directed ligands (MTDLs), where high recall rates are critical for compiling a list of interaction candidate compounds. Case studies demonstrate that NAPU-bagging SVM can identify structurally novel MTDL hits for ALK-EGFR with favorable docking scores and binding modes, as well as pan-agonists for dopamine receptors. The NAPU-bagging SVM methodology should serve as a promising avenue to virtual screening, especially for the discovery of MTDLs.

KW - multitarget drug

KW - PU-learning

KW - Support Vector Machine (SVM)

KW - virtual screening

UR - http://www.scopus.com/inward/record.url?scp=85200970288&partnerID=8YFLogxK

U2 - 10.3390/ijms25158239

DO - 10.3390/ijms25158239

M3 - Article

C2 - 39125808

AN - SCOPUS:85200970288

SN - 1661-6596

VL - 25

JO - International Journal of Molecular Sciences

JF - International Journal of Molecular Sciences

IS - 15

M1 - 8239

ER -
