Efficient prediction of peptide self-assembly through sequential and graphical encoding

Zihan Liu; Jiaqi Wang; Yun Luo; Shuang Zhao; Wenbin Li; Stan Z Li

doi:10.1093/bib/bbad409

AoPHA Faculty

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)

Abstract

In recent years, there has been an explosion of research on the application of deep learning to the prediction of various peptide properties, due to the significant development and market potential of peptides. Molecular dynamics has enabled the efficient collection of large peptide datasets, providing reliable training data for deep learning. However, the lack of systematic analysis of the peptide encoding, which is essential for artificial intelligence-assisted peptide-related tasks, makes it an urgent problem to be solved for the improvement of prediction accuracy. To address this issue, we first collect a high-quality, colossal simulation dataset of peptide self-assembly containing over 62 000 samples generated by coarse-grained molecular dynamics. Then, we systematically investigate the effect of peptide encoding of amino acids into sequences and molecular graphs using state-of-the-art sequential (i.e. recurrent neural network, long short-term memory and Transformer) and structural deep learning models (i.e. graph convolutional network, graph attention network and GraphSAGE), on the accuracy of peptide self-assembly prediction, an essential physiochemical process prior to any peptide-related applications. Extensive benchmarking studies have proven Transformer to be the most powerful sequence-encoding-based deep learning model, pushing the limit of peptide self-assembly prediction to decapeptides. In summary, this work provides a comprehensive benchmark analysis of peptide encoding with advanced deep learning models, serving as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
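To make the two encoding families named in the abstract concrete, here is a minimal sketch of a sequential and a graphical encoding for one peptide. It is an illustration under stated assumptions, not the authors' pipeline: the names `AMINO_ACIDS`, `TOKEN`, `encode_sequence` and `encode_graph` are hypothetical, the graph is a simplified residue-level chain (one node per amino acid, edges between neighbours) rather than a full molecular graph, and the default length of 10 only mirrors the decapeptide limit mentioned above.

```python
# Sketch only: residue-level encodings, not the method used in the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
TOKEN = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}  # 0 is reserved for padding


def encode_sequence(peptide, max_len=10):
    """Sequential encoding: fixed-length list of integer tokens (0 = padding)."""
    if len(peptide) > max_len:
        raise ValueError("peptide is longer than max_len")
    ids = [TOKEN[aa] for aa in peptide.upper()]
    return ids + [0] * (max_len - len(ids))


def encode_graph(peptide):
    """Graphical encoding: (node_labels, edges) for a residue-level chain graph.

    Assumption: each residue is a node and peptide bonds are undirected edges,
    stored as two directed pairs, as graph neural network libraries usually expect.
    """
    nodes = [TOKEN[aa] for aa in peptide.upper()]
    edges = []
    for i in range(len(nodes) - 1):
        edges.append((i, i + 1))
        edges.append((i + 1, i))
    return nodes, edges
```

The token list is the kind of input a recurrent or Transformer model consumes, while the node labels and edge list are the kind of input a graph convolutional or graph attention model consumes.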

Original language	English
Pages (from-to)	bbad409
Journal	Briefings in Bioinformatics
Volume	24
Issue number	6
DOIs	https://doi.org/10.1093/bib/bbad409
Publication status	Published - 2023

Access to Document

10.1093/bib/bbad409

Cite this

@article{1d2bf60db7ba4ae5bb25d30be866a49e,

title = "Efficient prediction of peptide self-assembly through sequential and graphical encoding",

abstract = "In recent years, there has been an explosion of research on the application of deep learning to the prediction of various peptide properties, due to the significant development and market potential of peptides. Molecular dynamics has enabled the efficient collection of large peptide datasets, providing reliable training data for deep learning. However, the lack of systematic analysis of the peptide encoding, which is essential for artificial intelligence-assisted peptide-related tasks, makes it an urgent problem to be solved for the improvement of prediction accuracy. To address this issue, we first collect a high-quality, colossal simulation dataset of peptide self-assembly containing over 62 000 samples generated by coarse-grained molecular dynamics. Then, we systematically investigate the effect of peptide encoding of amino acids into sequences and molecular graphs using state-of-the-art sequential (i.e. recurrent neural network, long short-term memory and Transformer) and structural deep learning models (i.e. graph convolutional network, graph attention network and GraphSAGE), on the accuracy of peptide self-assembly prediction, an essential physiochemical process prior to any peptide-related applications. Extensive benchmarking studies have proven Transformer to be the most powerful sequence-encoding-based deep learning model, pushing the limit of peptide self-assembly prediction to decapeptides. In summary, this work provides a comprehensive benchmark analysis of peptide encoding with advanced deep learning models, serving as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.",

author = "Liu, Zihan and Wang, Jiaqi and Luo, Yun and Zhao, Shuang and Li, Wenbin and Li, {Stan Z}",

year = "2023",

doi = "10.1093/bib/bbad409",

language = "English",

volume = "24",

pages = "bbad409",

journal = "Briefings in Bioinformatics",

issn = "1467-5463",

publisher = "Oxford University Press",

number = "6",

}

TY - JOUR

T1 - Efficient prediction of peptide self-assembly through sequential and graphical encoding

AU - Liu, Zihan

AU - Wang, Jiaqi

AU - Luo, Yun

AU - Zhao, Shuang

AU - Li, Wenbin

AU - Li, Stan Z

PY - 2023

Y1 - 2023

AB - In recent years, there has been an explosion of research on the application of deep learning to the prediction of various peptide properties, due to the significant development and market potential of peptides. Molecular dynamics has enabled the efficient collection of large peptide datasets, providing reliable training data for deep learning. However, the lack of systematic analysis of the peptide encoding, which is essential for artificial intelligence-assisted peptide-related tasks, makes it an urgent problem to be solved for the improvement of prediction accuracy. To address this issue, we first collect a high-quality, colossal simulation dataset of peptide self-assembly containing over 62 000 samples generated by coarse-grained molecular dynamics. Then, we systematically investigate the effect of peptide encoding of amino acids into sequences and molecular graphs using state-of-the-art sequential (i.e. recurrent neural network, long short-term memory and Transformer) and structural deep learning models (i.e. graph convolutional network, graph attention network and GraphSAGE), on the accuracy of peptide self-assembly prediction, an essential physiochemical process prior to any peptide-related applications. Extensive benchmarking studies have proven Transformer to be the most powerful sequence-encoding-based deep learning model, pushing the limit of peptide self-assembly prediction to decapeptides. In summary, this work provides a comprehensive benchmark analysis of peptide encoding with advanced deep learning models, serving as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.

U2 - 10.1093/bib/bbad409

DO - 10.1093/bib/bbad409

M3 - Article

C2 - 37974507

SN - 1467-5463

VL - 24

SP - bbad409

JO - Briefings in Bioinformatics

JF - Briefings in Bioinformatics

IS - 6

ER -
