TY - JOUR
T1 - m5U-GEPred: prediction of RNA 5-methyluridine sites based on sequence-derived and graph embedding features
AU - Xu, Zhongxing
AU - Wang, Xuan
AU - Meng, Jia
AU - Zhang, Lin
AU - Song, Bowen
N1 - Publisher Copyright: © 2023 Xu, Wang, Meng, Zhang and Song.
PY - 2023
Y1 - 2023
AB - 5-Methyluridine (m5U) is one of the most common post-transcriptional RNA modifications and is involved in a variety of important biological processes and disease development. The precise identification of m5U sites allows for a better understanding of the biological processes of RNA and contributes to the discovery of new RNA functional and therapeutic targets. Here, we present m5U-GEPred, a prediction framework that combines sequence characteristics and graph embedding-based information for m5U identification. The graph embedding approach was introduced to extract global information from the training data, complementing the local information represented by conventional sequence features and thereby enhancing prediction performance. m5U-GEPred outperformed the state-of-the-art m5U predictors built on two independent species, with an average AUROC of 0.984 and 0.985 when tested on human and yeast transcriptomes, respectively. To further validate the performance of the proposed framework, experimentally validated m5U sites identified by Oxford Nanopore Technology (ONT) were collected as independent testing data, on which m5U-GEPred achieved reasonable prediction performance with an accuracy (ACC) of 91.84%. We hope that m5U-GEPred will serve as a useful computational alternative for m5U identification.
KW - 5-methyluridine
KW - RNA modification
KW - graph embedding
KW - multi-species
KW - sequence feature
UR - http://www.scopus.com/inward/record.url?scp=85175808468&partnerID=8YFLogxK
U2 - 10.3389/fmicb.2023.1277099
DO - 10.3389/fmicb.2023.1277099
M3 - Article
AN - SCOPUS:85175808468
SN - 1664-302X
VL - 14
JO - Frontiers in Microbiology
JF - Frontiers in Microbiology
M1 - 1277099
ER -