TY - JOUR
T1 - ISGm1A
T2 - Integration of Sequence Features and Genomic Features to Improve the Prediction of Human m1A RNA Methylation Sites
AU - Liu, Lian
AU - Lei, Xiujuan
AU - Meng, Jia
AU - Wei, Zhen
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2020
Y1 - 2020
N2 - As a new epitranscriptomic modification, N1-methyladenosine (m1A) plays an important role in gene expression regulation. Although several computational methods have been proposed to predict m1A modification sites, all of them apply machine learning based on nucleotide sequence features and miss the layer of information in transcript topology and RNA secondary structure. To enhance the prediction of m1A RNA methylation, we propose a computational framework, ISGm1A, which stands for integration of sequence features and genomic features to improve the prediction of human m1A RNA methylation sites. Based on the random forest algorithm, ISGm1A takes advantage of both conventional sequence features and 75 genomic characteristics to improve the prediction of m1A sites in humans. The results of five-fold cross-validation and an independent test show that ISGm1A outperforms other prediction algorithms (AUC = 0.903 and 0.909). In addition, by analyzing feature importance, we found that the genomic features play a more important role in site prediction than the sequence features. Furthermore, with ISGm1A, we generated a high-accuracy map of m1A by predicting all adenine sites in the transcriptome. The data and results of the study are freely accessible at: https://github.com/lianliu09/m1a_prediction.git.
KW - Epitranscriptome
KW - genomic features
KW - m¹A
KW - sequence features
KW - site prediction
UR - http://www.scopus.com/inward/record.url?scp=85084929332&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2020.2991070
DO - 10.1109/ACCESS.2020.2991070
M3 - Article
AN - SCOPUS:85084929332
SN - 2169-3536
VL - 8
SP - 81971
EP - 81977
JO - IEEE Access
JF - IEEE Access
M1 - 9079809
ER -