TY - GEN
T1 - Prediction of m6A Reader Substrate Sites Using Deep Convolutional and Recurrent Neural Network
AU - Wu, Yuxuan
AU - Zhang, Yuxin
AU - Wang, Ruoqi
AU - Meng, Jia
AU - Chen, Kunqi
AU - Song, Yiyou
AU - Huang, Daiyun
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/7/20
Y1 - 2021/7/20
AB - N6-methyladenosine (m6A), one of the most common post-transcriptional mRNA modifications, has been shown to correlate with multiple biological functions through binding to specific m6A reader proteins. Various m6A readers are encoded in the human genome; however, because wet-lab experiments on this topic are scarce, the binding specificity of these proteins has not been elucidated. Therefore, a deep learning approach combining CNN and RNN frameworks was developed to predict the epitranscriptome-wide targets of six m6A reader proteins (YTHDF1-3, YTHDC1-2, EIF3A). Additionally, layer-wise relevance calculation was conducted to obtain the contribution of each input feature and to help explain the model training process. Finally, we achieved superior classification performance, with an average AUROC of 0.942 for EIF3A on the full transcript, higher than that of a typical conventional machine learning algorithm (SVM) under the same conditions. Moreover, we identified the optimal sequence length (1001 bp) for m6A reader substrate prediction. This research paves the way for further RNA methylation target prediction and functional characterization of m6A readers.
KW - Convolutional neural network
KW - Deep learning
KW - Readers
KW - Recurrent neural network
KW - m6A
UR - http://www.scopus.com/inward/record.url?scp=85120522236&partnerID=8YFLogxK
U2 - 10.1145/3469678.3469706
DO - 10.1145/3469678.3469706
M3 - Conference Proceeding
AN - SCOPUS:85120522236
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 5th International Conference on Biological Information and Biomedical Engineering, BIBE 2021
A2 - Chen, Bin
PB - Association for Computing Machinery
T2 - 5th International Conference on Biological Information and Biomedical Engineering, BIBE 2021
Y2 - 20 July 2021 through 21 July 2021
ER -