TY - JOUR
T1 - Geographic encoding of transcripts enabled high-accuracy and isoform-aware deep learning of RNA methylation
AU - Huang, Daiyun
AU - Chen, Kunqi
AU - Song, Bowen
AU - Wei, Zhen
AU - Su, Jionglong
AU - Coenen, Frans
AU - De Magalhães, João Pedro
AU - Rigden, Daniel J.
AU - Meng, Jia
N1 - Publisher Copyright: © 2022 The Author(s). Published by Oxford University Press on behalf of Nucleic Acids Research.
PY - 2022/10/14
Y1 - 2022/10/14
AB - As the most pervasive epigenetic mark present on mRNA and lncRNA, N6-methyladenosine (m6A) RNA methylation regulates all stages of RNA life in various biological processes and disease mechanisms. Computational methods for deciphering RNA modification have achieved great success in recent years; nevertheless, their potential remains underexploited. One reason for this is that existing models usually consider only the sequence of transcripts, ignoring the various regions (or geography) of transcripts such as 3′UTR and intron, where the epigenetic mark forms and functions. Here, we developed three simple yet powerful encoding schemes for transcripts to capture the submolecular geographic information of RNA, which is largely independent from sequences. We show that m6A prediction models based on geographic information alone can achieve comparable performances to classic sequence-based methods. Importantly, geographic information substantially enhances the accuracy of sequence-based models, enables isoform- and tissue-specific prediction of m6A sites, and improves m6A signal detection from direct RNA sequencing data. The geographic encoding schemes we developed have exhibited strong interpretability, and are applicable to not only m6A but also N1-methyladenosine (m1A), and can serve as a general and effective complement to the widely used sequence encoding schemes in deep learning applications concerning RNA transcripts.
UR - http://www.scopus.com/inward/record.url?scp=85139880537&partnerID=8YFLogxK
U2 - 10.1093/nar/gkac830
DO - 10.1093/nar/gkac830
M3 - Article
C2 - 36155798
AN - SCOPUS:85139880537
SN - 0305-1048
VL - 50
SP - 10290
EP - 10310
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 18
ER -