TY - GEN
T1 - Decoding the stochastic profile of m6A over the entire transcriptome
AU - Wang, Jiaying
AU - Wei, Zhen
AU - Zhang, Yuxin
N1 - Publisher Copyright:
© VDE VERLAG GMBH.
PY - 2022
Y1 - 2022
N2 - N6-methyladenosine (m6A), an abundant eukaryotic mRNA modification, is a crucial epigenetic marker dynamically regulated by demethylases (erasers), methyltransferases (writers), and binding proteins (readers). Hence, decoding the stochastic profile of m6A over the transcriptome is invaluable to our understanding of its biological functions. The m6A sites over 1,624,625 DRACH motifs on human exons were summarized from 40 experiments. Four machine learning algorithms, generalized linear model (GLM), multi-layer perceptron (MLP), extreme gradient boosting (XGBoost), and random forest (RF), were implemented to build Poisson regression models. Compared with the classification models used in previous studies, our model provides a new framework for integrating multiple single-base RNA modification datasets. We demonstrated that the Poisson regressors predict the biological and technical variation between experiments better than classifiers trained with the same features. In addition, we used protein binding information for prediction for the first time and achieved significantly better performance than models based only on sequence-derived and genome-derived features.
AB - N6-methyladenosine (m6A), an abundant eukaryotic mRNA modification, is a crucial epigenetic marker dynamically regulated by demethylases (erasers), methyltransferases (writers), and binding proteins (readers). Hence, decoding the stochastic profile of m6A over the transcriptome is invaluable to our understanding of its biological functions. The m6A sites over 1,624,625 DRACH motifs on human exons were summarized from 40 experiments. Four machine learning algorithms, generalized linear model (GLM), multi-layer perceptron (MLP), extreme gradient boosting (XGBoost), and random forest (RF), were implemented to build Poisson regression models. Compared with the classification models used in previous studies, our model provides a new framework for integrating multiple single-base RNA modification datasets. We demonstrated that the Poisson regressors predict the biological and technical variation between experiments better than classifiers trained with the same features. In addition, we used protein binding information for prediction for the first time and achieved significantly better performance than models based only on sequence-derived and genome-derived features.
UR - http://www.scopus.com/inward/record.url?scp=85145650287&partnerID=8YFLogxK
M3 - Conference Proceeding
AN - SCOPUS:85145650287
T3 - BIBE 2022 - 6th International Conference on Biological Information and Biomedical Engineering
SP - 79
EP - 82
BT - BIBE 2022 - 6th International Conference on Biological Information and Biomedical Engineering
A2 - Chen, Bin
PB - VDE Verlag GmbH
T2 - 6th International Conference on Biological Information and Biomedical Engineering, BIBE 2022
Y2 - 19 July 2022 through 20 July 2022
ER -