TY - JOUR
T1 - WHISTLE server
T2 - A high-accuracy genomic coordinate-based machine learning platform for RNA modification prediction
AU - Liu, Lian
AU - Song, Bowen
AU - Chen, Kunqi
AU - Zhang, Yuxin
AU - de Magalhães, João Pedro
AU - Rigden, Daniel J.
AU - Lei, Xiujuan
AU - Wei, Zhen
N1 - Funding Information:
This work was supported by the National Natural Science Foundation of China [61902230 to L.L., 61972451 to X.L.]; the China Postdoctoral Science Foundation [2018M640949 to L.L.]; and the Fundamental Research Funds for the Central Universities [GK202103091 to L.L., GK201901010 to X.L.].
Publisher Copyright:
© 2021
PY - 2022/7
Y1 - 2022/7
N2 - The primary sequences of DNA, RNA and protein have been used as the dominant information source of existing machine learning tools, especially for contexts not fully explored by wet-experimental approaches. Since molecular markers are profoundly orchestrated in living organisms, those markers that cannot be unambiguously recovered from the primary sequence often help to predict other biological events. To the best of our knowledge, there is currently no tool to build and deploy machine learning models that consider genomic evidence. We therefore developed the WHISTLE server, the first machine learning platform based on genomic coordinates. It features convenient covariate extraction and model web deployment, with 46 distinct genomic features integrated along with the conventional sequence features. We showed that, when predicting m6A sites from the SRAMP project, the model integrating genomic features substantially outperformed those based only on sequence features. The WHISTLE server should be a useful tool for studying biological attributes specifically associated with genomic coordinates, and is freely accessible at: www.xjtlu.edu.cn/biologicalsciences/whi2.
AB - The primary sequences of DNA, RNA and protein have been used as the dominant information source of existing machine learning tools, especially for contexts not fully explored by wet-experimental approaches. Since molecular markers are profoundly orchestrated in living organisms, those markers that cannot be unambiguously recovered from the primary sequence often help to predict other biological events. To the best of our knowledge, there is currently no tool to build and deploy machine learning models that consider genomic evidence. We therefore developed the WHISTLE server, the first machine learning platform based on genomic coordinates. It features convenient covariate extraction and model web deployment, with 46 distinct genomic features integrated along with the conventional sequence features. We showed that, when predicting m6A sites from the SRAMP project, the model integrating genomic features substantially outperformed those based only on sequence features. The WHISTLE server should be a useful tool for studying biological attributes specifically associated with genomic coordinates, and is freely accessible at: www.xjtlu.edu.cn/biologicalsciences/whi2.
KW - Epitranscriptome
KW - Genomic coordinate
KW - Web server
UR - http://www.scopus.com/inward/record.url?scp=85111058519&partnerID=8YFLogxK
U2 - 10.1016/j.ymeth.2021.07.003
DO - 10.1016/j.ymeth.2021.07.003
M3 - Article
C2 - 34245870
AN - SCOPUS:85111058519
SN - 1046-2023
VL - 203
SP - 378
EP - 382
JO - Methods
JF - Methods
ER -