TY - JOUR
T1 - m7GDisAI
T2 - N7-methylguanosine (m7G) sites and diseases associations inference based on heterogeneous network
AU - Ma, Jiani
AU - Zhang, Lin
AU - Chen, Jin
AU - Song, Bowen
AU - Zang, Chenxuan
AU - Liu, Hui
N1 - Funding Information:
This work has been supported by Postgraduate Student Education Reform Research and Practice Funds (Research Projects No. 2019YJSJG045 to LZ) and the National Natural Science Foundation of China (Research Projects Nos. 61971422 to LZ, 31871337 to HL). The funding body did not play any role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
N2 - Background: Recent studies have confirmed that N7-methylguanosine (m7G) modification plays an important role in regulating various biological processes and is associated with multiple diseases. Wet-lab experiments are costly and time-consuming for identifying disease-associated m7G sites. To date, tens of thousands of m7G sites have been identified by high-throughput sequencing approaches, and the information is publicly available in bioinformatics databases, where it can be leveraged to predict potential disease-associated m7G sites computationally. Computational methods for m7G-disease association prediction are therefore urgently needed, but none is currently available. Results: To fill this gap, we collected association information between m7G sites and diseases, genomic information of m7G sites, and phenotypic information of diseases from different databases to build an m7G-disease association dataset. To infer potential disease-associated m7G sites, we then proposed a heterogeneous network-based model, the m7G Sites and Diseases Associations Inference (m7GDisAI) model. m7GDisAI predicts potential disease-associated m7G sites by applying a matrix decomposition method to heterogeneous networks that integrate comprehensive similarity information of m7G sites and diseases. To evaluate prediction performance, 10 runs of tenfold cross validation were first conducted, in which m7GDisAI achieved the highest AUC of 0.740 (± 0.0024). Global and local leave-one-out cross validation (LOOCV) experiments were then implemented to evaluate the model’s accuracy in global and local situations, respectively. An AUC of 0.769 was achieved in global LOOCV and 0.635 in local LOOCV. Finally, a case study was conducted to identify the most promising ovarian cancer-related m7G sites for further functional analysis. Gene Ontology (GO) enrichment analysis was performed to explore the complex associations between the host genes of m7G sites and GO terms. 
The results showed that the disease-associated m7G sites identified by m7GDisAI and their host genes are consistently related to the pathogenesis of ovarian cancer, which may provide some clues to the pathogenesis of diseases. Conclusion: The m7GDisAI web server can be accessed at http://180.208.58.66/m7GDisAI/ and provides a user-friendly interface for querying disease-associated m7G sites. The list of the top 20 m7G sites predicted to be associated with 177 diseases can be retrieved. Furthermore, detailed information about specific m7G sites and diseases is also shown.
AB - Background: Recent studies have confirmed that N7-methylguanosine (m7G) modification plays an important role in regulating various biological processes and is associated with multiple diseases. Wet-lab experiments are costly and time-consuming for identifying disease-associated m7G sites. To date, tens of thousands of m7G sites have been identified by high-throughput sequencing approaches, and the information is publicly available in bioinformatics databases, where it can be leveraged to predict potential disease-associated m7G sites computationally. Computational methods for m7G-disease association prediction are therefore urgently needed, but none is currently available. Results: To fill this gap, we collected association information between m7G sites and diseases, genomic information of m7G sites, and phenotypic information of diseases from different databases to build an m7G-disease association dataset. To infer potential disease-associated m7G sites, we then proposed a heterogeneous network-based model, the m7G Sites and Diseases Associations Inference (m7GDisAI) model. m7GDisAI predicts potential disease-associated m7G sites by applying a matrix decomposition method to heterogeneous networks that integrate comprehensive similarity information of m7G sites and diseases. To evaluate prediction performance, 10 runs of tenfold cross validation were first conducted, in which m7GDisAI achieved the highest AUC of 0.740 (± 0.0024). Global and local leave-one-out cross validation (LOOCV) experiments were then implemented to evaluate the model’s accuracy in global and local situations, respectively. An AUC of 0.769 was achieved in global LOOCV and 0.635 in local LOOCV. Finally, a case study was conducted to identify the most promising ovarian cancer-related m7G sites for further functional analysis. Gene Ontology (GO) enrichment analysis was performed to explore the complex associations between the host genes of m7G sites and GO terms. 
The results showed that the disease-associated m7G sites identified by m7GDisAI and their host genes are consistently related to the pathogenesis of ovarian cancer, which may provide some clues to the pathogenesis of diseases. Conclusion: The m7GDisAI web server can be accessed at http://180.208.58.66/m7GDisAI/ and provides a user-friendly interface for querying disease-associated m7G sites. The list of the top 20 m7G sites predicted to be associated with 177 diseases can be retrieved. Furthermore, detailed information about specific m7G sites and diseases is also shown.
KW - Heterogeneous network
KW - m7G site
KW - Matrix decomposition
UR - http://www.scopus.com/inward/record.url?scp=85103347917&partnerID=8YFLogxK
U2 - 10.1186/s12859-021-04007-9
DO - 10.1186/s12859-021-04007-9
M3 - Article
C2 - 33761868
AN - SCOPUS:85103347917
SN - 1471-2105
VL - 22
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - 1
M1 - 152
ER -