TY - JOUR
T1 - Evaluating the Reliability of Machine Learning Predictors in m6A-SNP Association Analysis
T2 - A Comparative Study Using m6A-QTL Data
AU - Mao, Zhongzheng
AU - Wei, Zhen
N1 - Publisher Copyright: © 2024 Bentham Science Publishers.
PY - 2024
Y1 - 2024
AB - Introduction: N6-methyladenosine (m6A) plays a crucial role in determining the fate of RNA after transcription. Understanding the downstream functions of individual m6A sites is of critical interest in epitranscriptomics. Published studies have used two main approaches to decipher the specific impact of m6A sites on gene expression and on diseases and traits: m6A quantitative trait loci (m6A-QTL) analysis and in-silico mutation prediction by Machine Learning (ML) models. However, earlier work lacks independent validation of the performance of ML-based methods. Methods: In this study, we use m6A-QTL as ground truth to evaluate the outcomes of in-silico mutation models. We benchmark against m6A-QTL both newly trained ML models that use genomic or sequence features and existing model inference results published in databases built on in-silico mutation. Results: We found that the consistency between in-silico mutation and m6A-QTL is weak, regardless of the ML algorithm and predictive features used. This trend also held across multiple published databases based on in-silico mutation, including RMDisease2, m6AVar, and RMVar. Conclusion: These results highlight the importance of critical empirical evaluation of ML models in future SNP-m6A association studies and suggest the need for more high-quality m6A-QTL experiments to guide model development.
KW - Epitranscriptomics
KW - Functional annotation
KW - in-silico mutation
KW - m6A
KW - m6A-QTL
KW - Machine Learning
KW - N6-methyladenosine
UR - http://www.scopus.com/inward/record.url?scp=85215071104&partnerID=8YFLogxK
U2 - 10.2174/0115748936332078240826105023
DO - 10.2174/0115748936332078240826105023
M3 - Article
AN - SCOPUS:85215071104
SN - 1574-8936
JO - Current Bioinformatics
JF - Current Bioinformatics
ER -