Evaluating the Reliability of Machine Learning Predictors in m6A-SNP Association Analysis: A Comparative Study Using m6A-QTL Data

Zhongzheng Mao; Zhen Wei

doi:10.2174/0115748936332078240826105023

Zhongzheng Mao, Zhen Wei^*

^*Corresponding author for this work

Department of Biosciences and Bioinformatics

Yale University

Research output: Contribution to journal › Article › peer-review

Abstract

Introduction: N6-Methyladenosine (m⁶A) plays a crucial role in determining the fate of RNA after transcription. Understanding the downstream functions of individual m⁶A sites is of critical interest in epitranscriptomics. Published studies have used two main approaches to decipher the specific impact of m⁶A sites on gene expression and on diseases and traits: m⁶A quantitative trait loci (m⁶A-QTL) and in-silico mutation prediction by Machine Learning (ML) models. However, earlier work still lacks independent validation of the performance of ML-based methods.

Methods: In this study, we use m⁶A-QTL as ground truth to evaluate the outcomes of in-silico mutation models. We benchmark two kinds of predictions against m⁶A-QTL: those from newly trained ML models that use genomic or sequence features, and the existing inference results published in databases that depend on in-silico mutation.

Results: We found that the consistency between in-silico mutation and m⁶A-QTL is weak, regardless of the ML algorithms and predictive features used. The same trend held across multiple published databases based on in-silico mutation, including RMDisease2, m⁶AVar, and RMVar.

Conclusion: These results emphasize the importance of critical empirical evaluation of ML models in future SNP-m⁶A association studies and suggest the need for more high-quality m⁶A-QTL experiments to guide model development.
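The comparison described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: `toy_score` is a hypothetical stand-in for a trained ML predictor (it only counts matches to the DRACH consensus motif around the candidate site), and `in_silico_delta` and `sign_concordance` are illustrative helper names. The idea shown is only the general one: score the reference and mutated sequences, take the difference as the predicted SNP effect, and check how often its direction agrees with the effect direction measured by m⁶A-QTL.

```python
from typing import Callable, Sequence

# DRACH consensus around the methylated A, written in the DNA alphabet:
# D = A/G/T, R = A/G, then A, then C, then H = A/C/T.
DRACH = ["AGT", "AG", "A", "C", "ACT"]


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for an ML predictor.

    Returns the fraction of DRACH positions matched by the central 5-mer
    of `seq`. Assumes `seq` has odd length >= 5 with the candidate A at
    its centre.
    """
    mid = len(seq) // 2
    kmer = seq[mid - 2: mid + 3]
    return sum(base in allowed for base, allowed in zip(kmer, DRACH)) / 5


def in_silico_delta(ref_seq: str, offset: int, alt: str,
                    score: Callable[[str], float] = toy_score) -> float:
    """Predicted effect of a SNP: score(mutated) - score(reference)."""
    alt_seq = ref_seq[:offset] + alt + ref_seq[offset + 1:]
    return score(alt_seq) - score(ref_seq)


def sign_concordance(deltas: Sequence[float], betas: Sequence[float]) -> float:
    """Fraction of SNPs whose predicted delta and QTL effect share a sign.

    Pairs in which either value is exactly zero are skipped because they
    carry no direction. Returns NaN if no informative pair remains.
    """
    pairs = [(d, b) for d, b in zip(deltas, betas) if d != 0 and b != 0]
    if not pairs:
        return float("nan")
    return sum((d > 0) == (b > 0) for d, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Invented example: a C>T change at offset 3 breaks the motif GGACT.
    delta = in_silico_delta("GGACT", 3, "T")
    # Invented QTL effect sizes, for illustration only.
    print(sign_concordance([delta, 0.2, 0.0], [-1.0, -0.5, 0.3]))
```

In a real benchmark the scorer would be a trained model and the effect sizes would come from an m⁶A-QTL study; a rank correlation between predicted deltas and QTL effects would be a natural complement to the sign check.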

Original language	English
Pages (from-to)	631-640
Number of pages	10
Journal	Current Bioinformatics
Volume	20
Issue number	7
DOIs	https://doi.org/10.2174/0115748936332078240826105023
Publication status	Published - 2025

Keywords

N6-methyladenosine
epitranscriptomics
functional annotation
in-silico mutation
m⁶A
m⁶A-QTL
machine learning

Cite this

@article{eeaa0fe497c344c5aa2723bf63423055,

title = "Evaluating the Reliability of Machine Learning Predictors in m6A-SNP Association Analysis: A Comparative Study Using m6A-QTL Data",

abstract = "Introduction: N6-Methyladenosine (m6A) plays a crucial role in determining the fate of RNA after transcription. Understanding the downstream functions of individual m6A sites is of critical interest in epitranscriptomics. In published studies, two main approaches have been used to decipher the specific impact of m6A sites on gene expression and disease/traits: the m6A quantitative trait loci (m6A-QTL) and in-silico mutation prediction by Machine Learning (ML) models. However, earlier works still lack independent validation for the performance of ML-based methods. Methods: In this study, we use m6A-QTL as ground truth to evaluate the outcomes of in-silico mutation models. We benchmark both the newly trained machine learning models using genomic or sequence features and the existing model inference results published in in-silico mutation-dependent databases against m6A-QTL. Results: We found that the consistency between in-silico mutation and m6A-QTL is weak, regardless of the ML algorithms and predictive features used. This trend was also similar across multiple published databases based on in-silico mutation, including RMDisease2, m6AVar, and RMVar. Conclusion: These results emphasize the importance of critical empirical evaluations for ML models in future SNP-m6A association studies and suggest the need for more high-quality m6A-QTL experiments to guide model development.",

keywords = "N6-methyladenosine, epitranscriptomics, functional annotation, in-silico mutation, m6A, m6A-QTL, machine learning",

author = "Zhongzheng Mao and Zhen Wei",

note = "Publisher Copyright: {\textcopyright} 2025 Bentham Science Publishers.",

year = "2025",

doi = "10.2174/0115748936332078240826105023",

language = "English",

volume = "20",

pages = "631--640",

journal = "Current Bioinformatics",

issn = "1574-8936",

number = "7",

}

TY - JOUR

T1 - Evaluating the Reliability of Machine Learning Predictors in m6A-SNP Association Analysis

T2 - A Comparative Study Using m6A-QTL Data

AU - Mao, Zhongzheng

AU - Wei, Zhen

PY - 2025

Y1 - 2025

N2 - Introduction: N6-Methyladenosine (m6A) plays a crucial role in determining the fate of RNA after transcription. Understanding the downstream functions of individual m6A sites is of critical interest in epitranscriptomics. In published studies, two main approaches have been used to decipher the specific impact of m6A sites on gene expression and disease/traits: the m6A quantitative trait loci (m6A-QTL) and in-silico mutation prediction by Machine Learning (ML) models. However, earlier works still lack independent validation for the performance of ML-based methods. Methods: In this study, we use m6A-QTL as ground truth to evaluate the outcomes of in-silico mutation models. We benchmark both the newly trained machine learning models using genomic or sequence features and the existing model inference results published in in-silico mutation-dependent databases against m6A-QTL. Results: We found that the consistency between in-silico mutation and m6A-QTL is weak, regardless of the ML algorithms and predictive features used. This trend was also similar across multiple published databases based on in-silico mutation, including RMDisease2, m6AVar, and RMVar. Conclusion: These results emphasize the importance of critical empirical evaluations for ML models in future SNP-m6A association studies and suggest the need for more high-quality m6A-QTL experiments to guide model development.

KW - N6-methyladenosine

KW - epitranscriptomics

KW - functional annotation

KW - in-silico mutation

KW - m6A

KW - m6A-QTL

KW - machine learning

UR - http://www.scopus.com/inward/record.url?scp=85215071104&partnerID=8YFLogxK

U2 - 10.2174/0115748936332078240826105023

DO - 10.2174/0115748936332078240826105023

M3 - Article

AN - SCOPUS:85215071104

SN - 1574-8936

VL - 20

SP - 631

EP - 640

JO - Current Bioinformatics

JF - Current Bioinformatics

IS - 7

ER -
