Clustering count-based RNA methylation data using a nonparametric generative model

Lin Zhang, Yanling He, Huaizhi Wang, Hui Liu*, Yufei Huang, Xuesong Wang, Jia Meng

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

14 Citations (Scopus)

Abstract

Background: RNA methylome has been discovered as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuit of the epitranscriptome remains uncharted, clustering effect in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect the epitranscriptomic regulation. Count-based RNA methylation sequencing data has unique features, such as low reads coverage, which calls for novel clustering approaches. Objective: Besides the low reads coverage, it is also necessary to keep the integer property to approach clustering analysis of count-based RNA methylation sequencing data. Method: We proposed a nonparametric generative model together with its Gibbs sampling solution for clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the clustering effect in methylation level with the original count-based measurements rather than an estimated continuous methylation level. Besides, it adopts a nonparametric Dirichlet process to automatically determine an optimal number of clusters so as to avoid the common model selection problem in clustering analysis. Results: When tested on the simulated system, the method demonstrated improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. It also revealed on real dataset two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, which are two known regulatory components of the RNA m6A methyltransferase complex. Conclusion: Our proposed DPBBM method not only properly handles the count-based measurements of RNA methylation data from sites of very low reads coverage, but also learns an optimal number of clusters adaptively from the data analyzed. Availability: The source code and documents of DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.

Original languageEnglish
Pages (from-to)11-23
Number of pages13
JournalCurrent Bioinformatics
Volume14
Issue number1
DOIs
Publication statusPublished - 2019

Keywords

  • Beta-binomial mixture
  • Clustering
  • Dirichlet process
  • Epitranscriptome
  • MA-seq
  • RNA methylation

Fingerprint

Dive into the research topics of 'Clustering count-based RNA methylation data using a nonparametric generative model'. Together they form a unique fingerprint.

Cite this