Weakly supervised learning of RNA modifications from low-resolution epitranscriptome data

Daiyun Huang; Bowen Song; Jingjue Wei; Jionglong Su; Frans Coenen; Jia Meng

doi:10.1093/bioinformatics/btab278

Daiyun Huang, Bowen Song, Jingjue Wei, Jionglong Su, Frans Coenen, Jia Meng^*

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

25 Citations (Scopus)

Abstract

Motivation: Increasing evidence suggests that post-transcriptional ribonucleic acid (RNA) modifications regulate essential biomolecular functions and are related to the pathogenesis of various diseases. Precise identification of RNA modification sites is essential for understanding the regulatory mechanisms of RNAs. To date, many computational approaches for predicting RNA modifications have been developed, most of which were based on strong supervision enabled by base-resolution epitranscriptome data. However, high-resolution data may not be available.

Results: We propose WeakRM, the first weakly supervised learning framework for predicting RNA modifications from low-resolution epitranscriptome datasets, such as those generated from acRIP-seq and hMeRIP-seq. Evaluations on three independent datasets (corresponding to three different RNA modification types and their respective sequencing technologies) demonstrated the effectiveness of our approach in predicting RNA modifications from low-resolution data. WeakRM outperformed state-of-the-art multi-instance learning methods for genomic sequences, such as WSCNN, which was originally designed for transcription factor binding site prediction. Additionally, our approach captured motifs that are consistent with existing knowledge, and visualization of the predicted modification-containing regions unveiled the potentials of detecting RNA modifications with improved resolution.

Availability and implementation: The source code for the WeakRM algorithm, along with the datasets used, are freely accessible at: https://github.com/daiyun02211/WeakRM.

Original language	English
Pages (from-to)	I222-I230
Journal	Bioinformatics
Volume	37
Issue number	Supplement_1
DOIs	https://doi.org/10.1093/bioinformatics/btab278
Publication status	Published - 1 Jul 2021

Cite this

@article{a9c675e3cfa741cd866380a7ae1ca940,

title = "Weakly supervised learning of RNA modifications from low-resolution epitranscriptome data",

abstract = "Motivation: Increasing evidence suggests that post-transcriptional ribonucleic acid (RNA) modifications regulate essential biomolecular functions and are related to the pathogenesis of various diseases. Precise identification of RNA modification sites is essential for understanding the regulatory mechanisms of RNAs. To date, many computational approaches for predicting RNA modifications have been developed, most of which were based on strong supervision enabled by base-resolution epitranscriptome data. However, high-resolution data may not be available. Results: We propose WeakRM, the first weakly supervised learning framework for predicting RNA modifications from low-resolution epitranscriptome datasets, such as those generated from acRIP-seq and hMeRIP-seq. Evaluations on three independent datasets (corresponding to three different RNA modification types and their respective sequencing technologies) demonstrated the effectiveness of our approach in predicting RNA modifications from low-resolution data. WeakRM outperformed state-of-the-art multi-instance learning methods for genomic sequences, such as WSCNN, which was originally designed for transcription factor binding site prediction. Additionally, our approach captured motifs that are consistent with existing knowledge, and visualization of the predicted modification-containing regions unveiled the potentials of detecting RNA modifications with improved resolution. Availability and implementation: The source code for the WeakRM algorithm, along with the datasets used, are freely accessible at: https://github.com/daiyun02211/WeakRM.",

author = "Daiyun Huang and Bowen Song and Jingjue Wei and Jionglong Su and Frans Coenen and Jia Meng",

year = "2021",

month = jul,

day = "1",

doi = "10.1093/bioinformatics/btab278",

language = "English",

volume = "37",

pages = "I222--I230",

journal = "Bioinformatics",

issn = "1367-4803",

number = "Supplement_1",

}

TY - JOUR

T1 - Weakly supervised learning of RNA modifications from low-resolution epitranscriptome data

AU - Huang, Daiyun

AU - Song, Bowen

AU - Wei, Jingjue

AU - Su, Jionglong

AU - Coenen, Frans

AU - Meng, Jia

PY - 2021/7/1

Y1 - 2021/7/1

AB - Motivation: Increasing evidence suggests that post-transcriptional ribonucleic acid (RNA) modifications regulate essential biomolecular functions and are related to the pathogenesis of various diseases. Precise identification of RNA modification sites is essential for understanding the regulatory mechanisms of RNAs. To date, many computational approaches for predicting RNA modifications have been developed, most of which were based on strong supervision enabled by base-resolution epitranscriptome data. However, high-resolution data may not be available. Results: We propose WeakRM, the first weakly supervised learning framework for predicting RNA modifications from low-resolution epitranscriptome datasets, such as those generated from acRIP-seq and hMeRIP-seq. Evaluations on three independent datasets (corresponding to three different RNA modification types and their respective sequencing technologies) demonstrated the effectiveness of our approach in predicting RNA modifications from low-resolution data. WeakRM outperformed state-of-the-art multi-instance learning methods for genomic sequences, such as WSCNN, which was originally designed for transcription factor binding site prediction. Additionally, our approach captured motifs that are consistent with existing knowledge, and visualization of the predicted modification-containing regions unveiled the potentials of detecting RNA modifications with improved resolution. Availability and implementation: The source code for the WeakRM algorithm, along with the datasets used, are freely accessible at: https://github.com/daiyun02211/WeakRM.

UR - http://www.scopus.com/inward/record.url?scp=85111980240&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btab278

DO - 10.1093/bioinformatics/btab278

M3 - Article

C2 - 34252943

AN - SCOPUS:85111980240

SN - 1367-4803

VL - 37

SP - I222

EP - I230

JO - Bioinformatics

JF - Bioinformatics

IS - Supplement_1

ER -
