4acCPred: Weakly supervised prediction of N4-acetyldeoxycytosine DNA modification from sequences

Jingxian Zhou; Xuan Wang; Zhen Wei; Jia Meng; Daiyun Huang

doi:10.1016/j.omtn.2022.10.004

Jingxian Zhou, Xuan Wang, Zhen Wei, Jia Meng, Daiyun Huang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

DNA methylation is one of the earliest epigenetic regulation mechanisms studied extensively, and it is critical for normal development, diseases, and gene expression. As a recently identified chemical modification of DNA, N4-acetyldeoxycytosine (4acC) was shown to be abundant in Arabidopsis and highly associated with gene expression and actively transcribed genes. Precise identification of 4acC is essential for studying its biological function. We proposed the 4acCPred, the first computational framework for predicting 4acC-carrying regions from Arabidopsis genomic DNA sequences. Since the existing 4acC data are not precise for a specific base but only report regions that are hundreds of bases long, we formulated the task as a weakly supervised learning problem and built 4acCPred using a multi-instance-based deep neural network. Both cross-validation and independent testing on the four datasets under different conditions show promising performance, with mean areas under the receiver operating characteristic curve (AUCs) of 0.9877 and 0.9899, respectively. 4acCPred also provides motif mining through model interpretation. The motifs found by 4acCPred are consistent with existing knowledge, indicating that the model successfully captured real biological signals. In addition, a user-friendly web server was built to facilitate 4acC prediction, motif visualization, and data access. Our framework and web server should serve as useful tools for 4acC research.
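
The weakly supervised setup described in the abstract (labels exist only for regions hundreds of bases long, not for individual bases) can be illustrated with a toy multiple-instance sketch. This is not the 4acCPred model: the window size, the stride, the single hand-built motif filter, the "CCG" motif, and max pooling over instances are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (len, 4) array; unknown bases stay all-zero."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out

def split_into_instances(seq, window=50, stride=25):
    """Cut a labelled region (the bag) into overlapping windows (the instances)."""
    if len(seq) <= window:
        return [seq]
    starts = list(range(0, len(seq) - window + 1, stride))
    if starts[-1] + window < len(seq):
        starts.append(len(seq) - window)  # extra window so the tail is covered
    return [seq[s:s + window] for s in starts]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def instance_score(instance, motif_filter, bias):
    """Slide one filter over the instance, keep the best match, squash to (0, 1)."""
    x = one_hot(instance)
    k = motif_filter.shape[0]
    acts = [float(np.sum(x[i:i + k] * motif_filter)) for i in range(len(x) - k + 1)]
    best = max(acts) if acts else 0.0
    return float(sigmoid(best + bias))

def bag_score(seq, motif_filter, bias):
    """Max pooling over instances: a region is positive if any window is."""
    scores = [instance_score(inst, motif_filter, bias)
              for inst in split_into_instances(seq)]
    best_idx = int(np.argmax(scores))
    return scores[best_idx], best_idx, len(scores)

# Illustrative filter: +1 per matching base, -1 per mismatch (motif is made up).
motif_filter = 2.0 * one_hot("CCG") - 1.0
bias = -2.0

positive_region = "A" * 60 + "CCG" + "T" * 57   # 120 bases, signal near the middle
negative_region = "A" * 120

pos_score, pos_idx, n_instances = bag_score(positive_region, motif_filter, bias)
neg_score, _, _ = bag_score(negative_region, motif_filter, bias)
```

Only the region-level score would be compared with the region-level label during training, while the index of the highest-scoring window hints at where inside the region the signal lies. In a real model the filter weights would be learned rather than hand-built, and the pooling step could be replaced by another aggregation such as attention.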

Original language	English
Pages (from-to)	337-345
Number of pages	9
Journal	Molecular Therapy - Nucleic Acids
Volume	30
DOIs	https://doi.org/10.1016/j.omtn.2022.10.004
Publication status	Published - 13 Dec 2022

Keywords

DNA modification
MT: Bioinformatics
N-acetyldeoxycytosine
deep neural network
multiple-instance learning
sequence motif
weakly supervised learning

Cite this

@article{e1f48eadd20740cc9d429c1e57fcef92,

title = "{4acCPred}: Weakly supervised prediction of {N4}-acetyldeoxycytosine {DNA} modification from sequences",

abstract = "DNA methylation is one of the earliest epigenetic regulation mechanisms studied extensively, and it is critical for normal development, diseases, and gene expression. As a recently identified chemical modification of DNA, N4-acetyldeoxycytosine (4acC) was shown to be abundant in Arabidopsis and highly associated with gene expression and actively transcribed genes. Precise identification of 4acC is essential for studying its biological function. We proposed the 4acCPred, the first computational framework for predicting 4acC-carrying regions from Arabidopsis genomic DNA sequences. Since the existing 4acC data are not precise for a specific base but only report regions that are hundreds of bases long, we formulated the task as a weakly supervised learning problem and built 4acCPred using a multi-instance-based deep neural network. Both cross-validation and independent testing on the four datasets under different conditions show promising performance, with mean areas under the receiver operating characteristic curve (AUCs) of 0.9877 and 0.9899, respectively. 4acCPred also provides motif mining through model interpretation. The motifs found by 4acCPred are consistent with existing knowledge, indicating that the model successfully captured real biological signals. In addition, a user-friendly web server was built to facilitate 4acC prediction, motif visualization, and data access. Our framework and web server should serve as useful tools for 4acC research.",

keywords = "DNA modification, MT: Bioinformatics, N-acetyldeoxycytosine, deep neural network, multiple-instance learning, sequence motif, weakly supervised learning",

author = "Jingxian Zhou and Xuan Wang and Zhen Wei and Jia Meng and Daiyun Huang",

note = "Publisher Copyright: {\textcopyright} 2022 The Authors",

year = "2022",

month = dec,

day = "13",

doi = "10.1016/j.omtn.2022.10.004",

language = "English",

volume = "30",

pages = "337--345",

journal = "Molecular Therapy - Nucleic Acids",

issn = "2162-2531",

}

TY - JOUR

T1 - 4acCPred: Weakly supervised prediction of N4-acetyldeoxycytosine DNA modification from sequences

AU - Zhou, Jingxian

AU - Wang, Xuan

AU - Wei, Zhen

AU - Meng, Jia

AU - Huang, Daiyun

PY - 2022/12/13

Y1 - 2022/12/13

N2 - DNA methylation is one of the earliest epigenetic regulation mechanisms studied extensively, and it is critical for normal development, diseases, and gene expression. As a recently identified chemical modification of DNA, N4-acetyldeoxycytosine (4acC) was shown to be abundant in Arabidopsis and highly associated with gene expression and actively transcribed genes. Precise identification of 4acC is essential for studying its biological function. We proposed the 4acCPred, the first computational framework for predicting 4acC-carrying regions from Arabidopsis genomic DNA sequences. Since the existing 4acC data are not precise for a specific base but only report regions that are hundreds of bases long, we formulated the task as a weakly supervised learning problem and built 4acCPred using a multi-instance-based deep neural network. Both cross-validation and independent testing on the four datasets under different conditions show promising performance, with mean areas under the receiver operating characteristic curve (AUCs) of 0.9877 and 0.9899, respectively. 4acCPred also provides motif mining through model interpretation. The motifs found by 4acCPred are consistent with existing knowledge, indicating that the model successfully captured real biological signals. In addition, a user-friendly web server was built to facilitate 4acC prediction, motif visualization, and data access. Our framework and web server should serve as useful tools for 4acC research.

KW - DNA modification

KW - MT: Bioinformatics

KW - N-acetyldeoxycytosine

KW - deep neural network

KW - multiple-instance learning

KW - sequence motif

KW - weakly supervised learning

UR - http://www.scopus.com/inward/record.url?scp=85140931721&partnerID=8YFLogxK

U2 - 10.1016/j.omtn.2022.10.004

DO - 10.1016/j.omtn.2022.10.004

M3 - Article

AN - SCOPUS:85140931721

SN - 2162-2531

VL - 30

SP - 337

EP - 345

JO - Molecular Therapy - Nucleic Acids

JF - Molecular Therapy - Nucleic Acids

ER -
