Differentiating amino acids from nanopore sequencing

Jiahao Zhang; Jia Meng; Yuxin Zhang

doi:10.1145/3674658.3674663

Department of Biosciences and Bioinformatics

Xi'an Jiaotong-Liverpool University

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

Amino acid nanopore sequencing is a significant breakthrough in molecular biology, biochemistry, and medical diagnostics. Its high sensitivity, specificity, and real-time analytic capability make it essential for accurately identifying amino acids. A new nanopore, Msp-NTA-Ni, has recently pushed the boundaries by allowing accurate differentiation of all 20 proteinogenic amino acids and their post-translational modifications (PTMs). Using the data produced by this nanopore, we examined five features and identified the most useful pairs for classification. We then trained, fine-tuned, and compared multiple machine learning models, including Random Forest, CatBoost, and SVM. The Random Forest model surpasses current benchmarks, reaching a validation accuracy of 99.04%. Our results also highlight the importance of particular feature combinations, such as the mean and standard deviation, in improving model performance, despite some limitations in differentiating between certain pairs of amino acids.

Original language	English
Title of host publication	ICBBT 2024 - Proceedings of the 2024 16th International Conference on Bioinformatics and Biomedical Technology
Publisher	Association for Computing Machinery
Pages	25-30
Number of pages	6
ISBN (Electronic)	9798400717666
DOIs	https://doi.org/10.1145/3674658.3674663
Publication status	Published - 18 Nov 2024
Event	16th International Conference on Bioinformatics and Biomedical Technology, ICBBT 2024 - Chongqing, China. Duration: 24 May 2024 → 26 May 2024

Publication series

Name	ACM International Conference Proceeding Series

Conference

Conference	16th International Conference on Bioinformatics and Biomedical Technology, ICBBT 2024
Country/Territory	China
City	Chongqing
Period	24/05/24 → 26/05/24

Access to Document

10.1145/3674658.3674663

Cite this

@inproceedings{f9cc7eabdb854131b2f3dda84500d2f0,

title = "Differentiating amino acids from nanopore sequencing",

abstract = "Amino acids nanopore sequencing is a significant breakthrough in the fields of molecular biology, biochemistry, and medical diagnostics. The tool's high sensitivity, specificity, and real-time analytic capability make it essential for accurately identifying amino acids. A new nanopore, known as Msp-NTA-Ni, has recently advanced the bounds by allowing accurate differentiation of all 20 proteinogenic amino acids and their post-translational modifications (PTMs). Utilizing the data produced by this nanopore, our research conducted a thorough examination of five features, pinpointing the most useful pairs for the purpose of classification. Subsequently, we undertake an elaborate process that encompasses the training, fine-tuning, and comparative evaluation of multiple machine learning models, such as Random Forest, CatBoost, and SVM. The results of our research indicate that the Random Forest model surpasses the current benchmarks, obtaining a validation accuracy of 99.04%. Moreover, our research emphasizes the crucial significance of particular combinations of features, such as the mean and standard deviation, in improving the performance of the model, despite some limitations in differentiating between certain pairs of amino acids.",

author = "Jiahao Zhang and Jia Meng and Yuxin Zhang",

note = "Publisher Copyright: {\textcopyright} 2024 Copyright held by the owner/author(s).; 16th International Conference on Bioinformatics and Biomedical Technology, ICBBT 2024 ; Conference date: 24-05-2024 Through 26-05-2024",

year = "2024",

month = nov,

day = "18",

doi = "10.1145/3674658.3674663",

language = "English",

series = "ACM International Conference Proceeding Series",

publisher = "Association for Computing Machinery",

pages = "25--30",

booktitle = "ICBBT 2024 - Proceedings of the 2024 16th International Conference on Bioinformatics and Biomedical Technology",

}

Zhang, J, Meng, J & Zhang, Y 2024, Differentiating amino acids from nanopore sequencing. in ICBBT 2024 - Proceedings of the 2024 16th International Conference on Bioinformatics and Biomedical Technology. ACM International Conference Proceeding Series, Association for Computing Machinery, pp. 25-30, 16th International Conference on Bioinformatics and Biomedical Technology, ICBBT 2024, Chongqing, China, 24/05/24. https://doi.org/10.1145/3674658.3674663

Differentiating amino acids from nanopore sequencing. / Zhang, Jiahao; Meng, Jia; Zhang, Yuxin.
ICBBT 2024 - Proceedings of the 2024 16th International Conference on Bioinformatics and Biomedical Technology. Association for Computing Machinery, 2024. p. 25-30 (ACM International Conference Proceeding Series).

TY - GEN

T1 - Differentiating amino acids from nanopore sequencing

AU - Zhang, Jiahao

AU - Meng, Jia

AU - Zhang, Yuxin

PY - 2024/11/18

Y1 - 2024/11/18

N2 - Amino acids nanopore sequencing is a significant breakthrough in the fields of molecular biology, biochemistry, and medical diagnostics. The tool's high sensitivity, specificity, and real-time analytic capability make it essential for accurately identifying amino acids. A new nanopore, known as Msp-NTA-Ni, has recently advanced the bounds by allowing accurate differentiation of all 20 proteinogenic amino acids and their post-translational modifications (PTMs). Utilizing the data produced by this nanopore, our research conducted a thorough examination of five features, pinpointing the most useful pairs for the purpose of classification. Subsequently, we undertake an elaborate process that encompasses the training, fine-tuning, and comparative evaluation of multiple machine learning models, such as Random Forest, CatBoost, and SVM. The results of our research indicate that the Random Forest model surpasses the current benchmarks, obtaining a validation accuracy of 99.04%. Moreover, our research emphasizes the crucial significance of particular combinations of features, such as the mean and standard deviation, in improving the performance of the model, despite some limitations in differentiating between certain pairs of amino acids.

AB - Amino acids nanopore sequencing is a significant breakthrough in the fields of molecular biology, biochemistry, and medical diagnostics. The tool's high sensitivity, specificity, and real-time analytic capability make it essential for accurately identifying amino acids. A new nanopore, known as Msp-NTA-Ni, has recently advanced the bounds by allowing accurate differentiation of all 20 proteinogenic amino acids and their post-translational modifications (PTMs). Utilizing the data produced by this nanopore, our research conducted a thorough examination of five features, pinpointing the most useful pairs for the purpose of classification. Subsequently, we undertake an elaborate process that encompasses the training, fine-tuning, and comparative evaluation of multiple machine learning models, such as Random Forest, CatBoost, and SVM. The results of our research indicate that the Random Forest model surpasses the current benchmarks, obtaining a validation accuracy of 99.04%. Moreover, our research emphasizes the crucial significance of particular combinations of features, such as the mean and standard deviation, in improving the performance of the model, despite some limitations in differentiating between certain pairs of amino acids.

UR - http://www.scopus.com/inward/record.url?scp=85212871239&partnerID=8YFLogxK

U2 - 10.1145/3674658.3674663

DO - 10.1145/3674658.3674663

M3 - Conference Proceeding

AN - SCOPUS:85212871239

T3 - ACM International Conference Proceeding Series

SP - 25

EP - 30

BT - ICBBT 2024 - Proceedings of the 2024 16th International Conference on Bioinformatics and Biomedical Technology

PB - Association for Computing Machinery

T2 - 16th International Conference on Bioinformatics and Biomedical Technology, ICBBT 2024

Y2 - 24 May 2024 through 26 May 2024

ER -
