Evaluating the Effectiveness of Supervised Learning Models for Antibiotic Pollution Detection from Biochip Data

Ruben Ng; Paul Craig

doi:10.1117/12.3017922

Ruben Ng, Paul Craig*

*Corresponding author for this work

School of Advanced Technology

Xi'an Jiaotong-Liverpool University

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

A biochip is an array of biosensor spots arranged on a durable substrate that can be used to detect and differentiate between biochemical analytes. This paper examines the effectiveness of different supervised learning models for detecting analytes from biochip spot patterns, using the case study of antibiotic pollution detection with models generated by extracting RGB values from a chip with sixteen spots. We evaluate the performance and accuracy of four types of model (Decision Trees, Random Forest, Naïve Bayes and Neural Networks) by analysing metrics such as processing time, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Our analysis shows that the models have different strengths and weaknesses for reading biochip data. Decision Trees and Naïve Bayes have the advantage of being explainable, so that biologists can understand which particular spot values lead to a given classification, although they are significantly less accurate than the other methods. Random Forest and Neural Networks have the advantage of high accuracy but act like a black box, so biologists have little insight into which spot patterns lead to a particular classification, or how much a reading depends on a small change in value or a small number of spots. This insight is important for assessing the reliability of a chip reading and determining whether further tests are required or subsequent action can be taken, and it helps chip designers determine whether the design of their chips needs to be improved. We also found that Random Forest classifiers have significantly better computational performance than Neural Networks, which makes them suitable for use in interfaces that allow users to re-run classifications to see how changes in spot values change the classification. Ultimately, the accuracy and computational performance of Random Forest classifiers make them the preferred option (to be used with interfaces that can show and allow us to test different values) for biochips of the type described in this paper.
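
The evaluation metrics and the re-run idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the chip readings are synthetic, `toy_classifier` is a hypothetical threshold rule standing in for a trained model, `flipping_spots` is an illustrative helper name, and the metrics are assumed to be computed over numeric class labels (0 = clean, 1 = polluted), which the abstract does not specify.

```python
import math
import time

def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length label sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean Absolute Error between two equal-length label sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def toy_classifier(spots):
    """Hypothetical stand-in model: 1 (polluted) if the mean red value exceeds 128."""
    mean_red = sum(r for r, _g, _b in spots) / len(spots)
    return 1 if mean_red > 128 else 0

def flipping_spots(classify, spots, delta):
    """Indices of spots whose red channel, shifted by +/-delta, changes the predicted class."""
    baseline = classify(spots)
    flipped = []
    for i, (r, g, b) in enumerate(spots):
        for shift in (-delta, delta):
            probe = list(spots)
            # Clamp the perturbed value to the valid 8-bit colour range.
            probe[i] = (min(255, max(0, r + shift)), g, b)
            if classify(probe) != baseline:
                flipped.append(i)
                break
    return flipped

# Synthetic readings (made up for illustration): sixteen (R, G, B) spots per chip.
clean_chip = [(100, 120, 90)] * 16
polluted_chip = [(160, 80, 70)] * 16
borderline_chip = [(129, 100, 100)] * 16  # mean red sits just above the threshold

chips = [clean_chip, polluted_chip, borderline_chip]
y_true = [0, 1, 0]  # assumed ground-truth labels for the synthetic chips

start = time.perf_counter()
y_pred = [toy_classifier(chip) for chip in chips]
elapsed = time.perf_counter() - start

print("predictions:", y_pred)
print("RMSE:", rmse(y_true, y_pred), "MAE:", mae(y_true, y_pred))
print("processing time (s):", elapsed)
print("fragile spots on borderline chip:",
      flipping_spots(toy_classifier, borderline_chip, 20))
```

In this sketch the borderline chip is misclassified, and shifting any single spot by 20 flips its class, which is the kind of fragile reading an interactive interface would surface. A fast model matters here because the probe calls the classifier up to twice per spot for every chip inspected.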

Original language	English
Title of host publication	International Workshop on Signal Processing and Machine Learning, WSPML 2023
Editors	Yang Yue
Publisher	SPIE
ISBN (Electronic)	9781510671928
DOIs	https://doi.org/10.1117/12.3017922
Publication status	Published - 2023
Event	2023 International Workshop on Signal Processing and Machine Learning, WSPML 2023 - Hangzhou, China; Duration: 22 Sept 2023 → 24 Sept 2023

Publication series

Name	Proceedings of SPIE - The International Society for Optical Engineering
Volume	12943
ISSN (Print)	0277-786X
ISSN (Electronic)	1996-756X

Conference

Conference	2023 International Workshop on Signal Processing and Machine Learning, WSPML 2023
Country/Territory	China
City	Hangzhou
Period	22/09/23 → 24/09/23

Keywords

bioinformatics
machine learning
supervised learning

Access to Document

10.1117/12.3017922

Cite this

Ng, R., & Craig, P. (2023). Evaluating the Effectiveness of Supervised Learning Models for Antibiotic Pollution Detection from Biochip Data. In Y. Yue (Ed.), International Workshop on Signal Processing and Machine Learning, WSPML 2023 Article 1294315 (Proceedings of SPIE - The International Society for Optical Engineering; Vol. 12943). SPIE. https://doi.org/10.1117/12.3017922

@inproceedings{c2834d90c46048b9aed328723d716b58,

title = "Evaluating the Effectiveness of Supervised Learning Models for Antibiotic Pollution Detection from Biochip Data",

abstract = "A biochip is an array of biosensor spots arranged on a durable substrate that can be used to detect and differentiate between biochemical analytes. This paper examines the effectiveness of different supervised learning models for detecting analytes from biochip spot patterns, using the case study of antibiotic pollution detection with models generated by extracting RGB values from a chip with sixteen spots. We evaluate the performance and accuracy of four types of model (Decision Trees, Random Forest, Na{\"i}ve Bayes and Neural Networks) by analysing metrics such as processing time, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Our analysis shows that the models have different strengths and weaknesses for reading biochip data. Decision Trees and Na{\"i}ve Bayes have the advantage of being explainable, so that biologists can understand which particular spot values lead to a given classification, although they are significantly less accurate than the other methods. Random Forest and Neural Networks have the advantage of high accuracy but act like a black box, so biologists have little insight into which spot patterns lead to a particular classification, or how much a reading depends on a small change in value or a small number of spots. This insight is important for assessing the reliability of a chip reading and determining whether further tests are required or subsequent action can be taken, and it helps chip designers determine whether the design of their chips needs to be improved. We also found that Random Forest classifiers have significantly better computational performance than Neural Networks, which makes them suitable for use in interfaces that allow users to re-run classifications to see how changes in spot values change the classification. Ultimately, the accuracy and computational performance of Random Forest classifiers make them the preferred option (to be used with interfaces that can show and allow us to test different values) for biochips of the type described in this paper.",

keywords = "bioinformatics, machine learning, supervised learning",

author = "Ruben Ng and Paul Craig",

note = "Publisher Copyright: {\textcopyright} 2023 SPIE. All rights reserved.; 2023 International Workshop on Signal Processing and Machine Learning, WSPML 2023 ; Conference date: 22-09-2023 Through 24-09-2023",

year = "2023",

doi = "10.1117/12.3017922",

language = "English",

series = "Proceedings of SPIE - The International Society for Optical Engineering",

publisher = "SPIE",

editor = "Yang Yue",

booktitle = "International Workshop on Signal Processing and Machine Learning, WSPML 2023",

}

Ng, R & Craig, P 2023, Evaluating the Effectiveness of Supervised Learning Models for Antibiotic Pollution Detection from Biochip Data. in Y Yue (ed.), International Workshop on Signal Processing and Machine Learning, WSPML 2023., 1294315, Proceedings of SPIE - The International Society for Optical Engineering, vol. 12943, SPIE, 2023 International Workshop on Signal Processing and Machine Learning, WSPML 2023, Hangzhou, China, 22/09/23. https://doi.org/10.1117/12.3017922

Evaluating the Effectiveness of Supervised Learning Models for Antibiotic Pollution Detection from Biochip Data. / Ng, Ruben; Craig, Paul.
International Workshop on Signal Processing and Machine Learning, WSPML 2023. ed. / Yang Yue. SPIE, 2023. 1294315 (Proceedings of SPIE - The International Society for Optical Engineering; Vol. 12943).

TY - GEN

T1 - Evaluating the Effectiveness of Supervised Learning Models for Antibiotic Pollution Detection from Biochip Data

AU - Ng, Ruben

AU - Craig, Paul

PY - 2023

Y1 - 2023

N2 - A biochip is an array of biosensor spots arranged on a durable substrate that can be used to detect and differentiate between biochemical analytes. This paper examines the effectiveness of different supervised learning models for detecting analytes from biochip spot patterns, using the case study of antibiotic pollution detection with models generated by extracting RGB values from a chip with sixteen spots. We evaluate the performance and accuracy of four types of model (Decision Trees, Random Forest, Naïve Bayes and Neural Networks) by analysing metrics such as processing time, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Our analysis shows that the models have different strengths and weaknesses for reading biochip data. Decision Trees and Naïve Bayes have the advantage of being explainable, so that biologists can understand which particular spot values lead to a given classification, although they are significantly less accurate than the other methods. Random Forest and Neural Networks have the advantage of high accuracy but act like a black box, so biologists have little insight into which spot patterns lead to a particular classification, or how much a reading depends on a small change in value or a small number of spots. This insight is important for assessing the reliability of a chip reading and determining whether further tests are required or subsequent action can be taken, and it helps chip designers determine whether the design of their chips needs to be improved. We also found that Random Forest classifiers have significantly better computational performance than Neural Networks, which makes them suitable for use in interfaces that allow users to re-run classifications to see how changes in spot values change the classification. Ultimately, the accuracy and computational performance of Random Forest classifiers make them the preferred option (to be used with interfaces that can show and allow us to test different values) for biochips of the type described in this paper.

KW - bioinformatics

KW - machine learning

KW - supervised learning

UR - http://www.scopus.com/inward/record.url?scp=85180156286&partnerID=8YFLogxK

U2 - 10.1117/12.3017922

DO - 10.1117/12.3017922

M3 - Conference Proceeding

AN - SCOPUS:85180156286

T3 - Proceedings of SPIE - The International Society for Optical Engineering

BT - International Workshop on Signal Processing and Machine Learning, WSPML 2023

A2 - Yue, Yang

PB - SPIE

T2 - 2023 International Workshop on Signal Processing and Machine Learning, WSPML 2023

Y2 - 22 September 2023 through 24 September 2023

ER -
