TY - GEN
T1 - Evaluating the Effectiveness of Supervised Learning Models for Antibiotic Pollution Detection from Biochip Data
AU - Ng, Ruben
AU - Craig, Paul
N1 - Publisher Copyright:
© 2023 SPIE. All rights reserved.
PY - 2023
Y1 - 2023
AB - A biochip is an array of biosensor spots arranged on a durable substrate that can be used to detect and differentiate between biochemical analytes. This paper examines the effectiveness of different supervised learning models for detecting analytes from biochip spot patterns, using the case study of antibiotic pollution detection with models generated from RGB values extracted from a chip with sixteen spots. We evaluate the performance and accuracy of four types of model (Decision Trees, Random Forest, Naïve Bayes and Neural Networks) by analysing metrics such as processing time, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Our analysis shows that each model has its own strengths and weaknesses for reading biochip data. Decision Trees and Naïve Bayes have the advantage of being explainable, so that biologists can understand which particular spot values lead to a given classification, although they are significantly less accurate than the other methods. Random Forest and Neural Networks have the advantage of high accuracy but act as black boxes, so biologists have little insight into which spot patterns lead to a particular classification or how much a reading depends on a small change in value or on a small number of spots. This insight is important for assessing the reliability of a chip reading, for determining whether further tests are required or whether subsequent action can be taken, and for helping chip designers decide whether their chip designs need to be improved. We also found that Random Forest classifiers have significantly better computational performance than Neural Networks, which makes them suitable for interfaces that allow users to re-run classifications to see how changes in spot values affect the classification.
Ultimately, the accuracy and computational performance of Random Forest classifiers make them the preferred option for biochips of the type described in this paper, used together with interfaces that can display spot values and allow users to test different values.
KW - bioinformatics
KW - machine learning
KW - supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85180156286&partnerID=8YFLogxK
U2 - 10.1117/12.3017922
DO - 10.1117/12.3017922
M3 - Conference Proceeding
AN - SCOPUS:85180156286
T3 - Proceedings of SPIE - The International Society for Optical Engineering
BT - International Workshop on Signal Processing and Machine Learning, WSPML 2023
A2 - Yue, Yang
PB - SPIE
T2 - 2023 International Workshop on Signal Processing and Machine Learning, WSPML 2023
Y2 - 22 September 2023 through 24 September 2023
ER -