Evaluating the Effectiveness of Supervised Learning Models for Antibiotic Pollution Detection from Biochip Data

Ruben Ng, Paul Craig*

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding; Conference Proceeding; peer-review

Abstract

A biochip is an array of biosensor spots arranged on a durable substrate that can be used to detect and differentiate between biochemical analytes. This paper examines how effectively different supervised learning models detect analytes from biochip spot patterns, using the case study of antibiotic pollution detection with models generated by extracting RGB values from a chip with sixteen spots. We evaluate the performance and accuracy of four types of model, Decision Trees, Random Forest, Naïve Bayes and Neural Networks, by analysing metrics such as processing time, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Our analysis shows that the models have different strengths and weaknesses for reading biochip data. Decision Trees and Naïve Bayes have the advantage of being explainable, so that biologists can understand which particular spot values lead to a given classification, although they are significantly less accurate than the other methods. Random Forest and Neural Networks have the advantage of high accuracy but act like a black box, so biologists have little clue as to which spot patterns lead to a particular classification, or how much a reading relies on a small change in value or a small number of spots. This information is important for assessing the reliability of a chip reading and determining whether further tests are required or subsequent action can be taken, and it helps chip designers determine whether the design of their chips needs to be improved. We also found that Random Forest classifiers have significantly better computational performance than Neural Networks, which makes them suitable for interfaces that allow users to re-run classifications to see how changes in spot values change the classification. Ultimately, the accuracy and computational performance of Random Forest classifiers would make them the preferred option for biochips of the type described in this paper, when used with interfaces that can show, and allow us to test, different values.
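
As an illustration of the metrics named in the abstract, the following minimal sketch computes RMSE and MAE and times a call, using only the Python standard library. The label values, the function names `rmse`, `mae` and `timed`, and the 0/1 encoding of classes are assumptions made for this example; they are not taken from the paper, whose evaluation code is not shown here.

```python
import math
import time


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared difference."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean Absolute Error: mean of the absolute differences."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds) as a rough processing-time measure."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical labels (1 = antibiotic detected, 0 = not detected); not data from the paper.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

rmse_value, rmse_seconds = timed(rmse, y_true, y_pred)
mae_value = mae(y_true, y_pred)
print(f"RMSE={rmse_value:.4f} MAE={mae_value:.4f} time={rmse_seconds:.6f}s")
```

In a comparison like the one the abstract describes, the same three measurements would be collected for each model's predictions on a held-out set, so that accuracy and processing time can be weighed against each other.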

Original language: English
Title of host publication: International Workshop on Signal Processing and Machine Learning, WSPML 2023
Editors: Yang Yue
Publisher: SPIE
ISBN (Electronic): 9781510671928
Publication status: Published - 2023
Event: 2023 International Workshop on Signal Processing and Machine Learning, WSPML 2023 - Hangzhou, China
Duration: 22 Sept 2023 to 24 Sept 2023

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 12943
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: 2023 International Workshop on Signal Processing and Machine Learning, WSPML 2023
Country/Territory: China
City: Hangzhou
Period: 22/09/23 to 24/09/23

Keywords

  • bioinformatics
  • machine learning
  • supervised learning
