Super learner model for classifying leukemia through gene expression monitoring

Sharanya Selvaraj, Alhuseen Omar Alsayed, Nor Azman Ismail, Balasubramanian Prabhu Kavin, Edeh Michael Onyema*, Gan Hong Seng, Arinze Queen Uchechi

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Leukemia is a form of cancer that affects the bone marrow and lymphatic system, and it requires complex treatment strategies that vary with each subtype. Because the morphological differences among these subtypes are subtle, monitoring gene expression is crucial for accurate classification. Manual or pathological testing can be time-consuming and expensive, so data-driven methods and machine learning algorithms offer an efficient alternative for leukemia classification. This study introduces a novel super learning model that leverages heterogeneous machine learning models to analyze gene expression data and classify leukemia cells. The proposed approach incorporates an entropy-based feature importance technique to identify the gene profiles most significant to the labeling process. The strength of this super learning model lies in its final super learner, Random Forest, which effectively classifies cross-validated data from the candidate learners. Validation on a gene expression monitoring dataset demonstrates that this model outperforms other state-of-the-art models in predictive accuracy. The study contributes to knowledge on the use of advanced machine learning techniques to improve the accuracy and reliability of leukemia classification from gene expression data, addressing the challenges of traditional methods that rely on clinical features and morphological examination.
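The abstract does not say which entropy-based feature importance technique is used. As a minimal sketch of the general idea, assuming information gain from a median split of each gene's expression values (one common entropy-based score, not necessarily the authors' choice), genes can be ranked as follows. The gene names, expression values, and the `ALL`/`AML` labels are hypothetical toy data, not taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from splitting the samples at the median expression value.

    The median split is an assumption made for this sketch.
    """
    ordered = sorted(values)
    mid = len(ordered) // 2
    threshold = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    low = [y for v, y in zip(values, labels) if v <= threshold]
    high = [y for v, y in zip(values, labels) if v > threshold]
    if not low or not high:
        # A gene that cannot split the samples carries no information.
        return 0.0
    weighted = (len(low) * entropy(low) + len(high) * entropy(high)) / len(labels)
    return entropy(labels) - weighted


def rank_genes(expression, labels):
    """Return gene names sorted from most to least informative."""
    gains = {gene: information_gain(vals, labels) for gene, vals in expression.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Hypothetical toy data: six samples, three made-up genes, two placeholder subtypes.
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
expression = {
    "gene_a": [0.1, 0.2, 0.3, 0.9, 1.0, 1.1],  # separates the two classes cleanly
    "gene_b": [0.5, 0.9, 0.1, 0.6, 0.2, 0.8],  # only weakly related to the label
    "gene_c": [0.4, 0.4, 0.4, 0.4, 0.4, 0.4],  # constant, so uninformative
}
ranking = rank_genes(expression, labels)
print(ranking)
```

In a pipeline like the one the abstract describes, only the top-ranked genes would be passed to the candidate learners. How many genes to keep is a tuning decision the abstract does not specify.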

Original language: English
Article number: 499
Journal: Discover Oncology
Volume: 15
Issue number: 1
Publication status: Published - Dec 2024

Keywords

  • DNA microarray
  • Gene expressions
  • Leukemia
  • Machine learning
  • Random forest
  • Super learner
