Integration of Principal Component Analysis and K-Means Clustering for Type 2 Diabetes Sub-clustering Model

Nashuha Omar, Asnida Abdul Wahab*, Eko Supriyanto, Rania Hussein Al-Ashwal, Muhammad Hanif Ramlee, Gan Hong Seng

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

Type 2 diabetes is a global health problem and one of the leading causes of death. Recent findings indicate that the disease can be categorized into several sub-clusters, which is a step toward precision medicine. In this paper, we analyze and compare two approaches to data reduction, i.e., with and without principal component analysis (PCA), on standardized and normalized data. Data preparation was performed first. The model was then developed and validated by plotting the Elbow method and silhouette width graphs. Normalized data with two principal components (PC = 2) gave the best clustering visualization, the lowest within-cluster sum of squares (WCSS) score (195.41), and the highest Silhouette score (0.3491), compared with both standardized data and standardized data (PC = 2), which gave a WCSS score of 23518.82 and a Silhouette score of 0.1976. We conclude that integrating PCA with k-means clustering yields a lower WCSS score and a higher Silhouette score.
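The abstract does not describe the dataset or the implementation, so the following is only a minimal sketch of the kind of comparison it outlines, not the authors' method. It assumes synthetic data in place of patient records, min-max scaling as the meaning of "normalized", k = 3 chosen arbitrarily, and hypothetical helper names (`standardize`, `normalize`, `pca`, `kmeans`, `silhouette_score`) written with NumPy only.

```python
import numpy as np

def standardize(X):
    # Z-score scaling: zero mean and unit variance per feature.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def normalize(X):
    # Min-max scaling to [0, 1] per feature (assumed meaning of "normalized").
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def pca(X, n_components=2):
    # Project mean-centered data onto the leading principal axes via SVD.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def sq_dists(X, centers):
    # Squared Euclidean distance from every sample to every centroid.
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

def kmeans(X, k, n_iter=100, seed=0):
    # Plain Lloyd's algorithm; initial centroids are random data points.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = sq_dists(X, centers).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    d = sq_dists(X, centers)
    labels = d.argmin(axis=1)
    # WCSS: sum of squared distances from each sample to its nearest centroid.
    wcss = d[np.arange(len(X)), labels].sum()
    return labels, wcss

def silhouette_score(X, labels):
    # Mean silhouette width, (b - a) / max(a, b) per sample; needs >= 2 clusters.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters score 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

# Synthetic stand-in data: three groups, six features on very different scales.
rng = np.random.default_rng(42)
group_means = rng.normal(0, 5, size=(3, 6))
X = np.vstack([m + rng.normal(0, 1, size=(100, 6)) for m in group_means])
X = X * np.array([1, 10, 100, 1, 10, 100])

variants = {
    "standardized": standardize(X),
    "standardized, PC=2": pca(standardize(X)),
    "normalized": normalize(X),
    "normalized, PC=2": pca(normalize(X)),
}
results = {}
for name, Z in variants.items():
    labels, wcss = kmeans(Z, k=3)
    results[name] = (wcss, silhouette_score(Z, labels))
    print(f"{name:20s} WCSS={results[name][0]:.2f}  silhouette={results[name][1]:.4f}")
```

When reading output from a sketch like this, note that WCSS depends on the scale of the input: min-max scaled features lie in [0, 1], while z-scored features have unit variance, and for any fixed partition an orthogonal projection such as PCA cannot increase WCSS. The silhouette score is bounded in [-1, 1] regardless of scaling.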

Original language: English
Article number: 070001
Journal: AIP Conference Proceedings
Volume: 3056
Issue number: 1
Publication status: Published - 7 Apr 2025
Event: 2022 Sustainable and Integrated Engineering International Conference, SIE 2022 - Langkawi Island, Malaysia
Duration: 12 Dec 2022 to 13 Dec 2022
