TY - JOUR
T1 - Integration of Principal Component Analysis and K-Means Clustering for Type 2 Diabetes Sub-clustering Model
AU - Omar, Nashuha
AU - Wahab, Asnida Abdul
AU - Supriyanto, Eko
AU - Al-Ashwal, Rania Hussein
AU - Ramlee, Muhammad Hanif
AU - Seng, Gan Hong
N1 - Publisher Copyright:
© 2025 American Institute of Physics Inc. All rights reserved.
PY - 2025/4/7
Y1 - 2025/4/7
N2 - Type 2 diabetes is a global health problem and one of the leading causes of death. Recent findings indicate that the disease can be divided into several sub-clusters, a step toward precision medicine. In this paper, we analyze and compare two approaches to data reduction, i.e., with and without principal component analysis (PCA), on standardized and normalized data. Data preparation was performed first. The model was then developed and validated by plotting the elbow-method and silhouette-width graphs. Normalized data reduced to two principal components (PC = 2) gave the best clustering visualization, the lowest within-cluster sum of squares (WCSS) score (195.41), and the highest silhouette score (0.3491), compared with standardized data and standardized data (PC = 2), which gave a WCSS score of 23518.82 and a silhouette score of 0.1976. We conclude that integrating PCA with k-means clustering yields a lower WCSS score and a higher silhouette score.
AB - Type 2 diabetes is a global health problem and one of the leading causes of death. Recent findings indicate that the disease can be divided into several sub-clusters, a step toward precision medicine. In this paper, we analyze and compare two approaches to data reduction, i.e., with and without principal component analysis (PCA), on standardized and normalized data. Data preparation was performed first. The model was then developed and validated by plotting the elbow-method and silhouette-width graphs. Normalized data reduced to two principal components (PC = 2) gave the best clustering visualization, the lowest within-cluster sum of squares (WCSS) score (195.41), and the highest silhouette score (0.3491), compared with standardized data and standardized data (PC = 2), which gave a WCSS score of 23518.82 and a silhouette score of 0.1976. We conclude that integrating PCA with k-means clustering yields a lower WCSS score and a higher silhouette score.
UR - http://www.scopus.com/inward/record.url?scp=105003297221&partnerID=8YFLogxK
U2 - 10.1063/5.0209965
DO - 10.1063/5.0209965
M3 - Conference article
AN - SCOPUS:105003297221
SN - 0094-243X
VL - 3056
JO - AIP Conference Proceedings
JF - AIP Conference Proceedings
IS - 1
M1 - 070001
T2 - 2022 Sustainable and Integrated Engineering International Conference, SIE 2022
Y2 - 12 December 2022 through 13 December 2022
ER -