TY - JOUR
T1 - FG-HFS
T2 - A feature filter and group evolution hybrid feature selection algorithm for high-dimensional gene expression data
AU - Xu, Zhaozhao
AU - Yang, Fangyuan
AU - Tang, Chaosheng
AU - Wang, Hong
AU - Wang, Shuihua
AU - Sun, Junding
AU - Zhang, Yudong
N1 - Publisher Copyright:
© 2023 The Author(s)
PY - 2024/7/1
Y1 - 2024/7/1
N2 - High dimensional and small samples characterize gene expression data and contain a large number of genes unrelated to disease. Feature selection improves the efficiency of disease diagnosis by selecting a small number of important genes. Unfortunately, existing algorithms do not consider the correlation between features, and search algorithms tend to fall into the local optimal solution in the feature search process. To this end, this paper proposes a feature filter and group evolution hybrid feature selection algorithm (FG-HFS) for high-dimensional gene expression data. Unlike existing algorithms, we propose using spectral clustering to group redundant features into a group. Then, we propose a redundant feature filter algorithm. According to the principle of approximate Markov blanket, grouped feature groups are filtered to delete these redundant features. Among them, filtered features are evenly divided by density according to the feature exponential strategy. Most importantly, we propose using the group evolution multi-objective genetic algorithm to search the filtered feature subsets and evaluate the candidate feature subsets according to the in-group and out-group so as to select the feature subsets with the highest accuracy and the least number. Experimental results show that the average accuracy (ACC) and Matthews correlation coefficient (MCC) indexes of the selected feature subsets (FSs) by the FG-HFS algorithm on 5 gene expression datasets are 92.76% and 88.76%, respectively, which are significantly better than the existing algorithms. In addition, the FSs and ACC/FSs indexes of the FG-HFS algorithm are also better than the existing algorithms, which fully proves the superiority of the FG-HFS algorithm. More importantly, the Wilcoxon and Friedman statistical experiments results show that the feature selection effect of FG-HFS algorithm is significantly better than that of existing algorithms, no matter in pairwise comparison or multiple comparison.
AB - High dimensional and small samples characterize gene expression data and contain a large number of genes unrelated to disease. Feature selection improves the efficiency of disease diagnosis by selecting a small number of important genes. Unfortunately, existing algorithms do not consider the correlation between features, and search algorithms tend to fall into the local optimal solution in the feature search process. To this end, this paper proposes a feature filter and group evolution hybrid feature selection algorithm (FG-HFS) for high-dimensional gene expression data. Unlike existing algorithms, we propose using spectral clustering to group redundant features into a group. Then, we propose a redundant feature filter algorithm. According to the principle of approximate Markov blanket, grouped feature groups are filtered to delete these redundant features. Among them, filtered features are evenly divided by density according to the feature exponential strategy. Most importantly, we propose using the group evolution multi-objective genetic algorithm to search the filtered feature subsets and evaluate the candidate feature subsets according to the in-group and out-group so as to select the feature subsets with the highest accuracy and the least number. Experimental results show that the average accuracy (ACC) and Matthews correlation coefficient (MCC) indexes of the selected feature subsets (FSs) by the FG-HFS algorithm on 5 gene expression datasets are 92.76% and 88.76%, respectively, which are significantly better than the existing algorithms. In addition, the FSs and ACC/FSs indexes of the FG-HFS algorithm are also better than the existing algorithms, which fully proves the superiority of the FG-HFS algorithm. More importantly, the Wilcoxon and Friedman statistical experiments results show that the feature selection effect of FG-HFS algorithm is significantly better than that of existing algorithms, no matter in pairwise comparison or multiple comparison.
KW - Feature selection
KW - Gene expression data
KW - Multi-objective genetic algorithm
KW - Spectral clustering
KW - Symmetric uncertainty
UR - http://www.scopus.com/inward/record.url?scp=85182878446&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2023.123069
DO - 10.1016/j.eswa.2023.123069
M3 - Article
AN - SCOPUS:85182878446
SN - 0957-4174
VL - 245
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 123069
ER -