FG-HFS: A feature filter and group evolution hybrid feature selection algorithm for high-dimensional gene expression data

Zhaozhao Xu, Fangyuan Yang, Chaosheng Tang, Hong Wang, Shuihua Wang, Junding Sun, Yudong Zhang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Gene expression data are high-dimensional with small sample sizes, and they contain a large number of genes unrelated to disease. Feature selection improves the efficiency of disease diagnosis by selecting a small number of important genes. However, existing algorithms do not consider the correlation between features, and their search procedures tend to become trapped in local optima during the feature search. To address this, this paper proposes a feature filter and group evolution hybrid feature selection algorithm (FG-HFS) for high-dimensional gene expression data. Unlike existing algorithms, FG-HFS uses spectral clustering to place redundant features in the same group. A redundant feature filter algorithm then filters the grouped features according to the approximate Markov blanket principle and deletes the redundant ones. The filtered features are evenly divided by density according to the feature exponential strategy. Most importantly, a group evolution multi-objective genetic algorithm searches the filtered feature subsets and evaluates the candidate feature subsets according to the in-group and out-group, so as to select the feature subset with the highest accuracy and the fewest features. Experimental results on 5 gene expression datasets show that the feature subsets (FSs) selected by FG-HFS achieve an average accuracy (ACC) of 92.76% and an average Matthews correlation coefficient (MCC) of 88.76%, both significantly better than existing algorithms. FG-HFS also outperforms existing algorithms on the FSs and ACC/FSs indexes, which further supports its superiority. Finally, Wilcoxon and Friedman statistical tests show that the feature selection performance of FG-HFS is significantly better than that of existing algorithms in both pairwise and multiple comparisons.
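The abstract does not spell out how the approximate Markov blanket filter is computed. As a rough illustration only, the sketch below uses the commonly cited definition based on symmetric uncertainty (SU), which is listed among the keywords: within one group, a feature is treated as redundant if an already kept feature with equal or higher SU to the class label has an SU with it that is at least as large as its own SU with the label. The function names (`entropy`, `symmetric_uncertainty`, `filter_group`), the assumption of discretized feature values, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def filter_group(features, labels):
    """Drop redundant features from one group (assumed, FCBF-style rule).

    features: dict mapping a feature name to its list of discrete values.
    Returns the names of the features that are kept.
    """
    su_class = {name: symmetric_uncertainty(vals, labels)
                for name, vals in features.items()}
    # Visit features from most to least relevant to the class label.
    ranked = sorted(features, key=lambda name: su_class[name], reverse=True)
    kept = []
    for name in ranked:
        # Every kept feature already has SU with the label >= su_class[name],
        # so only the feature-to-feature condition needs to be checked.
        redundant = any(
            symmetric_uncertainty(features[name], features[k]) >= su_class[name]
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy group: g2 duplicates g1, and g3 carries no information about the label.
labels = [0, 0, 1, 1]
group = {"g1": [0, 0, 1, 1], "g2": [0, 0, 1, 1], "g3": [0, 1, 0, 1]}
kept = filter_group(group, labels)
```

On this toy group only `g1` is kept. Real expression values are continuous and would need to be discretized before SU can be computed this way; how FG-HFS handles that step is not stated in the abstract.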

Original language: English
Article number: 123069
Journal: Expert Systems with Applications
Volume: 245
DOIs
Publication status: Published - 1 Jul 2024

Keywords

  • Feature selection
  • Gene expression data
  • Multi-objective genetic algorithm
  • Spectral clustering
  • Symmetric uncertainty
