RFC: A feature selection algorithm for software defect prediction

Xiaolong Xu, Wen Chen, Xinheng Wang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

Software defect prediction (SDP) applies statistical analysis to historical defect data to learn how defects are distributed, so that defects in new software can be predicted effectively. However, software defect datasets contain redundant and irrelevant features that degrade the performance of defect predictors. To identify and remove these features, we propose ReliefF-based clustering (RFC), a cluster-based feature selection algorithm. The correlation between features is then calculated based on symmetric uncertainty. According to the degree of correlation, RFC partitions the features into k clusters using the k-medoids algorithm and finally selects representative features from each cluster to form the final feature subset. In the experiments, we compare the proposed RFC with classical feature selection algorithms on nine National Aeronautics and Space Administration (NASA) software defect prediction datasets in terms of area under the curve (AUC) and F-value. The experimental results show that RFC can effectively improve the performance of SDP.
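
The clustering stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes discrete (already discretized) features, uses 1 - SU as the distance between features, takes each cluster's medoid as its representative feature, and omits the ReliefF stage entirely, since the abstract does not specify how it is applied. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 1.0  # assumption: two constant features count as fully redundant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def assign(dist, medoids):
    """Map each medoid to the indices of the features closest to it."""
    clusters = {m: [] for m in medoids}
    for i in range(len(dist)):
        # A medoid always stays in its own cluster, so no cluster is empty.
        nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
        clusters[nearest].append(i)
    return clusters


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    medoids = list(range(k))  # assumption: deterministic initialization
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # New medoid = member with the smallest total distance to its cluster.
        updated = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if set(updated) == set(medoids):
            break
        medoids = updated
    return assign(dist, medoids)


def select_features(columns, k):
    """Cluster features by 1 - SU and return the medoid index of each cluster."""
    n = len(columns)
    dist = [
        [max(0.0, 1.0 - symmetric_uncertainty(columns[i], columns[j]))
         for j in range(n)]
        for i in range(n)
    ]
    return sorted(k_medoids(dist, k))


# Toy data: features 0 and 1 are identical, features 2 and 3 are complements.
features = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
selected = select_features(features, k=2)
print(selected)
```

On this toy input the two redundant pairs fall into separate clusters, so one feature from each pair is kept (indices 0 and 2). A real pipeline would also need a discretization step for continuous software metrics and a principled choice of k, neither of which the abstract describes.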

Original language: English
Article number: 9430113
Pages (from-to): 389-398
Number of pages: 10
Journal: Journal of Systems Engineering and Electronics
Volume: 32
Issue number: 2
Publication status: Published - Apr 2021
Externally published: Yes

Keywords

  • cluster
  • feature selection
  • software defect prediction (SDP)
