TY - JOUR
T1 - RFC
T2 - A feature selection algorithm for software defect prediction
AU - Xu, Xiaolong
AU - Chen, Wen
AU - Wang, Xinheng
N1 - Publisher Copyright: © 1990-2011 Beijing Institute of Aerospace Information.
PY - 2021/4
Y1 - 2021/4
N2 - Software defect prediction (SDP) is used to perform the statistical analysis of historical defect data to find out the distribution rule of historical defects, so as to effectively predict defects in the new software. However, there are redundant and irrelevant features in the software defect datasets affecting the performance of defect predictors. In order to identify and remove the redundant and irrelevant features in software defect datasets, we propose ReliefF-based clustering (RFC), a cluster-based feature selection algorithm. Then, the correlation between features is calculated based on the symmetric uncertainty. According to the correlation degree, RFC partitions features into k clusters based on the k-medoids algorithm, and finally selects the representative features from each cluster to form the final feature subset. In the experiments, we compare the proposed RFC with classical feature selection algorithms on nine National Aeronautics and Space Administration (NASA) software defect prediction datasets in terms of area under curve (AUC) and F-value. The experimental results show that RFC can effectively improve the performance of SDP.
AB - Software defect prediction (SDP) is used to perform the statistical analysis of historical defect data to find out the distribution rule of historical defects, so as to effectively predict defects in the new software. However, there are redundant and irrelevant features in the software defect datasets affecting the performance of defect predictors. In order to identify and remove the redundant and irrelevant features in software defect datasets, we propose ReliefF-based clustering (RFC), a cluster-based feature selection algorithm. Then, the correlation between features is calculated based on the symmetric uncertainty. According to the correlation degree, RFC partitions features into k clusters based on the k-medoids algorithm, and finally selects the representative features from each cluster to form the final feature subset. In the experiments, we compare the proposed RFC with classical feature selection algorithms on nine National Aeronautics and Space Administration (NASA) software defect prediction datasets in terms of area under curve (AUC) and F-value. The experimental results show that RFC can effectively improve the performance of SDP.
KW - cluster
KW - feature selection
KW - software defect prediction (SDP)
UR - http://www.scopus.com/inward/record.url?scp=85105849853&partnerID=8YFLogxK
U2 - 10.23919/JSEE.2021.000032
DO - 10.23919/JSEE.2021.000032
M3 - Article
AN - SCOPUS:85105849853
SN - 1671-1793
VL - 32
SP - 389
EP - 398
JO - Journal of Systems Engineering and Electronics
JF - Journal of Systems Engineering and Electronics
IS - 2
M1 - 9430113
ER -