
Handling mislabeled data in fault diagnosis: A graph-assisted random forest approach

  • Shaozhi Chen
  • Xiaopeng Xi*
  • Maiying Zhong
  • Rui Yang
  • Marcos E. Orchard
  • *Corresponding author for this work
  • Shandong University of Science and Technology
  • Universidad Técnica Federico Santa Maria
  • Universidad de Chile

Research output: Contribution to journal › Article › peer-review

Abstract

Label information plays a critical role in both supervised and semi-supervised learning-based fault diagnosis methods. However, mislabeled data can significantly degrade the classification performance of the resulting fault diagnosis model. To address this challenge, this paper proposes a graph-assisted random forest (GARF) approach that mitigates the adverse effects of mislabeled data in fault diagnosis. The core of the approach is a spectral clustering matching (SCM)-based method for identifying incorrect labels, which exploits the fact that the graph structure is independent of the sample labels. Samples identified as mislabeled are stripped of their incorrect labels and treated as unlabeled data. A graph-based semi-supervised learning (GSSL) algorithm then infers corrected labels for these samples from the underlying graph topology. Finally, a random forest (RF) classifier is trained on the rectified dataset to establish the GARF-based fault diagnosis model, which supports real-time fault diagnosis. The proposed method is validated using monitoring data from a hardware-in-the-loop high-speed train simulation platform. Experimental results show that the GARF method outperforms multiple existing approaches across key metrics, including accuracy, recall, F1-score, and computational efficiency.
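
The detect-then-correct idea in the abstract can be illustrated with a small, standard-library-only Python sketch. It is not the authors' implementation, and it makes several loudly simplifying assumptions: connected components of a k-nearest-neighbour graph stand in for spectral clustering, a per-cluster majority vote stands in for the SCM matching step, and a neighbour majority vote stands in for the GSSL algorithm. The final random forest training step is omitted. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import dist


def knn_graph(X, k):
    """Symmetric k-nearest-neighbour adjacency built from features only (labels are never used)."""
    n = len(X)
    nbrs = [set() for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist(X[i], X[j]))
        for j in order[:k]:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def components(nbrs):
    """Connected components of the graph (assumption: a crude stand-in for spectral clusters)."""
    seen, comps = set(), []
    for start in range(len(nbrs)):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in nbrs[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def flag_mislabeled(nbrs, y):
    """Match each cluster to its majority label and flag the samples that disagree with it."""
    flagged = set()
    for comp in components(nbrs):
        majority = Counter(y[i] for i in comp).most_common(1)[0][0]
        flagged.update(i for i in comp if y[i] != majority)
    return flagged


def propagate(nbrs, y, flagged, max_iter=50):
    """Treat flagged samples as unlabeled, then relabel each by majority vote of its labeled neighbours."""
    labels = {i: (None if i in flagged else y[i]) for i in range(len(y))}
    for _ in range(max_iter):
        changed = False
        for i in sorted(flagged):
            votes = Counter(labels[j] for j in nbrs[i] if labels[j] is not None)
            if votes:
                new = votes.most_common(1)[0][0]
                if new != labels[i]:
                    labels[i] = new
                    changed = True
        if not changed:
            break
    # A flagged sample with no labeled neighbour keeps its original label.
    return [labels[i] if labels[i] is not None else y[i] for i in range(len(y))]


# Toy data (made up): two well-separated groups with one wrong label in each.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
y_noisy = ["normal", "normal", "fault", "normal", "normal",
           "fault", "fault", "fault", "normal", "fault"]

graph = knn_graph(X, k=2)
flagged = flag_mislabeled(graph, y_noisy)
corrected = propagate(graph, y_noisy, flagged)
print(sorted(flagged), corrected)
```

In a fuller pipeline, `corrected` would replace the noisy labels as the training target for a classifier such as a random forest. The sketch stops at label correction because the abstract does not give the details of the RF configuration.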

Original language: English
Article number: 132669
Journal: Neurocomputing
Volume: 671
Publication status: Published - 28 Mar 2026

Keywords

  • Fault diagnosis
  • Graph-assisted random forest
  • Label correction
  • Mislabeled data
  • Semi-supervised learning
