Radio Galaxy Zoo: using semi-supervised learning to leverage large unlabelled data sets for radio galaxy classification under data set shift

Inigo V. Slijepcevic*, Anna M.M. Scaife, Mike Walmsley, Micah Bowles, O. Ivy Wong, Stanislav S. Shabala, Hongming Tang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

26 Citations (Scopus)

Abstract

In this work, we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state of the art and whether this holds when incorporating previously unseen data. We find that for the radio galaxy classification problem considered, SSL provides additional regularization and outperforms the baseline test accuracy. However, in contrast to model performance metrics reported on computer science benchmarking data sets, we find that improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes. Additionally, we show that SSL does not improve model calibration, regardless of whether classification is improved. Moreover, we find that when different underlying catalogues drawn from the same radio survey are used to provide the labelled and unlabelled data sets required for SSL, a significant drop in classification performance is observed, highlighting the difficulty of applying SSL techniques under data set shift.

Original language: English
Pages (from-to): 2599-2613
Number of pages: 15
Journal: Monthly Notices of the Royal Astronomical Society
Volume: 514
Issue number: 2
Publication status: Published - 1 Aug 2022
Externally published: Yes

Keywords

  • Methods: data analysis
  • Methods: statistical
  • Radio continuum: galaxies

