TY - JOUR
T1 - Radio Galaxy Zoo
T2 - using semi-supervised learning to leverage large unlabelled data sets for radio galaxy classification under data set shift
AU - Slijepcevic, Inigo V.
AU - Scaife, Anna M.M.
AU - Walmsley, Mike
AU - Bowles, Micah
AU - Wong, O. Ivy
AU - Shabala, Stanislav S.
AU - Tang, Hongming
N1 - Publisher Copyright:
© … Fréchet distance between labelled and unlabelled data sets as a measure of data set shift can provide a prediction of model performance, but that for typical radio galaxy data sets with labelled sample volumes of O(10^3), the sample variance associated with this technique is high and the technique is in general not sufficiently robust to replace a train-test cycle.
PY - 2022/8/1
Y1 - 2022/8/1
N2 - In this work, we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state of the art and whether this holds when incorporating previously unseen data. We find that for the radio galaxy classification problem considered, SSL provides additional regularization and outperforms the baseline test accuracy. However, in contrast to model performance metrics reported on computer science benchmarking data sets, we find that improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes. Additionally, we show that SSL does not improve model calibration, regardless of whether classification is improved. Moreover, we find that when different underlying catalogues drawn from the same radio survey are used to provide the labelled and unlabelled data sets required for SSL, a significant drop in classification performance is observed, highlighting the difficulty of applying SSL techniques under data set shift.
KW - Methods: data analysis
KW - Methods: statistical
KW - Radio continuum: galaxies
UR - http://www.scopus.com/inward/record.url?scp=85133649750&partnerID=8YFLogxK
U2 - 10.1093/mnras/stac1135
DO - 10.1093/mnras/stac1135
M3 - Article
AN - SCOPUS:85133649750
SN - 0035-8711
VL - 514
SP - 2599
EP - 2613
JO - Monthly Notices of the Royal Astronomical Society
JF - Monthly Notices of the Royal Astronomical Society
IS - 2
ER -