Radio Galaxy Zoo: using semi-supervised learning to leverage large unlabelled data sets for radio galaxy classification under data set shift

Inigo V. Slijepcevic; Anna M.M. Scaife; Mike Walmsley; Micah Bowles; O. Ivy Wong; Stanislav S. Shabala; Hongming Tang

doi:10.1093/mnras/stac1135

Corresponding author: Inigo V. Slijepcevic

Research output: Contribution to journal › Article › peer-review

27 Citations (Scopus)

Abstract

In this work, we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state of the art and whether this holds when incorporating previously unseen data. We find that for the radio galaxy classification problem considered, SSL provides additional regularization and outperforms the baseline test accuracy. However, in contrast to model performance metrics reported on computer science benchmarking data sets, we find that improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes. Additionally, we show that SSL does not improve model calibration, regardless of whether classification is improved. Moreover, we find that when different underlying catalogues drawn from the same radio survey are used to provide the labelled and unlabelled data sets required for SSL, a significant drop in classification performance is observed, highlighting the difficulty of applying SSL techniques under data set shift.
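The abstract reports that SSL does not improve model calibration. Calibration describes how well a classifier's predicted confidences match its observed accuracy. The abstract does not say which calibration metric the paper uses. As an illustration only, here is a minimal sketch of the expected calibration error (ECE), a widely used binned metric:

ECE = sum over bins b of (|B_b| / N) * |acc(B_b) - conf(B_b)|

The function name `expected_calibration_error` and the default of 10 equal-width bins are assumptions made for this example. They are not taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    predictions: Sequence[int],
    labels: Sequence[int],
    n_bins: int = 10,
) -> float:
    """Binned expected calibration error (illustrative sketch, not the paper's code).

    ECE = sum over bins b of (|B_b| / N) * |accuracy(B_b) - mean_confidence(B_b)|,
    using n_bins equal-width confidence bins on [0, 1] (an assumed choice).
    """
    n = len(confidences)
    if n == 0 or not (n == len(predictions) == len(labels)):
        raise ValueError("inputs must be non-empty and of equal length")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")

    # Each bin collects (confidence, was_correct) pairs.
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, pred, label in zip(confidences, predictions, labels):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # Equal-width binning; a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, pred == label))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

Here is a worked example. Take two predictions made with confidence 0.9, only one of which is correct. They give an ECE of about 0.4, because the mean confidence is 0.9 and the accuracy is 0.5. That gap is the signature of an over-confident classifier. Equal-width bins are the simplest choice. Equal-mass (adaptive) binning is a common alternative when confidences cluster near 1.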

Original language	English
Pages (from-to)	2599-2613
Number of pages	15
Journal	Monthly Notices of the Royal Astronomical Society
Volume	514
Issue number	2
DOIs	https://doi.org/10.1093/mnras/stac1135
Publication status	Published - 1 Aug 2022
Externally published	Yes

Keywords

Methods: data analysis
Methods: statistical
Radio continuum: galaxies


Cite this

@article{34cf2ac5cc504c4793e009aebc1625cb,

title = "Radio Galaxy Zoo: using semi-supervised learning to leverage large unlabelled data sets for radio galaxy classification under data set shift",

abstract = "In this work, we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state of the art and whether this holds when incorporating previously unseen data. We find that for the radio galaxy classification problem considered, SSL provides additional regularization and outperforms the baseline test accuracy. However, in contrast to model performance metrics reported on computer science benchmarking data sets, we find that improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes. Additionally, we show that SSL does not improve model calibration, regardless of whether classification is improved. Moreover, we find that when different underlying catalogues drawn from the same radio survey are used to provide the labelled and unlabelled data sets required for SSL, a significant drop in classification performance is observed, highlighting the difficulty of applying SSL techniques under data set shift.",

keywords = "Methods: data analysis, Methods: statistical, Radio continuum: galaxies",

author = "Slijepcevic, {Inigo V.} and Scaife, {Anna M.M.} and Mike Walmsley and Micah Bowles and Wong, {O. Ivy} and Shabala, {Stanislav S.} and Hongming Tang",

note = "Publisher Copyright: {\textcopyright} [...] Fr{\'e}chet distance between labelled and unlabelled data sets as a measure of data set shift can provide a prediction of model performance, but that for typical radio galaxy data sets with labelled sample volumes of $O(10^3)$, the sample variance associated with this technique is high and the technique is in general not sufficiently robust to replace a train-test cycle.",

year = "2022",

month = aug,

day = "1",

doi = "10.1093/mnras/stac1135",

language = "English",

volume = "514",

pages = "2599--2613",

journal = "Monthly Notices of the Royal Astronomical Society",

issn = "0035-8711",

number = "2",

}

Radio Galaxy Zoo: using semi-supervised learning to leverage large unlabelled data sets for radio galaxy classification under data set shift. / Slijepcevic, Inigo V.; Scaife, Anna M.M.; Walmsley, Mike et al.
In: Monthly Notices of the Royal Astronomical Society, Vol. 514, No. 2, 01.08.2022, p. 2599-2613.

TY  - JOUR

T1  - Radio Galaxy Zoo: using semi-supervised learning to leverage large unlabelled data sets for radio galaxy classification under data set shift

AU  - Slijepcevic, Inigo V.

AU  - Scaife, Anna M.M.

AU  - Walmsley, Mike

AU  - Bowles, Micah

AU  - Wong, O. Ivy

AU  - Shabala, Stanislav S.

AU  - Tang, Hongming

N1  - Publisher Copyright: © [...] Fréchet distance between labelled and unlabelled data sets as a measure of data set shift can provide a prediction of model performance, but that for typical radio galaxy data sets with labelled sample volumes of O(10^3), the sample variance associated with this technique is high and the technique is in general not sufficiently robust to replace a train-test cycle.

PY  - 2022/8/1

Y1  - 2022/8/1

N2  - In this work, we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state of the art and whether this holds when incorporating previously unseen data. We find that for the radio galaxy classification problem considered, SSL provides additional regularization and outperforms the baseline test accuracy. However, in contrast to model performance metrics reported on computer science benchmarking data sets, we find that improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes. Additionally, we show that SSL does not improve model calibration, regardless of whether classification is improved. Moreover, we find that when different underlying catalogues drawn from the same radio survey are used to provide the labelled and unlabelled data sets required for SSL, a significant drop in classification performance is observed, highlighting the difficulty of applying SSL techniques under data set shift.

AB  - In this work, we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state of the art and whether this holds when incorporating previously unseen data. We find that for the radio galaxy classification problem considered, SSL provides additional regularization and outperforms the baseline test accuracy. However, in contrast to model performance metrics reported on computer science benchmarking data sets, we find that improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes. Additionally, we show that SSL does not improve model calibration, regardless of whether classification is improved. Moreover, we find that when different underlying catalogues drawn from the same radio survey are used to provide the labelled and unlabelled data sets required for SSL, a significant drop in classification performance is observed, highlighting the difficulty of applying SSL techniques under data set shift.

KW  - Methods: data analysis

KW  - Methods: statistical

KW  - Radio continuum: galaxies

UR  - http://www.scopus.com/inward/record.url?scp=85133649750&partnerID=8YFLogxK

U2  - 10.1093/mnras/stac1135

DO  - 10.1093/mnras/stac1135

M3  - Article

AN  - SCOPUS:85133649750

SN  - 0035-8711

VL  - 514

SP  - 2599

EP  - 2613

JO  - Monthly Notices of the Royal Astronomical Society

JF  - Monthly Notices of the Royal Astronomical Society

IS  - 2

ER  - 
