TY - JOUR
T1 - Radio Galaxy Zoo
T2 - Leveraging latent space representations from variational autoencoder
AU - Andrianomena, Sambatra
AU - Tang, Hongming
N1 - Publisher Copyright:
© 2024 The Author(s)
PY - 2024/6/1
Y1 - 2024/6/1
N2 - We propose to learn latent space representations of radio galaxies and, to this end, train a very deep variational autoencoder (VDVAE) on RGZ DR1, an unlabeled dataset. We show that the encoded features can be leveraged for downstream tasks such as classifying galaxies in labeled datasets and similarity search. The model is able to reconstruct its inputs, capturing their salient features. We use the latent codes of galaxy images from the MiraBest Confident and FR-DEEP NVSS datasets to train various non-neural-network classifiers. These classifiers can differentiate FRI from FRII galaxies, achieving accuracy ≥ 76%, ROC-AUC ≥ 0.86, specificity ≥ 0.73 and recall ≥ 0.78 on the MiraBest Confident dataset, comparable to results obtained in previous studies. The performance of simple classifiers trained on FR-DEEP NVSS data representations is on par with that of a deep learning (CNN-based) classifier trained on images in previous work, highlighting how powerful the compressed information is. We successfully exploit the learned representations to search one dataset for galaxies that are semantically similar to a query image from a different dataset. Although generating new galaxy images (e.g. for data augmentation) is not our primary objective, we find that the VDVAE model is a relatively good emulator. Finally, as a step toward anomaly/novelty detection, a density estimator, Masked Autoregressive Flow (MAF), is trained on the latent codes so that the log-likelihood of data can be estimated. The downstream tasks conducted in this work demonstrate the meaningfulness of the latent codes.
KW - galaxy morphology
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85196709949&partnerID=8YFLogxK
U2 - 10.1088/1475-7516/2024/06/034
DO - 10.1088/1475-7516/2024/06/034
M3 - Article
AN - SCOPUS:85196709949
SN - 1475-7516
VL - 2024
JO - Journal of Cosmology and Astroparticle Physics
JF - Journal of Cosmology and Astroparticle Physics
IS - 6
M1 - 034
ER -