Radio Galaxy Zoo: Leveraging latent space representations from variational autoencoder

Sambatra Andrianomena; Hongming Tang

doi:10.1088/1475-7516/2024/06/034

Corresponding author: Sambatra Andrianomena

Research output: Contribution to journal (Article, peer-reviewed)

3 Citations (Scopus)

Abstract

We propose to learn latent space representations of radio galaxies and, to this end, train a very deep variational autoencoder (VDVAE) on RGZ DR1, an unlabeled dataset. We show that the encoded features can be leveraged for downstream tasks such as classifying galaxies in labeled datasets and similarity search. Results show that the model is able to reconstruct its inputs, capturing their salient features. We use the latent codes of galaxy images from the MiraBest Confident and FR-DEEP NVSS datasets to train various non-neural-network classifiers. These classifiers can differentiate FRI from FRII galaxies, achieving accuracy ≥ 76%, ROC-AUC ≥ 0.86, specificity ≥ 0.73 and recall ≥ 0.78 on the MiraBest Confident dataset, comparable to results obtained in previous studies. The performance of simple classifiers trained on FR-DEEP NVSS data representations is on par with that of a CNN-based deep learning classifier trained on images in previous work, highlighting how informative the compressed representation is. We successfully exploit the learned representations to search one dataset for galaxies that are semantically similar to a query image from a different dataset. Although generating new galaxy images (e.g. for data augmentation) is not our primary objective, we find that the VDVAE model is a relatively good emulator. Finally, as a step toward anomaly/novelty detection, a density estimator, a Masked Autoregressive Flow (MAF), is trained on the latent codes so that the log-likelihood of the data can be estimated. The downstream tasks conducted in this work demonstrate that the latent codes are meaningful.
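The abstract describes three downstream uses of the latent codes: training simple non-neural-network classifiers, scoring them with accuracy, ROC-AUC, specificity and recall, and searching latent space for similar galaxies. The pure-Python sketch below only illustrates how those pieces fit together; it is not the authors' code. The nearest-centroid classifier, the cosine-similarity search, the label convention (1 = FRII, 0 = FRI) and the toy 2-D vectors are all hypothetical stand-ins for the VDVAE latent codes and the classifiers used in the paper.

```python
"""Hypothetical sketch of downstream tasks on autoencoder latent codes.

Assumptions (not from the paper): latent codes are plain float vectors,
label 1 = FRII and label 0 = FRI, the classifier is a nearest-centroid
rule, and similarity search uses cosine similarity.
"""
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two latent vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


class NearestCentroid:
    """A simple non-neural classifier: assign each code to the closest class mean."""

    def fit(self, codes: List[Vector], labels: List[int]) -> "NearestCentroid":
        self.centroids: Dict[int, List[float]] = {}
        for label in sorted(set(labels)):
            members = [c for c, y in zip(codes, labels) if y == label]
            dim = len(members[0])
            self.centroids[label] = [
                sum(m[d] for m in members) / len(members) for d in range(dim)
            ]
        return self

    def score(self, code: Vector) -> float:
        """Higher means more like class 1 (distance to class 0 minus distance to class 1)."""
        return euclidean(code, self.centroids[0]) - euclidean(code, self.centroids[1])

    def predict(self, code: Vector) -> int:
        return 1 if self.score(code) > 0 else 0


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, recall (true-positive rate) and specificity (true-negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_k_similar(query: Vector, bank: List[Vector], k: int) -> List[int]:
    """Similarity search: indices of the k bank codes most similar to the query."""
    ranked = sorted(range(len(bank)), key=lambda i: cosine(query, bank[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy 2-D "latent codes"; real VDVAE codes would be much higher-dimensional.
    train_codes = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
    train_labels = [0, 0, 1, 1]
    clf = NearestCentroid().fit(train_codes, train_labels)

    test_codes = [[4.0, 4.0], [1.0, 0.0], [6.0, 5.0], [0.5, 0.5]]
    test_labels = [1, 0, 1, 0]
    preds = [clf.predict(c) for c in test_codes]
    print(binary_metrics(test_labels, preds))
    print(roc_auc(test_labels, [clf.score(c) for c in test_codes]))
    print(top_k_similar([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]], k=2))
```

The ROC-AUC here uses the pairwise-ranking formulation, which needs only the classifier's continuous scores; any classifier exposing a score (for example a random forest's class probability) could be swapped in for the nearest-centroid rule.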

Original language	English
Article number	034
Journal	Journal of Cosmology and Astroparticle Physics
Volume	2024
Issue number	6
DOIs	https://doi.org/10.1088/1475-7516/2024/06/034
Publication status	Published - 1 Jun 2024
Externally published	Yes

Keywords

galaxy morphology
Machine learning


Cite this

@article{3cec79ae437a44d69727600202e86ea2,

title = "{Radio Galaxy Zoo}: Leveraging latent space representations from variational autoencoder",

abstract = "We propose to learn latent space representations of radio galaxies, and train a very deep variational autoencoder (VDVAE) on RGZ DR1, an unlabeled dataset, to this end. We show that the encoded features can be leveraged for downstream tasks such as classifying galaxies in labeled datasets, and similarity search. Results show that the model is able to reconstruct its given inputs, capturing the salient features of the latter. We use the latent codes of galaxy images, from MiraBest Confident and FR-DEEP NVSS datasets, to train various non-neural network classifiers. It is found that the latter can differentiate FRI from FRII galaxies achieving accuracy ≥ 76%, roc-auc ≥ 0.86, specificity ≥ 0.73 and recall ≥ 0.78 on MiraBest Confident dataset, comparable to results obtained in previous studies. The performance of simple classifiers trained on FR-DEEP NVSS data representations is on par with that of a deep learning classifier (CNN based) trained on images in previous work, highlighting how powerful the compressed information is. We successfully exploit the learned representations to search for galaxies in a dataset that are semantically similar to a query image belonging to a different dataset. Although generating new galaxy images (e.g. for data augmentation) is not our primary objective, we find that the VDVAE model is a relatively good emulator. Finally, as a step toward detecting anomaly/novelty, a density estimator — Masked Autoregressive Flow (MAF) — is trained on the latent codes, such that the log-likelihood of data can be estimated. The downstream tasks conducted in this work demonstrate the meaningfulness of the latent codes.",

keywords = "galaxy morphology, Machine learning",

author = "Sambatra Andrianomena and Hongming Tang",

note = "Publisher Copyright: {\textcopyright} 2024 The Author(s)",

year = "2024",

month = jun,

day = "1",

doi = "10.1088/1475-7516/2024/06/034",

language = "English",

volume = "2024",

journal = "Journal of Cosmology and Astroparticle Physics",

issn = "1475-7516",

number = "6",

pages = "034",

}

TY - JOUR

T1 - Radio Galaxy Zoo: Leveraging latent space representations from variational autoencoder

AU - Andrianomena, Sambatra

AU - Tang, Hongming

PY - 2024/06/01

Y1 - 2024/06/01

N2 - We propose to learn latent space representations of radio galaxies, and train a very deep variational autoencoder (VDVAE) on RGZ DR1, an unlabeled dataset, to this end. We show that the encoded features can be leveraged for downstream tasks such as classifying galaxies in labeled datasets, and similarity search. Results show that the model is able to reconstruct its given inputs, capturing the salient features of the latter. We use the latent codes of galaxy images, from MiraBest Confident and FR-DEEP NVSS datasets, to train various non-neural network classifiers. It is found that the latter can differentiate FRI from FRII galaxies achieving accuracy ≥ 76%, roc-auc ≥ 0.86, specificity ≥ 0.73 and recall ≥ 0.78 on MiraBest Confident dataset, comparable to results obtained in previous studies. The performance of simple classifiers trained on FR-DEEP NVSS data representations is on par with that of a deep learning classifier (CNN based) trained on images in previous work, highlighting how powerful the compressed information is. We successfully exploit the learned representations to search for galaxies in a dataset that are semantically similar to a query image belonging to a different dataset. Although generating new galaxy images (e.g. for data augmentation) is not our primary objective, we find that the VDVAE model is a relatively good emulator. Finally, as a step toward detecting anomaly/novelty, a density estimator — Masked Autoregressive Flow (MAF) — is trained on the latent codes, such that the log-likelihood of data can be estimated. The downstream tasks conducted in this work demonstrate the meaningfulness of the latent codes.

AB - We propose to learn latent space representations of radio galaxies, and train a very deep variational autoencoder (VDVAE) on RGZ DR1, an unlabeled dataset, to this end. We show that the encoded features can be leveraged for downstream tasks such as classifying galaxies in labeled datasets, and similarity search. Results show that the model is able to reconstruct its given inputs, capturing the salient features of the latter. We use the latent codes of galaxy images, from MiraBest Confident and FR-DEEP NVSS datasets, to train various non-neural network classifiers. It is found that the latter can differentiate FRI from FRII galaxies achieving accuracy ≥ 76%, roc-auc ≥ 0.86, specificity ≥ 0.73 and recall ≥ 0.78 on MiraBest Confident dataset, comparable to results obtained in previous studies. The performance of simple classifiers trained on FR-DEEP NVSS data representations is on par with that of a deep learning classifier (CNN based) trained on images in previous work, highlighting how powerful the compressed information is. We successfully exploit the learned representations to search for galaxies in a dataset that are semantically similar to a query image belonging to a different dataset. Although generating new galaxy images (e.g. for data augmentation) is not our primary objective, we find that the VDVAE model is a relatively good emulator. Finally, as a step toward detecting anomaly/novelty, a density estimator — Masked Autoregressive Flow (MAF) — is trained on the latent codes, such that the log-likelihood of data can be estimated. The downstream tasks conducted in this work demonstrate the meaningfulness of the latent codes.

KW - galaxy morphology

KW - Machine learning

UR - http://www.scopus.com/inward/record.url?scp=85196709949&partnerID=8YFLogxK

U2 - 10.1088/1475-7516/2024/06/034

DO - 10.1088/1475-7516/2024/06/034

M3 - Article

AN - SCOPUS:85196709949

SN - 1475-7516

VL - 2024

JO - Journal of Cosmology and Astroparticle Physics

JF - Journal of Cosmology and Astroparticle Physics

IS - 6

M1 - 034

ER -
