Improving Non-Negative Positive-Unlabeled Learning for News Headline Classification

Zhanlin Ji, Chengyuan Du, Jiawen Jiang, Li Zhao, Haiyang Zhang*, Ivan Ganchev

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

With the development of Internet technology, online platforms have gradually become a tool for people to obtain hot news. Filtering the current hot news out of large news collections and pushing it to users therefore has important application value. In supervised learning scenarios, each piece of news needs to be labeled manually, which takes considerable time and effort. From a semi-supervised learning perspective, building on non-negative Positive-Unlabeled (nnPU) learning, this paper proposes a novel algorithm, called 'Enhanced nnPU with Focal Loss' (FLPU), for news headline classification, which replaces the way the classical nnPU calculates the empirical risk of positive and negative samples with the Focal Loss. Then, by introducing the Virtual Adversarial Training (VAT) of the Adversarial training for large neural LangUage Models (ALUM) into FLPU, another (and better) algorithm, called 'FLPU+ALUM', is proposed for the same purpose, aiming to label only a small number of positive samples. The superiority of both algorithms over the state-of-the-art PU algorithms considered is demonstrated through experiments conducted on two datasets for performance comparison. Moreover, through another set of experiments, it is shown that, if enriched by the proposed algorithms, the RoBERTa-wwm-ext model can achieve better classification performance than the state-of-the-art binary classification models included in the comparison. In addition, a 'Ratio Batch' method is elaborated and proposed as a more stable option for scenarios involving only a small number of labeled positive samples, which is also demonstrated experimentally.
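The core idea of FLPU, as described in the abstract, is to plug the Focal Loss into the nnPU empirical risk estimator. The following is a minimal sketch of that combination, assuming the standard nnPU formulation (class prior π, non-negative correction on the unlabeled-negative risk term); the function names, the γ value, and the exact per-term weighting are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: down-weights easy examples via (1 - p_t)^gamma.
    p: predicted probabilities of the positive class; y: targets in {0, 1}.
    """
    eps = 1e-12                        # numerical floor for the log
    p_t = np.where(y == 1, p, 1.0 - p)  # probability assigned to the true class
    return -((1.0 - p_t) ** gamma) * np.log(p_t + eps)

def nnpu_focal_risk(p_pos, p_unl, pi=0.3, gamma=2.0):
    """Non-negative PU risk with focal loss as the surrogate loss.

    p_pos: model probabilities on the labeled positive samples
    p_unl: model probabilities on the unlabeled samples
    pi:    assumed class prior of the positive class (hypothetical value)
    """
    # Empirical risk of positives treated as positive
    r_p_pos = focal_loss(p_pos, np.ones_like(p_pos), gamma).mean()
    # Empirical risk of positives treated as negative
    r_p_neg = focal_loss(p_pos, np.zeros_like(p_pos), gamma).mean()
    # Empirical risk of unlabeled samples treated as negative
    r_u_neg = focal_loss(p_unl, np.zeros_like(p_unl), gamma).mean()
    # nnPU's non-negative correction: clamp the estimated
    # negative-class risk at zero before combining the terms
    return pi * r_p_pos + max(0.0, r_u_neg - pi * r_p_neg)
```

In practice this risk would be minimized over mini-batches with a neural text encoder producing `p_pos`/`p_unl`; here plain NumPy arrays stand in for those model outputs.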

Original language: English
Pages (from-to): 40192-40203
Number of pages: 12
Journal: IEEE Access
Volume: 11
DOIs
Publication status: Published - 2023

Keywords

  • Text classification
  • adversarial training for large neural language models (ALUM)
  • focal loss
  • non-negative positive-unlabeled (nnPU) learning
  • virtual adversarial training (VAT)
