Pre-Trained or Adversarial Training: A Comparison of NER Methods on Chinese Drug Specifications

Kok Hoe Wong; ZhuJia SHENG

doi:10.1109/ICSCC62041.2024.10690832

Kok Hoe Wong*, ZhuJia SHENG

*Corresponding author for this work

Department of Computing

Xi'an Jiaotong‐Liverpool University

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

Named Entity Recognition (NER) is widely used in Natural Language Processing (NLP), but most current work focuses on English text. This paper compares different NER models for extracting key content from Chinese drug specifications. This key content helps users identify important information about the drugs. Three models were initially chosen for this research, namely BiLSTM-CRF, MiniRBT-BiLSTM-CRF, and MiniRBT-CRF. Experimental results show that MiniRBT-CRF outperforms the other two models, achieving high precision and F1 scores. We then worked on optimizing this model with word embedding and adversarial training. First, we replaced MiniRBT with the BERT-Base-Chinese model, and the results show that BERT-CRF improves the F1 score by 2% over MiniRBT-CRF. Next, we added adversarial training to the BERT-CRF model. However, the results show that BERT-CRFAdv increased the precision score by only 1% and did not improve the F1 score. The results thus suggest that, to enhance an NER model for Chinese text, optimizing the underlying model is the better choice.
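The abstract reports a precision gain without an F1 gain. As a minimal sketch of how that can happen, the Python below computes entity-level precision, recall, and F1 from BIO tag sequences. The tag names (`DRUG`, `DOSE`), the helper names, and the toy sequences are hypothetical illustrations, not data or code from the paper, and exact span matching is an assumed scoring convention.

```python
def extract_spans(tags):
    """Collect (type, start, end) entity spans from a BIO tag sequence."""
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def prf(gold_tags, pred_tags):
    """Entity-level precision, recall, F1 with exact span matching (assumed)."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: four gold entities.
gold = ["B-DRUG", "I-DRUG", "O", "B-DOSE", "O", "B-DRUG", "O", "B-DOSE"]
# Model A predicts four spans, three of them correct.
pred_a = ["B-DRUG", "I-DRUG", "O", "B-DOSE", "O", "B-DRUG", "B-DOSE", "O"]
# Model B is more conservative: two spans, both correct.
pred_b = ["B-DRUG", "I-DRUG", "O", "B-DOSE", "O", "O", "O", "O"]

for name, pred in (("A", pred_a), ("B", pred_b)):
    p, r, f = prf(gold, pred)
    print(f"model {name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy case model B has higher precision than model A but lower recall, so its F1 is lower. Because F1 is the harmonic mean of precision and recall, a precision gain alone does not guarantee an F1 gain.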

Original language	English
Title of host publication	IEEE Xplore
Publisher	IEEE
Pages	70-75
ISBN (Electronic)	979-8-3503-6310-4
ISBN (Print)	979-8-3503-6311-1
DOIs	https://doi.org/10.1109/ICSCC62041.2024.10690832
Publication status	Published - 1 Oct 2024

Access to Document

10.1109/ICSCC62041.2024.10690832

https://ieeexplore.ieee.org/document/10690832

Cite this

@inproceedings{259e00cbe4b1493593a45877dd32ddea,
  title = "Pre-Trained or Adversarial Training: A Comparison of NER Methods on Chinese Drug Specifications",
  abstract = "Named Entity Recognition (NER) is widely used in Natural Language Processing (NLP), but most current work focuses on English text. This paper compares different NER models for extracting key content from Chinese drug specifications. This key content helps users identify important information about the drugs. Three models were initially chosen for this research, namely BiLSTM-CRF, MiniRBT-BiLSTM-CRF, and MiniRBT-CRF. Experimental results show that MiniRBT-CRF outperforms the other two models, achieving high precision and F1 scores. We then worked on optimizing this model with word embedding and adversarial training. First, we replaced MiniRBT with the BERT-Base-Chinese model, and the results show that BERT-CRF improves the F1 score by 2\% over MiniRBT-CRF. Next, we added adversarial training to the BERT-CRF model. However, the results show that BERT-CRFAdv increased the precision score by only 1\% and did not improve the F1 score. The results thus suggest that, to enhance an NER model for Chinese text, optimizing the underlying model is the better choice.",
  author = "Wong, {Kok Hoe} and SHENG, ZhuJia",
  year = "2024",
  month = oct,
  day = "1",
  doi = "10.1109/ICSCC62041.2024.10690832",
  language = "English",
  isbn = "979-8-3503-6311-1",
  pages = "70--75",
  booktitle = "IEEE Xplore",
  publisher = "IEEE",
}

TY  - GEN
T1  - Pre-Trained or Adversarial Training: A Comparison of NER Methods on Chinese Drug Specifications
AU  - Wong, Kok Hoe
AU  - SHENG, ZhuJia
PY  - 2024/10/1
Y1  - 2024/10/1
AB  - Named Entity Recognition (NER) is widely used in Natural Language Processing (NLP), but most current work focuses on English text. This paper compares different NER models for extracting key content from Chinese drug specifications. This key content helps users identify important information about the drugs. Three models were initially chosen for this research, namely BiLSTM-CRF, MiniRBT-BiLSTM-CRF, and MiniRBT-CRF. Experimental results show that MiniRBT-CRF outperforms the other two models, achieving high precision and F1 scores. We then worked on optimizing this model with word embedding and adversarial training. First, we replaced MiniRBT with the BERT-Base-Chinese model, and the results show that BERT-CRF improves the F1 score by 2% over MiniRBT-CRF. Next, we added adversarial training to the BERT-CRF model. However, the results show that BERT-CRFAdv increased the precision score by only 1% and did not improve the F1 score. The results thus suggest that, to enhance an NER model for Chinese text, optimizing the underlying model is the better choice.
U2  - 10.1109/ICSCC62041.2024.10690832
DO  - 10.1109/ICSCC62041.2024.10690832
M3  - Conference Proceeding
SN  - 979-8-3503-6311-1
SP  - 70
EP  - 75
BT  - IEEE Xplore
PB  - IEEE
ER  -