Abstract
Named Entity Recognition (NER) is widely used in Natural Language Processing (NLP), but most current work focuses on English text. This paper compares NER models for extracting key content from Chinese drug specifications. This key content helps users identify important information about the drugs. Three models were initially chosen for this research: BiLSTM-CRF, MiniRBT-BiLSTM-CRF, and MiniRBT-CRF. Experimental results show that MiniRBT-CRF outperforms the other two models, achieving high precision and F1 scores. We then worked on optimizing this model in two ways: changing the word embedding and adding adversarial training. First, we replaced MiniRBT with the BERT-Base-Chinese model; the results show that the resulting BERT-CRF improves the F1 score by 2% over MiniRBT-CRF. Next, we added adversarial training to the BERT-CRF model. However, the results show that the resulting BERT-CRFAdv increased the precision score by only 1% and did not improve the F1 score. The results thus suggest that, to enhance an NER model for Chinese text, optimizing the underlying model is the better choice.
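For readers unfamiliar with the reported metrics, the following is a minimal sketch of how precision, recall, and F1 are commonly computed for NER. It assumes BIO tagging and exact-match scoring at the entity level; the abstract does not state the paper's tagging scheme or scoring protocol. The label names `DRUG` and `DOSE` and the helper names are hypothetical, chosen only for illustration.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # A stray "I-" tag with no open entity of that type is ignored.
    return spans


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level scores; an entity counts only if type and boundaries agree."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical six-token example: the prediction finds one of two gold entities.
gold_tags = ["B-DRUG", "I-DRUG", "O", "B-DOSE", "I-DOSE", "O"]
pred_tags = ["B-DRUG", "I-DRUG", "O", "O", "O", "O"]

precision, recall, f1 = precision_recall_f1(
    extract_spans(gold_tags), extract_spans(pred_tags)
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case every predicted entity is correct, so precision is perfect, but one gold entity is missed, so recall and therefore F1 are lower. This is one way a model's precision can rise without a matching gain in F1.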
Original language | English |
---|---|
Title of host publication | IEEE Xplore |
Publisher | IEEE |
Publication status | Published - 2024 |