An Word2vec based on Chinese Medical Knowledge

Jiayi Zhu, Pin Ni, Yuming Li, Junkun Peng, Zhenjin Dai, Gangmin Li, Xuming Bai

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding › peer-review

6 Citations (Scopus)

Abstract

Introducing a large amount of external prior domain knowledge can effectively improve the performance of word-embedding-based language models on downstream NLP tasks. Based on this assumption, we collect and collate a medical corpus of about 36 million (36M) characters and, using the CCKS2019 data as the test set, carry out multi-class classification and named entity recognition (NER) tasks with the generated word and character vectors. Compared with BERT, our models achieve favorable performance and efficiency.
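
As a rough sketch of the approach summarized in the abstract (not the authors' released code), character-level Word2vec embeddings could be trained on such a corpus with gensim; the corpus file name, its one-sentence-per-line format, and all hyperparameters below are illustrative assumptions.

    # Minimal sketch (assumed, not the authors' implementation): train
    # character-level Word2vec embeddings on a Chinese medical corpus.
    from gensim.models import Word2Vec

    # Hypothetical corpus file: one sentence per line; splitting each line
    # into individual characters yields character-level training sequences.
    with open("medical_corpus.txt", encoding="utf-8") as f:
        sentences = [list(line.strip()) for line in f if line.strip()]

    model = Word2Vec(
        sentences,
        vector_size=300,  # embedding dimension (assumed)
        window=5,         # context window size (assumed)
        min_count=5,      # drop very rare characters (assumed)
        sg=1,             # skip-gram training (assumed)
        workers=4,
    )

    # Save vectors in the standard word2vec text format for use in
    # downstream classification and NER models.
    model.wv.save_word2vec_format("medical_char_vectors.txt")

For word-level rather than character-level vectors, each line would first be segmented with a Chinese tokenizer (e.g., jieba) before training.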

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
Editors: Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6263-6265
Number of pages: 3
ISBN (Electronic): 9781728108582
DOIs
Publication status: Published - Dec 2019
Event: 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States
Duration: 9 Dec 2019 - 12 Dec 2019

Publication series

Name: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

Conference

Conference: 2019 IEEE International Conference on Big Data, Big Data 2019
Country/Territory: United States
City: Los Angeles
Period: 9/12/19 - 12/12/19

Keywords

  • EMR
  • Language Model
  • Word Embedding
