TY - GEN
T1 - Entity recognition by distant supervision with soft list constraint
AU - Tu, Hongkui
AU - Ma, Zongyang
AU - Sun, Aixin
AU - Xu, Zhiqiang
AU - Wang, Xiaodong
N1 - Publisher Copyright: © Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
N2 - Supervised named entity recognition systems often suffer from inadequate training data when dealing with domain-specific corpora, e.g., documents in the medical and healthcare domains. For these domains, obtaining some seed words or phrases is not very difficult. Positive instances obtained through distant supervision based on the seeds can then be used to learn recognition models. However, with a limited number of training samples and no negative ones, the classification results may be unsatisfactory. In this paper, we leverage the conjunction and comma writing style as a list constraint to enlarge the set of training instances. Unlike earlier studies, we formulate two kinds of constraints, namely a soft list constraint and a mention constraint, as regularizers. We then incorporate the constraints into a unified discriminative learning framework and propose a joint optimization algorithm. The experimental results show that our model is superior to state-of-the-art baselines on a large collection of documents about drugs.
AB - Supervised named entity recognition systems often suffer from inadequate training data when dealing with domain-specific corpora, e.g., documents in the medical and healthcare domains. For these domains, obtaining some seed words or phrases is not very difficult. Positive instances obtained through distant supervision based on the seeds can then be used to learn recognition models. However, with a limited number of training samples and no negative ones, the classification results may be unsatisfactory. In this paper, we leverage the conjunction and comma writing style as a list constraint to enlarge the set of training instances. Unlike earlier studies, we formulate two kinds of constraints, namely a soft list constraint and a mention constraint, as regularizers. We then incorporate the constraints into a unified discriminative learning framework and propose a joint optimization algorithm. The experimental results show that our model is superior to state-of-the-art baselines on a large collection of documents about drugs.
KW - Biomedical information extraction
KW - Distant supervision
UR - http://www.scopus.com/inward/record.url?scp=85033670591&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-69179-4_48
DO - 10.1007/978-3-319-69179-4_48
M3 - Conference Proceeding
AN - SCOPUS:85033670591
SN - 9783319691787
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 681
EP - 694
BT - Advanced Data Mining and Applications - 13th International Conference, ADMA 2017, Proceedings
A2 - Peng, Wen-Chih
A2 - Zhang, Wei Emma
A2 - Cong, Gao
A2 - Sun, Aixin
A2 - Li, Chengliang
PB - Springer Verlag
T2 - 13th International Conference on Advanced Data Mining and Applications, ADMA 2017
Y2 - 5 November 2017 through 6 November 2017
ER -