Entity recognition by distant supervision with soft list constraint

Hongkui Tu*, Zongyang Ma, Aixin Sun, Zhiqiang Xu, Xiaodong Wang

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding; Conference Proceeding; peer-review

Abstract

Supervised named entity recognition systems often suffer from inadequate training data when dealing with domain-specific corpora, e.g., documents in the medical and healthcare domains. For these domains, obtaining some seed words or phrases is not very difficult. Positive instances obtained through distant supervision based on the seeds can then be used to learn recognition models. However, with a limited number of training samples and no negative ones, the classification results may not be satisfactory. In this paper, we leverage the conjunction and comma writing style as a list constraint to enlarge the set of training instances. Different from earlier studies, we formulate two kinds of constraints, namely a soft list constraint and a mention constraint, as regularizers. We then incorporate the constraints into a unified discriminative learning framework and propose a joint optimization algorithm. The experimental results show that our model is superior to state-of-the-art baselines on a large collection of documents about drugs.
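The intuition behind seed-based distant supervision and the list writing style can be illustrated with a minimal sketch. This is not the paper's model, which treats the constraints as regularizers in a joint optimization; it is only a plain heuristic that marks seed matches as positives and proposes the other members of a comma or conjunction list as candidates. The function names (`tokenize`, `expand`, `label_with_lists`), the single-token mention assumption, and the example sentence are all hypothetical choices made for illustration.

```python
import re

# Tokens that join list members in the "conjunction and comma" writing style.
SEPARATORS = {",", "and", "or"}

def tokenize(sentence):
    # Lowercased word tokens plus commas; other punctuation is dropped.
    return re.findall(r"[a-z0-9-]+|,", sentence.lower())

def expand(tokens, start, step):
    """Collect single-token list items reachable from `start`,
    walking in direction `step` (+1 for right, -1 for left)."""
    items = []
    i = start + step
    while 0 <= i < len(tokens) and tokens[i] in SEPARATORS:
        # Skip a run of separators such as ", and".
        while 0 <= i < len(tokens) and tokens[i] in SEPARATORS:
            i += step
        if not (0 <= i < len(tokens)):
            break
        items.append(tokens[i])
        i += step
    return items

def label_with_lists(sentences, seeds):
    """Return (positives, candidates): seed mentions found in the text,
    and non-seed tokens that share a list with a seed mention."""
    seeds = {s.lower() for s in seeds}
    positives, candidates = set(), set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok in seeds:
                positives.add(tok)  # distant supervision: seed match
                candidates.update(expand(tokens, i, +1))
                candidates.update(expand(tokens, i, -1))
    return positives, candidates - seeds

positives, candidates = label_with_lists(
    ["Patients took aspirin, ibuprofen, and naproxen daily."],
    {"Aspirin"},
)
```

The heuristic is noisy: in a phrase such as "aspirin and felt better", the word "felt" would be proposed too, so its output is better read as candidates than as labels.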

Original language: English
Title of host publication: Advanced Data Mining and Applications - 13th International Conference, ADMA 2017, Proceedings
Editors: Wen-Chih Peng, Wei Emma Zhang, Gao Cong, Aixin Sun, Chengliang Li
Publisher: Springer Verlag
Pages: 681-694
Number of pages: 14
ISBN (Print): 9783319691787
Publication status: Published - 2017
Externally published: Yes
Event: 13th International Conference on Advanced Data Mining and Applications, ADMA 2017 - Singapore, Singapore
Duration: 5 Nov 2017 → 6 Nov 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10604 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th International Conference on Advanced Data Mining and Applications, ADMA 2017
Country/Territory: Singapore
City: Singapore
Period: 5/11/17 → 6/11/17

Keywords

  • Biomedical information extraction
  • Distant supervision
