TY - JOUR
T1 - Automatic ontology generation from patents using a pre-built library, WordNet and a class-based n-gram model
AU - Li, Zhen
AU - Tate, Derrick
N1 - Publisher Copyright: Copyright © 2015 Inderscience Enterprises Ltd.
PY - 2015
Y1 - 2015
AB - An ontology is defined as a structured, hierarchical way for describing domain knowledge. Research work regarding ontological engineering has yielded fruitful results, but these methods share a common drawback: they require significant manual work to generate an ontology, which limits the usefulness of these approaches in practice. In this paper, we propose a computational model that combines data mining, Natural Language Processing (NLP), WordNet and a novel class-based n-gram model for automatic ontology discovery and recognition from existing patent documents. A pre-built ontology library was constructed by gathering knowledge from engineering textbooks and dictionaries. Then a data set of engineering patent claims was split into training (80%) and validation (20%) subsets. The pre-built library and WordNet were used to generate class labels for constructing class-based n-gram models in a training process. The holdout validation showed that the average accuracy was 87.26% for all validation samples.
KW - N-gram language model
KW - Natural language processing
KW - Ontological engineering
UR - http://www.scopus.com/inward/record.url?scp=84928819124&partnerID=8YFLogxK
U2 - 10.1504/IJPD.2015.068965
DO - 10.1504/IJPD.2015.068965
M3 - Article
AN - SCOPUS:84928819124
SN - 1477-9056
VL - 20
SP - 142
EP - 172
JO - International Journal of Product Development
JF - International Journal of Product Development
IS - 2
ER -