Automatic ontology generation from patents using a pre-built library, WordNet and a class-based n-gram model

Zhen Li, Derrick Tate*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

An ontology is defined as a structured, hierarchical way of describing domain knowledge. Research on ontological engineering has yielded fruitful results, but existing methods share a common drawback: they require significant manual work to generate an ontology, which limits their usefulness in practice. In this paper, we propose a computational model that combines data mining, Natural Language Processing (NLP), WordNet and a novel class-based n-gram model for automatic ontology discovery and recognition from existing patent documents. A pre-built ontology library was constructed by gathering knowledge from engineering textbooks and dictionaries. Then a data set of engineering patent claims was split into training (80%) and validation (20%) subsets. The pre-built library and WordNet were used to generate class labels for constructing class-based n-gram models in a training process. Holdout validation showed an average accuracy of 87.26% across all validation samples.
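To make the pipeline in the abstract more concrete, the following is a minimal sketch of an 80/20 holdout split and a class-based bigram model. It is not the authors' implementation: the abstract does not give the model's exact formulation, so the sketch assumes the common class-based factorization P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i) with add-one smoothing. The `WORD_CLASS` table, the class names, and the helper names `to_class`, `split_holdout`, and `ClassBigramModel` are all hypothetical stand-ins for the labels that the pre-built library and WordNet would supply.

```python
import math
import random
from collections import Counter

# Hypothetical word-to-class lookup standing in for the pre-built library
# and WordNet; the real label set is not given in the abstract.
WORD_CLASS = {
    "shaft": "COMPONENT", "gear": "COMPONENT", "housing": "COMPONENT",
    "rotates": "ACTION", "drives": "ACTION", "supports": "ACTION",
}
ALL_CLASSES = set(WORD_CLASS.values()) | {"OTHER"}


def to_class(word):
    """Map a word to its class label, falling back to OTHER."""
    return WORD_CLASS.get(word, "OTHER")


def split_holdout(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and split it into train/validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


class ClassBigramModel:
    """Class-based bigram model (assumed factorization, add-one smoothing)."""

    def __init__(self):
        self.class_bigrams = Counter()   # (previous class, class) counts
        self.class_context = Counter()   # previous-class counts
        self.word_in_class = Counter()   # (class, word) counts
        self.class_totals = Counter()    # tokens observed per class
        self.vocab = set()

    def train(self, sentences):
        for sentence in sentences:
            prev = "<s>"  # sentence-start marker
            for word in sentence:
                cls = to_class(word)
                self.class_bigrams[(prev, cls)] += 1
                self.class_context[prev] += 1
                self.word_in_class[(cls, word)] += 1
                self.class_totals[cls] += 1
                self.vocab.add(word)
                prev = cls

    def log_prob(self, sentence):
        """Sum of log P(c_i | c_{i-1}) + log P(w_i | c_i) over the sentence."""
        n_classes = len(ALL_CLASSES)
        n_words = len(self.vocab) + 1  # one extra slot for unseen words
        total = 0.0
        prev = "<s>"
        for word in sentence:
            cls = to_class(word)
            p_class = (self.class_bigrams[(prev, cls)] + 1) / (
                self.class_context[prev] + n_classes)
            p_word = (self.word_in_class[(cls, word)] + 1) / (
                self.class_totals[cls] + n_words)
            total += math.log(p_class) + math.log(p_word)
            prev = cls
        return total


# Toy token sequences; real input would be tokenized patent claims.
toy_claims = [
    ["shaft", "rotates"], ["gear", "drives"], ["housing", "supports"],
    ["shaft", "drives"], ["gear", "rotates"],
]
model = ClassBigramModel()
model.train(toy_claims)
```

Because transitions are counted between classes rather than words, a word pair that never occurs in the training data, such as "gear supports" in the toy data above, still scores higher than its reversed order. Drawing the classes from a fixed lookup mirrors the abstract's use of the pre-built library and WordNet for class labels, although the actual label granularity is not stated there.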

Original language: English
Pages (from-to): 142-172
Number of pages: 31
Journal: International Journal of Product Development
Volume: 20
Issue number: 2
Publication status: Published - 2015

Keywords

  • N-gram language model
  • Natural language processing
  • Ontological engineering
