Research on Intelligent Construction Algorithm of Subject Knowledge Thesaurus Based on Literature Resources

Xiaoxia Wang, Xiaozhong Xu, Jiarui Zhang, Yue Zhu, Yuhang Fan, Pengjing Xu*

*Corresponding author for this work

Research output: Contribution to journal, Conference article, peer-review

7 Citations (Scopus)


The implementation of the National Science and Technology Innovation Strategy demands exponential growth in the knowledge services provided by literature information institutions. The subject thesaurus is the most important knowledge organization tool for information retrieval and can be widely used for the semantic citation, organization and retrieval of literature resources. This study aims to develop an innovative algorithm for constructing a subject thesaurus from massive literature resource data by mining academic neologisms and the semantic relationships between those neologisms and the subject system. We first collect a literature corpus and carry out the corresponding data pre-processing. We then use the FastText model to mine academic neologisms and construct an automatic categorization model for those neologisms based on the BERT and TextCNN algorithms. The proposed algorithm is validated on 8.1 million multi-source, heterogeneous literature records in the field of marine disciplines. The results show that the algorithm can effectively replace 90% of the manual annotation workload, mine a large number of high-quality marine neologisms, and successfully build a marine science knowledge base with a pass rate of 82.6% in expert review, which indicates high accuracy and some potential for engineering application.
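The neologism-mining step can be illustrated with a minimal sketch. It is not the authors' implementation: it does not train a FastText model, and only borrows FastText's subword idea (character n-grams of a word wrapped in `<` and `>` boundary markers) to link a frequent out-of-vocabulary term to the closest existing thesaurus entry. The function names, the Jaccard scoring, the `min_count` threshold and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in boundary markers (FastText-style subwords)."""
    wrapped = f"<{word}>"
    return {
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    }


def jaccard(a, b):
    """Jaccard overlap of two n-gram sets; a stand-in for embedding similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mine_candidates(documents, known_terms, min_count=2, stopwords=frozenset()):
    """Hypothetical helper: frequent tokens missing from the thesaurus,
    each paired with the known term that shares the most subwords."""
    counts = Counter(
        tok for doc in documents for tok in re.findall(r"[a-z]+", doc.lower())
    )
    known_grams = {t.lower(): char_ngrams(t.lower()) for t in known_terms}
    results = []
    for tok, count in counts.items():
        if count < min_count or tok in known_grams or tok in stopwords:
            continue
        grams = char_ngrams(tok)
        best = max(
            known_grams, key=lambda t: jaccard(grams, known_grams[t]), default=None
        )
        score = jaccard(grams, known_grams[best]) if best is not None else 0.0
        results.append((tok, count, best, score))
    # Most frequent candidates first, ties broken alphabetically.
    return sorted(results, key=lambda r: (-r[1], r[0]))


if __name__ == "__main__":
    docs = [
        "microplastics in seawater",
        "microplastics harm plankton",
        "seawater salinity",
    ]
    for row in mine_candidates(docs, ["seawater", "plankton"]):
        print(row)
```

In a pipeline closer to the one the abstract describes, the n-gram overlap would be replaced by similarity between learned FastText vectors, and the candidate list would then be passed to a classifier (BERT with TextCNN in the paper) that assigns each term to a subject category.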

Original language: English
Article number: 012038
Journal: Journal of Physics: Conference Series
Issue number: 1
Publication status: Published - 29 Jun 2021
Externally published: Yes
Event: 2021 4th International Symposium on Big Data and Applied Statistics, ISBDAS 2021 - Dali, China
Duration: 21 May 2021 to 23 May 2021

