An information classification approach based on knowledge network

Huakang Li, Guozi Sun, Bei Xu, Li Li, Jie Huang, Keita Tanno, Wenxu Wu, Changen Xu

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

2 Citations (Scopus)

Abstract

Numerous critical Internet applications that provide high-quality services, such as Web directories, search engines, Web crawlers, recommendation systems and user profile detectors, almost all depend on an efficient and accurate web page classification system. Traditional supervised and semi-supervised machine learning methods find it increasingly difficult to keep up with the explosive growth of Internet information. This paper proposed a web page classification method based on the topological structure of the Wikipedia knowledge network. A kinship-relation association based on content similarity was proposed to solve the imbalance problem that arises when a category node inherits probability from multiple parent nodes. We used N-grams built from Wikipedia words to extract keywords from a web page, and introduced a Bayes classifier to estimate the page's class probability. Experimental results showed that the proposed method has very good scalability, robustness and reliability across different web pages.
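The abstract names three steps: N-gram keyword extraction against Wikipedia vocabulary, similarity-based probability inheritance for a category with several parents, and a Bayes classifier over the extracted keywords. The sketch below is a hypothetical illustration of those steps, not the authors' implementation. The toy vocabulary, the category graph, the greedy longest-match N-gram lookup, Jaccard similarity as the "content similarity", and the rule of weighting each parent by its normalized similarity are all assumptions made for illustration; the inherited values are then used as class priors for a multinomial naive Bayes classifier with Laplace smoothing.

```python
import math
import re
from collections import Counter

# Hypothetical toy stand-ins for Wikipedia data (not taken from the paper).
WIKI_TITLES = {"search engine", "web crawler", "machine learning", "bayes", "wikipedia"}
PARENTS = {  # category -> parent categories (a node may have several parents)
    "Web search": ["Internet", "Information retrieval"],
    "Machine learning": ["Artificial intelligence", "Information retrieval"],
}
PARENT_PROB = {"Internet": 0.5, "Information retrieval": 0.3, "Artificial intelligence": 0.2}
PROFILES = {  # category -> Wikipedia terms describing its content
    "Internet": {"web crawler", "search engine", "wikipedia"},
    "Information retrieval": {"search engine", "bayes"},
    "Artificial intelligence": {"machine learning", "bayes"},
    "Web search": {"search engine", "web crawler"},
    "Machine learning": {"machine learning", "bayes"},
}


def extract_keywords(text, vocab, max_n=3):
    """Greedy longest-match N-gram lookup of vocabulary terms in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = " ".join(tokens[i:i + n])
            if gram in vocab:
                found.append(gram)
                i += n
                break
        else:  # no vocabulary term starts at this token; skip it
            i += 1
    return found


def jaccard(a, b):
    """Content similarity between two term sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def inherit_probability(node, parents, parent_prob, profiles):
    """Weight each parent's probability by its normalized similarity to the node."""
    ps = parents.get(node, [])
    if not ps:
        return 0.0
    sims = [jaccard(profiles[node], profiles[p]) for p in ps]
    total = sum(sims)
    if total == 0:  # no content overlap: fall back to a plain average
        return sum(parent_prob[p] for p in ps) / len(ps)
    return sum(s / total * parent_prob[p] for s, p in zip(sims, ps))


def naive_bayes(keywords, class_terms, priors, vocab_size, alpha=1.0):
    """Posterior P(class | keywords), multinomial naive Bayes with Laplace smoothing.

    Priors must be strictly positive (log of zero is undefined).
    """
    log_scores = {}
    for c, terms in class_terms.items():
        counts, total = Counter(terms), len(terms)
        score = math.log(priors[c])
        for k in keywords:
            score += math.log((counts[k] + alpha) / (total + alpha * vocab_size))
        log_scores[c] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}


if __name__ == "__main__":
    leaves = list(PARENTS)
    raw = {c: inherit_probability(c, PARENTS, PARENT_PROB, PROFILES) for c in leaves}
    priors = {c: p / sum(raw.values()) for c, p in raw.items()}
    keywords = extract_keywords("A web crawler feeds the search engine index.", WIKI_TITLES)
    class_terms = {c: sorted(PROFILES[c]) for c in leaves}
    print(keywords)
    print(naive_bayes(keywords, class_terms, priors, len(WIKI_TITLES)))
```

Normalizing the similarities means a child that shares most of its content with one parent draws most of its probability from that parent, rather than splitting it evenly; this is one plausible reading of the "kinship-relation association", chosen here only for clarity.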

Original language: English
Title of host publication: Proceedings - 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, MCSoC 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3-8
Number of pages: 6
ISBN (Electronic): 9781479943050
Publication status: Published - 6 Nov 2014
Externally published: Yes
Event: 2014 8th IEEE International Symposium on Embedded Multicore/Manycore SoCs, MCSoC 2014 - Aizu-Wakamatsu, Japan
Duration: 23 Sept 2014 to 25 Sept 2014

Publication series

Name: Proceedings - 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, MCSoC 2014

Conference

Conference: 2014 8th IEEE International Symposium on Embedded Multicore/Manycore SoCs, MCSoC 2014
Country/Territory: Japan
City: Aizu-Wakamatsu
Period: 23/09/14 to 25/09/14
