TY - GEN
T1 - An information classification approach based on knowledge network
AU - Li, Huakang
AU - Sun, Guozi
AU - Xu, Bei
AU - Li, Li
AU - Huang, Jie
AU - Tanno, Keita
AU - Wu, Wenxu
AU - Xu, Changen
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/11/6
Y1 - 2014/11/6
N2 - Many critical Internet applications that provide high-quality services, such as Web directories, search engines, Web crawlers, recommendation systems, and user profile detectors, depend largely on efficient and accurate web page classification. Traditional supervised and semi-supervised machine learning methods find it increasingly difficult to adapt to the explosive growth of Internet information. This paper proposes a web page classification method based on the topological structure of the Wikipedia knowledge network. A kinship-relation association based on content similarity is proposed to solve the imbalance problem that arises when a category node inherits probability from multiple parent nodes. We use N-grams based on Wikipedia words to extract keywords from a web page, and introduce a Bayes classifier to estimate the page's class probability. Experimental results show that the proposed method has very good scalability, robustness, and reliability across different web pages.
AB - Many critical Internet applications that provide high-quality services, such as Web directories, search engines, Web crawlers, recommendation systems, and user profile detectors, depend largely on efficient and accurate web page classification. Traditional supervised and semi-supervised machine learning methods find it increasingly difficult to adapt to the explosive growth of Internet information. This paper proposes a web page classification method based on the topological structure of the Wikipedia knowledge network. A kinship-relation association based on content similarity is proposed to solve the imbalance problem that arises when a category node inherits probability from multiple parent nodes. We use N-grams based on Wikipedia words to extract keywords from a web page, and introduce a Bayes classifier to estimate the page's class probability. Experimental results show that the proposed method has very good scalability, robustness, and reliability across different web pages.
UR - http://www.scopus.com/inward/record.url?scp=84917735771&partnerID=8YFLogxK
U2 - 10.1109/MCSoC.2014.10
DO - 10.1109/MCSoC.2014.10
M3 - Conference Proceeding
AN - SCOPUS:84917735771
T3 - Proceedings - 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, MCSoC 2014
SP - 3
EP - 8
BT - Proceedings - 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, MCSoC 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 8th IEEE International Symposium on Embedded Multicore/Manycore SoCs, MCSoC 2014
Y2 - 23 September 2014 through 25 September 2014
ER -