TY - GEN
T1 - User interest profile identification using Wikipedia knowledge database
AU - Li, Huakang
AU - Lai, Longbin
AU - Xu, Xiaofeng
AU - Shen, Yao
AU - Xu, Xiangyang
AU - Xia, Chunrong
PY - 2014
Y1 - 2014
N2 - Interesting, targeted, relevant advertising is considered one of the most honest sources of revenue for personalized recommendation. Topic identification is the most important technique for handling unstructured web pages. Conventional content classification approaches based on bag of words have difficulty processing massive numbers of web pages. In this paper, Wikipedia Category Network (WCN) nodes are used to identify a web page's topic and estimate a user's interest profile. Wikipedia is the largest content knowledge database and is updated dynamically. A basic interest data set is marked on the WCN. The topic characterization for each WCN node is generated from the depth and breadth of the interest data set. To reduce the deviation of the breadth, a family generation algorithm is proposed to estimate the generation weight in the WCN. Finally, an interest decay model based on the number of URLs is proposed to represent a user's interest profile over a time period. Experimental results show that web page topic identification using WCN with the family model achieves significant performance, and that the profile identification model performs dynamically for active users.
KW - URL decay model
KW - Web page classification
KW - Wikipedia knowledge network
KW - family similarity
KW - user profile
UR - http://www.scopus.com/inward/record.url?scp=84903973666&partnerID=8YFLogxK
U2 - 10.1109/HPCC.and.EUC.2013.340
DO - 10.1109/HPCC.and.EUC.2013.340
M3 - Conference Proceeding
AN - SCOPUS:84903973666
SN - 9780769550886
T3 - Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013
SP - 2362
EP - 2367
BT - Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013
PB - IEEE Computer Society
T2 - 15th IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 11th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, EUC 2013
Y2 - 13 November 2013 through 15 November 2013
ER -