Online commercial intention detection framework based on web pages

Huakang Li*, Xiaofeng Xu, Longbin Lai, Yao Shen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

The China Internet Network Information Centre (CNNIC) reports that internet users around the world typically spend 10-16 hours per week online. For effective advertising and social information publishing on the internet, extracting commercial value from users' online behaviour has become a new challenge beyond that addressed by traditional recommendation systems. In this paper, we propose a novel system, the 'online commercial intention (OCI) detection system', which uses users' global web browsing history to predict the products they are likely to purchase on an online shopping platform. A 'commercial keyword dictionary (KD)' that captures the relationship between user queries and product categories is first built by analysing the click distribution of queries at the billion scale on the shopping platform. The browsing footprints of millions of internet users are then gathered and the raw page contents are crawled. Keywords in these pages are extracted using an N-gram algorithm, and their commercial probabilities are estimated from query frequency (QF), inverse category frequency (ICF), and related measures. The OCI of a page is estimated by merging the KD matrices of its commercial keywords. To improve the coherence and accuracy of the predicted categories, we provide a category similarity model that measures the distance between the top N categories. Experimental results show that category prediction accuracy reaches 86% under manual evaluation.
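
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the QF or ICF formulas, so the sketch assumes an ICF defined by analogy with inverse document frequency and uses a keyword's total click count as a stand-in for QF. The toy dictionary `KD_CLICKS` and the functions `ngrams`, `icf`, and `page_oci` are hypothetical names introduced only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy keyword dictionary (KD): keyword -> {category: click count}.
# In the paper the KD is derived from shopping-platform query logs.
KD_CLICKS = {
    "running shoes": {"Sports": 90, "Fashion": 10},
    "laptop": {"Electronics": 100},
    "sale": {"Sports": 30, "Fashion": 40, "Electronics": 30},
}


def ngrams(tokens, max_n=2):
    """Yield every word n-gram of length 1..max_n as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def icf(keyword, kd, n_categories):
    """Assumed ICF: log(total categories / categories the keyword maps to)."""
    return math.log(n_categories / len(kd[keyword]))


def page_oci(text, kd=KD_CLICKS):
    """Merge the category distributions of a page's keywords into one vector."""
    categories = {cat for clicks in kd.values() for cat in clicks}
    # Count only the n-grams that appear in the keyword dictionary.
    counts = Counter(g for g in ngrams(text.lower().split()) if g in kd)
    scores = defaultdict(float)
    for keyword, page_freq in counts.items():
        total_clicks = sum(kd[keyword].values())
        qf = math.log1p(total_clicks)  # assumed proxy for query frequency
        weight = page_freq * qf * icf(keyword, kd, len(categories))
        for cat, clicks in kd[keyword].items():
            # Each keyword contributes its click distribution, scaled by weight.
            scores[cat] += weight * clicks / total_clicks
    norm = sum(scores.values())
    # Normalise to a probability-like vector; empty if nothing informative matched.
    return {cat: s / norm for cat, s in scores.items()} if norm else {}


if __name__ == "__main__":
    print(page_oci("buy running shoes on sale"))
```

Under these assumptions, a keyword such as "sale" that is spread across every category receives an ICF of zero and so does not influence the page's category vector, while category-specific keywords dominate it.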

Original language: English
Pages (from-to): 176-185
Number of pages: 10
Journal: International Journal of Computational Science and Engineering
Volume: 12
Issue number: 2-3
Publication status: Published - 2016
Externally published: Yes

Keywords

  • Category similarity model
  • Commercial keyword dictionary
  • Commercial probabilities
  • Large-scale data
  • OCI
  • Online commercial intention
  • Product categories
  • User profile
  • User-online behaviour
