TY - JOUR
T1 - Categorization of Webpages using dynamic mutation based differential evolution and gradient boost classifier
AU - Mehedi, Ibrahim M.
AU - Shah, Mohd Heidir Mohd
N1 - Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2023/7
Y1 - 2023/7
N2 - With the growth of Internet technologies, Website categorization has turned into a demanding field of research. Webpages with destructive and offensive subjects such as violence, phishing, scams, radicalism, etc. have flourished over the past several years. Also, an extensive volume of Webpages with different subjects has hindered data extraction and retrieval approaches in delivering optimum subject-related outcomes. Therefore, an efficient approach to categorizing Webpages is desirable. In this paper, a gradient boosting classifier (GBC) model is used to categorize Websites. This is achieved by utilizing optical character recognition and web scraping, followed by a group of nontrivial text mining and histogram of oriented gradients based feature extraction steps. Thereafter, the proposed GBC is used to recognize Websites. However, GBC suffers from the hyper-parameter tuning issue; therefore, dynamic mutation based differential evolution is used to classify the Websites. The mutation ratio of dynamic mutation based differential evolution is selected dynamically using a differential-evolution-based positioning optimization algorithm. The strength of the proposed and the existing models is also validated against the existence of mis-recognized training contents. Extensive experiments reveal that the proposed Website categorization model outperforms the competitive models.
AB - With the growth of Internet technologies, Website categorization has turned into a demanding field of research. Webpages with destructive and offensive subjects such as violence, phishing, scams, radicalism, etc. have flourished over the past several years. Also, an extensive volume of Webpages with different subjects has hindered data extraction and retrieval approaches in delivering optimum subject-related outcomes. Therefore, an efficient approach to categorizing Webpages is desirable. In this paper, a gradient boosting classifier (GBC) model is used to categorize Websites. This is achieved by utilizing optical character recognition and web scraping, followed by a group of nontrivial text mining and histogram of oriented gradients based feature extraction steps. Thereafter, the proposed GBC is used to recognize Websites. However, GBC suffers from the hyper-parameter tuning issue; therefore, dynamic mutation based differential evolution is used to classify the Websites. The mutation ratio of dynamic mutation based differential evolution is selected dynamically using a differential-evolution-based positioning optimization algorithm. The strength of the proposed and the existing models is also validated against the existence of mis-recognized training contents. Extensive experiments reveal that the proposed Website categorization model outperforms the competitive models.
KW - Categorization
KW - Gradient boost
KW - Machine learning
KW - Webpage
UR - http://www.scopus.com/inward/record.url?scp=85120616057&partnerID=8YFLogxK
U2 - 10.1007/s12652-021-03601-2
DO - 10.1007/s12652-021-03601-2
M3 - Article
AN - SCOPUS:85120616057
SN - 1868-5137
VL - 14
SP - 8363
EP - 8374
JO - Journal of Ambient Intelligence and Humanized Computing
JF - Journal of Ambient Intelligence and Humanized Computing
IS - 7
ER -