TY - JOUR
T1 - Language Independent Models for COVID-19 Fake News Detection: Black Box versus White Box Models
AU - Wong, W. K.
AU - Juwono, Filbert H.
AU - Chew, Ing Ming
AU - Lease, Basil Andy
N1 - Publisher Copyright:
Copyright © 2023.
PY - 2023/9
Y1 - 2023/9
N2 - In an era where massive amounts of information can be spread easily through social media, fake news detection is increasingly used to prevent widespread misinformation, especially fake news regarding COVID-19. Databases have been built and machine-learning algorithms have been used to identify patterns in news content and filter out false information. This paper provides a brief overview, ranging from public-domain datasets through feature extraction methods to the deployment of several machine learning models. As a case study, a mixed-language dataset is presented. The dataset consists of COVID-19 tweets that have been labelled as fake or real news. To perform the detection task, a classification model is implemented using language-independent features. In particular, the features offer numerical inputs that are invariant to the language type; thus, they are suitable for this investigation, as many regions in the world have similar linguistic structures. Furthermore, the classification task can be performed using black box or white box models, each having its own advantages and disadvantages. In this paper, we compare the performance of the two approaches. Simulation results show that the performance difference between black box models and white box models is not significant.
AB - In an era where massive amounts of information can be spread easily through social media, fake news detection is increasingly used to prevent widespread misinformation, especially fake news regarding COVID-19. Databases have been built and machine-learning algorithms have been used to identify patterns in news content and filter out false information. This paper provides a brief overview, ranging from public-domain datasets through feature extraction methods to the deployment of several machine learning models. As a case study, a mixed-language dataset is presented. The dataset consists of COVID-19 tweets that have been labelled as fake or real news. To perform the detection task, a classification model is implemented using language-independent features. In particular, the features offer numerical inputs that are invariant to the language type; thus, they are suitable for this investigation, as many regions in the world have similar linguistic structures. Furthermore, the classification task can be performed using black box or white box models, each having its own advantages and disadvantages. In this paper, we compare the performance of the two approaches. Simulation results show that the performance difference between black box models and white box models is not significant.
KW - COVID-19
KW - Fake news
KW - black box model
KW - machine learning
KW - white box model
UR - http://www.scopus.com/inward/record.url?scp=85176209123&partnerID=8YFLogxK
U2 - 10.18080/jtde.v11n3.789
DO - 10.18080/jtde.v11n3.789
M3 - Article
AN - SCOPUS:85176209123
SN - 2203-1693
VL - 11
SP - 84
EP - 104
JO - Journal of Telecommunications and the Digital Economy
JF - Journal of Telecommunications and the Digital Economy
IS - 3
ER -