Language Independent Models for COVID-19 Fake News Detection Black Box versus White Box Models

W. K. Wong; Filbert H. Juwono; Ing Ming Chew; Basil Andy Lease

doi:10.18080/jtde.v11n3.789

Department of Electrical and Electronic Engineering

Curtin University

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

In an era where massive information can be spread easily through social media, fake news detection is increasingly used to prevent widespread misinformation, especially fake news regarding COVID-19. Databases have been built and machine-learning algorithms have been used to identify patterns in news content and filter the false information. A brief overview, ranging from public domain datasets through the deployment of several machine learning models, as well as feature extraction methods, is provided in this paper. As a case study, a mixed language dataset is presented. The dataset consists of tweets about COVID-19 which have been labelled as fake or real news. To perform the detection task, a classification model is implemented using language-independent features. In particular, the features offer numerical inputs that are invariant to the language type; thus, they are suitable for investigation, as many regions in the world have similar linguistic structures. Furthermore, the classification task can be performed by using black box or white box models, each having its own advantages and disadvantages. In this paper, we compare the performance of the two approaches. Simulation results show that the performance difference between black box models and white box models is not significant.

Original language	English
Pages (from-to)	84-104
Number of pages	21
Journal	Journal of Telecommunications and the Digital Economy
Volume	11
Issue number	3
DOIs	https://doi.org/10.18080/jtde.v11n3.789
Publication status	Published - Sept 2023

Keywords

COVID-19
Fake news
black box model
machine learning
white box model

Cite this

@article{d3336f23c864472e9ac3e729c740a644,
title = "Language Independent Models for COVID-19 Fake News Detection Black Box versus White Box Models",
abstract = "In an era where massive information can be spread easily through social media, fake news detection is increasingly used to prevent widespread misinformation, especially fake news regarding COVID-19. Databases have been built and machine-learning algorithms have been used to identify patterns in news content and filter the false information. A brief overview, ranging from public domain datasets through the deployment of several machine learning models, as well as feature extraction methods, is provided in this paper. As a case study, a mixed language dataset is presented. The dataset consists of tweets about COVID-19 which have been labelled as fake or real news. To perform the detection task, a classification model is implemented using language-independent features. In particular, the features offer numerical inputs that are invariant to the language type; thus, they are suitable for investigation, as many regions in the world have similar linguistic structures. Furthermore, the classification task can be performed by using black box or white box models, each having its own advantages and disadvantages. In this paper, we compare the performance of the two approaches. Simulation results show that the performance difference between black box models and white box models is not significant.",
keywords = "COVID-19, Fake news, black box model, machine learning, white box model",
author = "Wong, {W. K.} and Juwono, {Filbert H.} and Chew, {Ing Ming} and Lease, {Basil Andy}",
note = "Publisher Copyright: Copyright {\textcopyright} 2023.",
year = "2023",
month = sep,
doi = "10.18080/jtde.v11n3.789",
language = "English",
volume = "11",
pages = "84--104",
journal = "Journal of Telecommunications and the Digital Economy",
issn = "2203-1693",
number = "3",
}

TY  - JOUR
T1  - Language Independent Models for COVID-19 Fake News Detection Black Box versus White Box Models
AU  - Wong, W. K.
AU  - Juwono, Filbert H.
AU  - Chew, Ing Ming
AU  - Lease, Basil Andy
PY  - 2023/9
Y1  - 2023/9
N2  - In an era where massive information can be spread easily through social media, fake news detection is increasingly used to prevent widespread misinformation, especially fake news regarding COVID-19. Databases have been built and machine-learning algorithms have been used to identify patterns in news content and filter the false information. A brief overview, ranging from public domain datasets through the deployment of several machine learning models, as well as feature extraction methods, is provided in this paper. As a case study, a mixed language dataset is presented. The dataset consists of tweets about COVID-19 which have been labelled as fake or real news. To perform the detection task, a classification model is implemented using language-independent features. In particular, the features offer numerical inputs that are invariant to the language type; thus, they are suitable for investigation, as many regions in the world have similar linguistic structures. Furthermore, the classification task can be performed by using black box or white box models, each having its own advantages and disadvantages. In this paper, we compare the performance of the two approaches. Simulation results show that the performance difference between black box models and white box models is not significant.
AB  - In an era where massive information can be spread easily through social media, fake news detection is increasingly used to prevent widespread misinformation, especially fake news regarding COVID-19. Databases have been built and machine-learning algorithms have been used to identify patterns in news content and filter the false information. A brief overview, ranging from public domain datasets through the deployment of several machine learning models, as well as feature extraction methods, is provided in this paper. As a case study, a mixed language dataset is presented. The dataset consists of tweets about COVID-19 which have been labelled as fake or real news. To perform the detection task, a classification model is implemented using language-independent features. In particular, the features offer numerical inputs that are invariant to the language type; thus, they are suitable for investigation, as many regions in the world have similar linguistic structures. Furthermore, the classification task can be performed by using black box or white box models, each having its own advantages and disadvantages. In this paper, we compare the performance of the two approaches. Simulation results show that the performance difference between black box models and white box models is not significant.
KW  - COVID-19
KW  - Fake news
KW  - black box model
KW  - machine learning
KW  - white box model
UR  - http://www.scopus.com/inward/record.url?scp=85176209123&partnerID=8YFLogxK
U2  - 10.18080/jtde.v11n3.789
DO  - 10.18080/jtde.v11n3.789
M3  - Article
AN  - SCOPUS:85176209123
SN  - 2203-1693
VL  - 11
SP  - 84
EP  - 104
JO  - Journal of Telecommunications and the Digital Economy
JF  - Journal of Telecommunications and the Digital Economy
IS  - 3
ER  - 
