Vision-Based Malware Detection: A Transfer Learning Approach Using Optimal ECOC-SVM Configuration

W. K. Wong, Filbert H. Juwono, Catur Apriono*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

28 Citations (Scopus)

Abstract

Malware detection is becoming increasingly important because of the variety of malicious software (malware), including ransomware, now circulating in cyberspace. Advances in Deep Learning (DL) have attracted considerable interest in malware detection applications. In these approaches, file binaries are fed into DL neural networks for training and testing. However, we find that overfitting may occur despite precautions such as dropout layers. The limitations can also be attributed to the final classification layers. Furthermore, in a multiclass classification task, performance can be improved by employing a final classifier layer that is more effective at handling malware characteristics. In this paper, we apply transfer learning using ShuffleNet and DenseNet-201, two models trained on a large dataset to recognize everyday objects. The features embedded in all layers can be further exploited in a way that does not result in overfitting. In particular, the entire network is frozen to prevent overfitting, and an optimal Error Correction Output Coding (ECOC) ensemble configuration of Support Vector Machines (SVMs) is applied as the final classification layer. Several ECOC coding matrices are applied, namely One vs. All (OVA), One vs. One (OVO), Dense Random (DR), and Sparse Random (SR). These configurations differ in complexity and ensemble size and hence present a tradeoff between reduced computation and complex non-linear separation. Because searching continuous SVM parameter values for the optimal configuration can be computationally expensive, we apply a grid search over combinations of discrete values for parameter optimization. We test the proposed model on the Malimg, MaleVis, Virus-MNIST, and Dumpware10 datasets. The results show better or comparable accuracy compared with existing work.
The best/average accuracy values for each dataset over 10 trials are: Malimg (99.14%/98.87%), MaleVis (95.01%/93.91%), Virus-MNIST (86.36%/85.79%), Dumpware10 (96.62%/95.79%).
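
The four coding designs named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the symbol probabilities for the random designs, the simplified Hamming-style decoding, and the parameter grid values are all assumptions made for illustration, and no SVM is trained here. Each row of a coding matrix is a class codeword and each column defines one binary learner (+1 and -1 mark the two sides, 0 means the class is ignored by that learner).

```python
import itertools
import random


def ova_matrix(k):
    """One vs. All: learner j separates class j (+1) from all others (-1)."""
    return [[1 if i == j else -1 for j in range(k)] for i in range(k)]


def ovo_matrix(k):
    """One vs. One: one learner per class pair; other classes are ignored (0)."""
    pairs = list(itertools.combinations(range(k), 2))
    return [[1 if i == a else -1 if i == b else 0 for a, b in pairs]
            for i in range(k)]


def random_matrix(k, n_learners, symbols, weights, rng):
    """Random design: redraw each column until it contains both +1 and -1.
    Simplification: rows are not checked for being distinct."""
    columns = []
    while len(columns) < n_learners:
        col = rng.choices(symbols, weights=weights, k=k)
        if 1 in col and -1 in col:
            columns.append(col)
    return [[col[i] for col in columns] for i in range(k)]


def decode(matrix, outputs):
    """Pick the class whose codeword disagrees least with the learner outputs.
    Zero entries are skipped because that learner never saw the class."""
    def distance(row):
        return sum(1 for m, o in zip(row, outputs) if m != 0 and m != o)
    return min(range(len(matrix)), key=lambda i: distance(matrix[i]))


if __name__ == "__main__":
    k = 10  # illustrative class count, not taken from the abstract
    rng = random.Random(0)
    designs = {
        "OVA": ova_matrix(k),
        "OVO": ovo_matrix(k),
        # Dense: only +1/-1. Sparse: mostly 0. Sizes and weights are assumed.
        "DR": random_matrix(k, 34, [1, -1], [0.5, 0.5], rng),
        "SR": random_matrix(k, 50, [1, -1, 0], [0.25, 0.25, 0.5], rng),
    }
    for name, matrix in designs.items():
        print(name, "binary learners:", len(matrix[0]))

    # Hypothetical discrete SVM grid; the abstract does not list the values.
    grid = list(itertools.product([0.1, 1, 10, 100], [0.001, 0.01, 0.1]))
    print("parameter combinations to evaluate:", len(grid))
```

The ensemble size is the number of columns: OVA needs K learners and OVO needs K(K-1)/2, while the random designs let the size be chosen freely, which is where the tradeoff between computation and separation capacity mentioned in the abstract arises. In practice each column would be fitted as one binary SVM on the frozen network's features, and each grid entry would be scored by validation accuracy.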

Original language: English
Pages (from-to): 159262-159270
Number of pages: 9
Journal: IEEE Access
Volume: 9
Publication status: Published - 2021
Externally published: Yes

Keywords

  • ECOC
  • Malware
  • SVM
  • machine learning
