Anomaly Detection by Using Streaming K-Means and Batch K-Means

Zhuo Wang; Yanghui Zhou; Gangmin Li

doi:10.1109/ICBDA49040.2020.9101212

School of Advanced Technology

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

41 Citations (Scopus)

Abstract

This paper introduces the K-Means algorithm as a new technique for anomaly detection. Data analysis has been widely applied in industry and plays an important role there. However, conventional data analysis methods cannot process large-scale data within an acceptable time and waste considerable computing resources. By contrast, batch processing and stream processing can process data at short time intervals; stream processing in particular can process data in real time. The paper compares Batch K-Means processing with Streaming K-Means processing in terms of distance, cost value, and cluster distribution. It also discusses how to reach an optimized K value for the Batch K-Means and Streaming K-Means models, analyzes the attributes of both processing models, and identifies their limitations. Finally, the paper discusses the limitations of the experiments and proposes future improvements to the clustering technique.
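The abstract contrasts batch and streaming K-Means and judges the models by distance and cost value. The sketch below is a minimal, dependency-free Python illustration of those ideas under stated assumptions; it is not the authors' implementation. Batch mode runs Lloyd's iterations over the whole data set, seeded with a simple farthest-first rule chosen only for reproducibility. "Cost" is assumed to mean the within-cluster sum of squared distances. Streaming mode folds each mini-batch into the existing centers with the decayed weighted-average rule documented for Apache Spark MLlib's StreamingKMeans (the decay parameter is called `decay` here). The anomaly rule (flag a point whose distance to its nearest center exceeds a fixed `threshold`) and every function name are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centers: List[Point]) -> int:
    """Index of the center closest to p (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def init_centers(data: List[Point], k: int) -> List[Point]:
    """Hypothetical deterministic seeding: start at data[0], then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [data[0]]
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def batch_kmeans(data: List[Point], k: int, iters: int = 20) -> List[Point]:
    """Batch mode: Lloyd's iterations over the whole data set at once."""
    centers = init_centers(data, k)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in data:
            groups[nearest(p, centers)].append(p)
        # An empty cluster keeps its previous center.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers


def cost(data: List[Point], centers: List[Point]) -> float:
    """Assumed cost: sum of squared distances to each point's nearest center."""
    return sum(sq_dist(p, centers[nearest(p, centers)]) for p in data)


def streaming_update(
    centers: List[Point], weights: List[float], batch: List[Point], decay: float = 1.0
) -> Tuple[List[Point], List[float]]:
    """Streaming mode: fold one mini-batch into the current centers.

    For a center c with weight n, let x and m be the mean and count of the
    batch points assigned to it. Then c' = (c*n*decay + x*m) / (n*decay + m)
    and n' = n*decay + m. decay = 1.0 never forgets; smaller values favour
    recent batches.
    """
    groups: List[List[Point]] = [[] for _ in centers]
    for p in batch:
        groups[nearest(p, centers)].append(p)
    new_centers: List[Point] = []
    new_weights: List[float] = []
    for c, n, g in zip(centers, weights, groups):
        n_old = n * decay
        if not g:  # no new points: the center stays, its weight still decays
            new_centers.append(c)
            new_weights.append(n_old)
            continue
        m, x = len(g), mean(g)
        new_centers.append(tuple((ci * n_old + xi * m) / (n_old + m) for ci, xi in zip(c, x)))
        new_weights.append(n_old + m)
    return new_centers, new_weights


def anomalies(points: List[Point], centers: List[Point], threshold: float) -> List[Point]:
    """Hypothetical rule: points farther than `threshold` from every center."""
    return [p for p in points if math.sqrt(sq_dist(p, centers[nearest(p, centers)])) > threshold]
```

To explore an optimized K under these assumptions, one could compare `cost(data, batch_kmeans(data, k))` over several values of k and look for the point where the cost stops dropping sharply (the elbow heuristic); because the cost generally keeps falling as k grows, its minimum alone does not choose K. Farthest-first seeding is sensitive to outliers, so the batch model is best fitted on data believed to be mostly normal.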

Original language	English
Title of host publication	2020 5th IEEE International Conference on Big Data Analytics, ICBDA 2020
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	11-17
Number of pages	7
ISBN (Electronic)	9781728141114
DOIs	https://doi.org/10.1109/ICBDA49040.2020.9101212
Publication status	Published - May 2020
Event	5th IEEE International Conference on Big Data Analytics, ICBDA 2020 - Xiamen, China
Duration	8 May 2020 → 11 May 2020

Publication series

Name	2020 5th IEEE International Conference on Big Data Analytics, ICBDA 2020

Conference

Conference	5th IEEE International Conference on Big Data Analytics, ICBDA 2020
Country/Territory	China
City	Xiamen
Period	8/05/20 → 11/05/20

Keywords

big data
cluster distribution
k-means clustering
optimized K-value
streaming k-means clustering

Cite this

Wang, Z., Zhou, Y., & Li, G. (2020). Anomaly Detection by Using Streaming K-Means and Batch K-Means. In 2020 5th IEEE International Conference on Big Data Analytics, ICBDA 2020 (pp. 11-17). Article 9101212 (2020 5th IEEE International Conference on Big Data Analytics, ICBDA 2020). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICBDA49040.2020.9101212

@inproceedings{18b5b4b4cc264579a1d0e8370bf18873,

title = "Anomaly Detection by Using Streaming K-Means and Batch K-Means",

abstract = "This paper introduces the K-Means algorithm as a new technique for anomaly detection. Data analysis has been widely applied in industry and plays an important role there. However, conventional data analysis methods cannot process large-scale data within an acceptable time and waste considerable computing resources. By contrast, batch processing and stream processing can process data at short time intervals; stream processing in particular can process data in real time. The paper compares Batch K-Means processing with Streaming K-Means processing in terms of distance, cost value, and cluster distribution. It also discusses how to reach an optimized K value for the Batch K-Means and Streaming K-Means models, analyzes the attributes of both processing models, and identifies their limitations. Finally, the paper discusses the limitations of the experiments and proposes future improvements to the clustering technique.",

keywords = "big data, cluster distribution, k-means clustering, optimized K-value, streaming k-means clustering",

author = "Zhuo Wang and Yanghui Zhou and Gangmin Li",

note = "Funding Information: This research was supported by the XJTLU research team, with special thanks to Dr. Gangmin Li and Mr. Yanghui Zhou for their help. First of all, I am very grateful to my research tutor, Dr. Gangmin Li, for giving me the opportunity to participate in this research and for guiding me patiently. Without his help, I could not have learned the theory of Big Data Analytics or how to apply it in real life. His patient help and rigorous academic character stimulated my academic interest and encouraged me to pursue my academic goals. Beyond academic knowledge, Dr. Gangmin Li also gave me suggestions about my future plans, which truly inspired me. Publisher Copyright: {\textcopyright} 2020 IEEE.; 5th IEEE International Conference on Big Data Analytics, ICBDA 2020 ; Conference date: 08-05-2020 through 11-05-2020",

year = "2020",

month = may,

doi = "10.1109/ICBDA49040.2020.9101212",

language = "English",

series = "2020 5th IEEE International Conference on Big Data Analytics, ICBDA 2020",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "11--17",

booktitle = "2020 5th IEEE International Conference on Big Data Analytics, ICBDA 2020",

}

Wang, Z, Zhou, Y & Li, G 2020, Anomaly Detection by Using Streaming K-Means and Batch K-Means. in 2020 5th IEEE International Conference on Big Data Analytics, ICBDA 2020, 9101212, 2020 5th IEEE International Conference on Big Data Analytics, ICBDA 2020, Institute of Electrical and Electronics Engineers Inc., pp. 11-17, 5th IEEE International Conference on Big Data Analytics, ICBDA 2020, Xiamen, China, 8/05/20. https://doi.org/10.1109/ICBDA49040.2020.9101212

Anomaly Detection by Using Streaming K-Means and Batch K-Means. / Wang, Zhuo; Zhou, Yanghui; Li, Gangmin.
2020 5th IEEE International Conference on Big Data Analytics, ICBDA 2020. Institute of Electrical and Electronics Engineers Inc., 2020. p. 11-17, 9101212 (2020 5th IEEE International Conference on Big Data Analytics, ICBDA 2020).

TY - GEN

T1 - Anomaly Detection by Using Streaming K-Means and Batch K-Means

AU - Wang, Zhuo

AU - Zhou, Yanghui

AU - Li, Gangmin

N1 - Funding Information: This research was supported by the XJTLU research team, with special thanks to Dr. Gangmin Li and Mr. Yanghui Zhou for their help. First of all, I am very grateful to my research tutor, Dr. Gangmin Li, for giving me the opportunity to participate in this research and for guiding me patiently. Without his help, I could not have learned the theory of Big Data Analytics or how to apply it in real life. His patient help and rigorous academic character stimulated my academic interest and encouraged me to pursue my academic goals. Beyond academic knowledge, Dr. Gangmin Li also gave me suggestions about my future plans, which truly inspired me. Publisher Copyright: © 2020 IEEE.

PY - 2020/5

Y1 - 2020/5

AB - This paper introduces the K-Means algorithm as a new technique for anomaly detection. Data analysis has been widely applied in industry and plays an important role there. However, conventional data analysis methods cannot process large-scale data within an acceptable time and waste considerable computing resources. By contrast, batch processing and stream processing can process data at short time intervals; stream processing in particular can process data in real time. The paper compares Batch K-Means processing with Streaming K-Means processing in terms of distance, cost value, and cluster distribution. It also discusses how to reach an optimized K value for the Batch K-Means and Streaming K-Means models, analyzes the attributes of both processing models, and identifies their limitations. Finally, the paper discusses the limitations of the experiments and proposes future improvements to the clustering technique.

KW - big data

KW - cluster distribution

KW - k-means clustering

KW - optimized K-value

KW - streaming k-means clustering

UR - http://www.scopus.com/inward/record.url?scp=85085927658&partnerID=8YFLogxK

U2 - 10.1109/ICBDA49040.2020.9101212

DO - 10.1109/ICBDA49040.2020.9101212

M3 - Conference Proceeding

AN - SCOPUS:85085927658

T3 - 2020 5th IEEE International Conference on Big Data Analytics, ICBDA 2020

SP - 11

EP - 17

BT - 2020 5th IEEE International Conference on Big Data Analytics, ICBDA 2020

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 5th IEEE International Conference on Big Data Analytics, ICBDA 2020

Y2 - 8 May 2020 through 11 May 2020

ER -
