TY - GEN
T1 - Anomaly Detection by Using Streaming K-Means and Batch K-Means
AU - Wang, Zhuo
AU - Zhou, Yanghui
AU - Li, Gangmin
N1 - Funding Information:
This research work was supported by the XJTLU research team, especially Dr. Gangmin Li and Mr. Yanghui Zhou. First of all, I am very grateful to my research tutor, Dr. Gangmin Li, for giving me the opportunity to participate in this research and for guiding me patiently. Without his help, I could not have learned the theory of Big Data Analytics or how to apply it to real life. His patient help and rigorous academic character stimulated my academic interest and encouraged me to pursue my academic goals. Apart from academic knowledge, Dr. Gangmin Li also gave me suggestions about my future plans, which really inspired me.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - This paper introduces the K-Means algorithm as a new technique for detecting anomalies. Data analysis is widely applied in industry and plays an important role there. However, conventional data analysis methods cannot process large-scale data in a reasonable time and waste considerable computing resources. In contrast, batch processing and stream processing can process data within short time intervals, and stream processing in particular can process data in real time. The paper compares Batch K-Means with Streaming K-Means in terms of distance, cost value, and cluster distribution. It also discusses how to reach an optimized K value for the Batch K-Means and Streaming K-Means models, analyzes the attributes of both processing approaches, and identifies their limitations. Finally, the paper presents the limitations of the experiment and future improvements to the clustering technique.
AB - This paper introduces the K-Means algorithm as a new technique for detecting anomalies. Data analysis is widely applied in industry and plays an important role there. However, conventional data analysis methods cannot process large-scale data in a reasonable time and waste considerable computing resources. In contrast, batch processing and stream processing can process data within short time intervals, and stream processing in particular can process data in real time. The paper compares Batch K-Means with Streaming K-Means in terms of distance, cost value, and cluster distribution. It also discusses how to reach an optimized K value for the Batch K-Means and Streaming K-Means models, analyzes the attributes of both processing approaches, and identifies their limitations. Finally, the paper presents the limitations of the experiment and future improvements to the clustering technique.
KW - big data
KW - cluster distribution
KW - k-means clustering
KW - optimized K-value
KW - streaming k-means clustering
UR - http://www.scopus.com/inward/record.url?scp=85085927658&partnerID=8YFLogxK
U2 - 10.1109/ICBDA49040.2020.9101212
DO - 10.1109/ICBDA49040.2020.9101212
M3 - Conference Proceeding
AN - SCOPUS:85085927658
T3 - 2020 5th IEEE International Conference on Big Data Analytics, ICBDA 2020
SP - 11
EP - 17
BT - 2020 5th IEEE International Conference on Big Data Analytics, ICBDA 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Big Data Analytics, ICBDA 2020
Y2 - 8 May 2020 through 11 May 2020
ER -