TY - JOUR
T1 - Unique Neighborhood Set Parameter Independent Density-Based Clustering With Outlier Detection
AU - Rahman, Md Anisur
AU - Ang, Kenneth Li Minn
AU - Seng, Kah Phooi
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2018/8/12
Y1 - 2018/8/12
AB - Machine learning algorithms for clustering, classification, and regression typically require the user to supply a set of parameters before they can perform well. In this paper, we present parameter independent density-based clustering algorithms built on two novel neighborhood concepts, which we term the unique closest neighbor and the unique neighborhood set. We discuss two variants of the proposed parameter independent density-based clustering (PIDC) algorithm, termed PIDC-WO and PIDC-O. PIDC-WO is designed for data sets that do not contain explicit outliers, whereas PIDC-O performs well even on data sets that contain outliers. PIDC-O uses two-stage processing: the first stage identifies and removes outliers, and the second stage performs density-based clustering on the remaining records. The PIDC algorithms are extensively evaluated and compared with other well-known clustering algorithms on several data sets using three cluster evaluation criteria from the literature (F-measure, entropy, and purity), and are shown to perform effectively for both clustering and outlier detection.
KW - Clustering
KW - density-based
KW - outlier detection
KW - unique closest neighbor
KW - unique neighborhood set
UR - http://www.scopus.com/inward/record.url?scp=85051657547&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2018.2857834
DO - 10.1109/ACCESS.2018.2857834
M3 - Article
AN - SCOPUS:85051657547
SN - 2169-3536
VL - 6
SP - 44707
EP - 44717
JO - IEEE Access
JF - IEEE Access
M1 - 8434383
ER -