Unique Neighborhood Set Parameter Independent Density-Based Clustering With Outlier Detection

Md Anisur Rahman; Kenneth Li Minn Ang; Kah Phooi Seng

doi:10.1109/ACCESS.2018.2857834

Md Anisur Rahman, Kenneth Li Minn Ang, Kah Phooi Seng^*

^*Corresponding author for this work

Charles Sturt University

Research output: Contribution to journal › Article › peer-review

22 Citations (Scopus)

Abstract

Machine learning algorithms such as clustering, classification, and regression typically require the user to supply a set of parameters before they can perform well. In this paper, we present parameter independent density-based clustering algorithms that use two novel concepts for neighborhood functions, which we term the unique closest neighbor and the unique neighborhood set. We discuss two variants of the proposed parameter independent density-based clustering (PIDC) algorithms, termed PIDC-WO and PIDC-O. PIDC-WO is designed for data sets that do not contain explicit outliers, whereas PIDC-O performs very well even on data sets that contain outliers. PIDC-O uses two-stage processing: the first stage identifies and removes outliers, and the second stage performs density-based clustering on the remaining records. The PIDC algorithms are extensively evaluated and compared with other well-known clustering algorithms on several data sets. The evaluation uses three cluster evaluation criteria from the literature (F-measure, entropy, and purity), and the algorithms are shown to be effective for both clustering and outlier detection.
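The abstract names three external criteria (F-measure, entropy, and purity) used to score a clustering against ground-truth classes. The sketch below shows how such scores are commonly computed from a cluster-by-class contingency table. The function names are hypothetical. The definitions are common textbook forms, not ones confirmed from the paper:

- purity is the fraction of records that fall in the majority class of their cluster;
- entropy is the size-weighted class entropy per cluster, using log base 2 without normalization;
- F-measure is the class-size-weighted best match over clusters.

The paper may use a different log base, normalization, or F-measure variant. The sketch also assumes every record receives a cluster label, so records that PIDC-O removes as outliers would have to be excluded or handled separately.

```python
"""Hypothetical sketch: purity, entropy and F-measure for a flat clustering.

Assumptions (not taken from the paper): every record has exactly one
predicted cluster label, entropy uses log base 2 without normalization,
and the F-measure is the class-size-weighted best match over clusters.
"""
from collections import Counter
from math import log2


def contingency(labels_true, labels_pred):
    """Map each predicted cluster k to a Counter of true classes j (n_kj)."""
    if len(labels_true) != len(labels_pred) or not labels_true:
        raise ValueError("need two non-empty label sequences of equal length")
    table = {}
    for t, p in zip(labels_true, labels_pred):
        table.setdefault(p, Counter())[t] += 1
    return table


def purity(labels_true, labels_pred):
    """Fraction of records belonging to the majority class of their cluster (higher is better)."""
    table = contingency(labels_true, labels_pred)
    return sum(max(row.values()) for row in table.values()) / len(labels_true)


def entropy(labels_true, labels_pred):
    """Size-weighted average class entropy of the clusters, log base 2 (lower is better)."""
    table = contingency(labels_true, labels_pred)
    n = len(labels_true)
    total = 0.0
    for row in table.values():
        n_k = sum(row.values())
        h_k = -sum((c / n_k) * log2(c / n_k) for c in row.values())
        total += (n_k / n) * h_k
    return total


def f_measure(labels_true, labels_pred):
    """For each true class take its best F score over all clusters, weighted by class size."""
    table = contingency(labels_true, labels_pred)
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = {k: sum(row.values()) for k, row in table.items()}
    total = 0.0
    for j, n_j in class_sizes.items():
        best = 0.0
        for k, row in table.items():
            n_kj = row[j]  # Counter returns 0 for classes absent from this cluster
            if n_kj == 0:
                continue
            precision = n_kj / cluster_sizes[k]
            recall = n_kj / n_j
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_j / n) * best
    return total


if __name__ == "__main__":
    truth = ["a", "a", "a", "b", "b", "b"]
    found = [0, 0, 1, 1, 1, 1]  # one "a" record is grouped with the "b" records
    print(round(purity(truth, found), 4))     # 0.8333
    print(round(entropy(truth, found), 4))    # 0.5409
    print(round(f_measure(truth, found), 4))  # 0.8286
```

All three scores depend only on which records share a cluster, not on the cluster IDs themselves. Relabeling clusters therefore leaves the results unchanged.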

Original language	English
Article number	8434383
Pages (from-to)	44707-44717
Number of pages	11
Journal	IEEE Access
Volume	6
DOIs	https://doi.org/10.1109/ACCESS.2018.2857834
Publication status	Published - 12 Aug 2018
Externally published	Yes

Keywords

Clustering
density-based
outlier detection
unique closest neighbor
unique neighborhood set

Cite this

@article{71e7a9f7c2a44c89a70e4916d1f7a1f6,
  title = "Unique Neighborhood Set Parameter Independent Density-Based Clustering With Outlier Detection",
  abstract = "Machine learning algorithms such as clustering, classification, and regression typically require a set of parameters to be provided by the user before the algorithms can perform well. In this paper, we present parameter independent density-based clustering algorithms by utilizing two novel concepts for neighborhood functions which we term as unique closest neighbor and unique neighborhood set. We discuss two derivatives of the proposed parameter independent density-based clustering (PIDC) algorithms, termed PIDC-WO and PIDC-O. PIDC-WO has been designed for data sets that do not contain explicit outliers whereas PIDC-O provides very good performance even on data sets with the presence of outliers. PIDC-O uses a two-stage processing where the first stage identifies and removes outliers before passing the records to the second stage to perform the density-based clustering. The PIDC algorithms are extensively evaluated and compared with other well-known clustering algorithms on several data sets using three cluster evaluation criteria (F-measure, entropy, and purity) used in the literature, and are shown to perform effectively both for the clustering and outlier detection objectives.",
  keywords = "Clustering, density-based, outlier detection, unique closest neighbor, unique neighborhood set",
  author = "Rahman, {Md Anisur} and Ang, {Kenneth Li Minn} and Seng, {Kah Phooi}",
  note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",
  year = "2018",
  month = aug,
  day = "12",
  doi = "10.1109/ACCESS.2018.2857834",
  language = "English",
  volume = "6",
  pages = "44707--44717",
  journal = "IEEE Access",
  issn = "2169-3536",
}

TY  - JOUR
T1  - Unique Neighborhood Set Parameter Independent Density-Based Clustering With Outlier Detection
AU  - Rahman, Md Anisur
AU  - Ang, Kenneth Li Minn
AU  - Seng, Kah Phooi
PY  - 2018/8/12
Y1  - 2018/8/12
N2  - Machine learning algorithms such as clustering, classification, and regression typically require a set of parameters to be provided by the user before the algorithms can perform well. In this paper, we present parameter independent density-based clustering algorithms by utilizing two novel concepts for neighborhood functions which we term as unique closest neighbor and unique neighborhood set. We discuss two derivatives of the proposed parameter independent density-based clustering (PIDC) algorithms, termed PIDC-WO and PIDC-O. PIDC-WO has been designed for data sets that do not contain explicit outliers whereas PIDC-O provides very good performance even on data sets with the presence of outliers. PIDC-O uses a two-stage processing where the first stage identifies and removes outliers before passing the records to the second stage to perform the density-based clustering. The PIDC algorithms are extensively evaluated and compared with other well-known clustering algorithms on several data sets using three cluster evaluation criteria (F-measure, entropy, and purity) used in the literature, and are shown to perform effectively both for the clustering and outlier detection objectives.
AB  - Machine learning algorithms such as clustering, classification, and regression typically require a set of parameters to be provided by the user before the algorithms can perform well. In this paper, we present parameter independent density-based clustering algorithms by utilizing two novel concepts for neighborhood functions which we term as unique closest neighbor and unique neighborhood set. We discuss two derivatives of the proposed parameter independent density-based clustering (PIDC) algorithms, termed PIDC-WO and PIDC-O. PIDC-WO has been designed for data sets that do not contain explicit outliers whereas PIDC-O provides very good performance even on data sets with the presence of outliers. PIDC-O uses a two-stage processing where the first stage identifies and removes outliers before passing the records to the second stage to perform the density-based clustering. The PIDC algorithms are extensively evaluated and compared with other well-known clustering algorithms on several data sets using three cluster evaluation criteria (F-measure, entropy, and purity) used in the literature, and are shown to perform effectively both for the clustering and outlier detection objectives.
KW  - Clustering
KW  - density-based
KW  - outlier detection
KW  - unique closest neighbor
KW  - unique neighborhood set
UR  - http://www.scopus.com/inward/record.url?scp=85051657547&partnerID=8YFLogxK
U2  - 10.1109/ACCESS.2018.2857834
DO  - 10.1109/ACCESS.2018.2857834
M3  - Article
AN  - SCOPUS:85051657547
SN  - 2169-3536
VL  - 6
SP  - 44707
EP  - 44717
JO  - IEEE Access
JF  - IEEE Access
M1  - 8434383
ER  - 
