TY - JOUR
T1 - Similarity measure design for high dimensional data
AU - Lee, Sang hyuk
AU - Yan, Sun
AU - Jeong, Yoon su
AU - Shin, Seung soo
N1 - Publisher Copyright: © 2014, Central South University Press and Springer-Verlag Berlin Heidelberg.
PY - 2014/9/1
Y1 - 2014/9/1
N2 - Information analysis of high dimensional data was carried out by applying a similarity measure. High dimensional data were considered as an atypical structure. Additionally, overlapped and non-overlapped data were introduced, and the similarity measure analysis was illustrated and compared with a conventional similarity measure. As a result, the similarity of overlapped data could be presented with the conventional similarity measure. Non-overlapped data similarity analysis provided the clue to solving the similarity of high dimensional data. The high dimensional data analysis was designed with consideration of neighborhood information. Conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud among multi-dimensional datasets. In an illustrative example, financial fraud similarity with respect to age, gender, qualification and job was presented. With the proposed similarity measure, high dimensional personal data were evaluated to determine how similar they are to financial fraud. Calculation results show that the actual fraud has a rather high similarity measure compared to the average, from a minimum of 0.0609 to a maximum of 0.1667.
AB - Information analysis of high dimensional data was carried out by applying a similarity measure. High dimensional data were considered as an atypical structure. Additionally, overlapped and non-overlapped data were introduced, and the similarity measure analysis was illustrated and compared with a conventional similarity measure. As a result, the similarity of overlapped data could be presented with the conventional similarity measure. Non-overlapped data similarity analysis provided the clue to solving the similarity of high dimensional data. The high dimensional data analysis was designed with consideration of neighborhood information. Conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud among multi-dimensional datasets. In an illustrative example, financial fraud similarity with respect to age, gender, qualification and job was presented. With the proposed similarity measure, high dimensional personal data were evaluated to determine how similar they are to financial fraud. Calculation results show that the actual fraud has a rather high similarity measure compared to the average, from a minimum of 0.0609 to a maximum of 0.1667.
KW - difference
KW - financial fraud
KW - high dimensional data
KW - neighborhood information
KW - similarity measure
UR - http://www.scopus.com/inward/record.url?scp=84920131235&partnerID=8YFLogxK
U2 - 10.1007/s11771-014-2333-5
DO - 10.1007/s11771-014-2333-5
M3 - Article
AN - SCOPUS:84920131235
SN - 2095-2899
VL - 21
SP - 3534
EP - 3540
JO - Journal of Central South University
JF - Journal of Central South University
IS - 9
ER -