Mining concise patterns on graph-connected itemsets

Di Zhang; Yunquan Zhang; Qiang Niu; Xingbao Qiu

doi:10.1016/j.neucom.2018.03.084

Di Zhang^*, Yunquan Zhang, Qiang Niu, Xingbao Qiu

^*Corresponding author for this work

Department of Applied Mathematics

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

The itemset is a basic and common form of data. By mining patterns in itemsets, people can discover implicit regularities and gain new insights into their business. In some real applications, e.g., network alarm association, the itemsets usually have two characteristics: (1) the observed samples come from different entities, with inherent structural relationships implied in their static properties; (2) the samples are scarce, which may lead to incomplete pattern extraction. This paper considers how to efficiently find a concise set of patterns on this kind of data. First, we use a graph to express the entities and their interconnections, and propagate every sample to every node with a weight determined by a pre-defined combination of kernel functions based on the similarities of the nodes and patterns. Next, the weight values are naturally imported into the MDL-based filtering process, yielding a differentiated pattern set for each node. Experiments show that the solution outperforms the global solution (treating all nodes as one) and the isolated solution (removing all edges) on simulated and real data, and its effectiveness and scalability are further verified in an application to large-scale network operation and maintenance.
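The propagation step in the abstract can be illustrated with a minimal sketch. It assumes a single diffusion kernel exp(-beta * L) on an undirected graph as the weighting; the paper describes a combination of kernels over node and pattern similarities, which is not reproduced here, and neither is the MDL-based filtering. The function names, the `beta` parameter, and the toy data are hypothetical.

```python
import numpy as np

def diffusion_kernel(adjacency, beta=1.0):
    """Return exp(-beta * L), where L = D - A is the graph Laplacian."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    # L is symmetric for an undirected graph, so eigh applies.
    eigvals, eigvecs = np.linalg.eigh(L)
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T

def weighted_support(kernel, samples_by_node, pattern):
    """Kernel-weighted count of samples containing `pattern`, per node."""
    n = kernel.shape[0]
    # Local count: samples observed at node i that contain the pattern.
    local = np.array(
        [sum(pattern <= s for s in samples_by_node[i]) for i in range(n)],
        dtype=float,
    )
    # Each node sees every other node's samples, weighted by the kernel.
    return kernel @ local

# Toy path graph 0 - 1 - 2 (assumed data, not from the paper).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
samples_by_node = [[{"a", "b"}, {"a", "b", "c"}], [{"a"}], []]

K = diffusion_kernel(A, beta=1.0)
support = weighted_support(K, samples_by_node, {"a", "b"})
print(support)
```

In this toy setup node 2 has no samples of its own, yet it receives a nonzero weighted support for the pattern from node 0 through the graph, with less weight than the nearer node 1. This is the sense in which scarce samples at one entity can borrow evidence from connected entities.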

Original language	English
Pages (from-to)	27-35
Number of pages	9
Journal	Neurocomputing
Volume	336
DOIs	https://doi.org/10.1016/j.neucom.2018.03.084
Publication status	Published - 7 Apr 2019

Keywords

Diffusion kernel
Graph
MDL
Maximal entropy random walk
Pattern mining

Cite this

@article{f32cc7b5a1034b34bdb6d59020f4e021,

title = "Mining concise patterns on graph-connected itemsets",

abstract = "The itemset is a basic and common form of data. By mining patterns in itemsets, people can discover implicit regularities and gain new insights into their business. In some real applications, e.g., network alarm association, the itemsets usually have two characteristics: (1) the observed samples come from different entities, with inherent structural relationships implied in their static properties; (2) the samples are scarce, which may lead to incomplete pattern extraction. This paper considers how to efficiently find a concise set of patterns on this kind of data. First, we use a graph to express the entities and their interconnections, and propagate every sample to every node with a weight determined by a pre-defined combination of kernel functions based on the similarities of the nodes and patterns. Next, the weight values are naturally imported into the MDL-based filtering process, yielding a differentiated pattern set for each node. Experiments show that the solution outperforms the global solution (treating all nodes as one) and the isolated solution (removing all edges) on simulated and real data, and its effectiveness and scalability are further verified in an application to large-scale network operation and maintenance.",

keywords = "Diffusion kernel, Graph, MDL, Maximal entropy random walk, Pattern mining",

author = "Di Zhang and Yunquan Zhang and Qiang Niu and Xingbao Qiu",

note = "Publisher Copyright: {\textcopyright} 2018 Elsevier B.V.",

year = "2019",

month = apr,

day = "7",

doi = "10.1016/j.neucom.2018.03.084",

language = "English",

volume = "336",

pages = "27--35",

journal = "Neurocomputing",

issn = "0925-2312",

}

TY - JOUR

T1 - Mining concise patterns on graph-connected itemsets

AU - Zhang, Di

AU - Zhang, Yunquan

AU - Niu, Qiang

AU - Qiu, Xingbao

PY - 2019/4/7

Y1 - 2019/4/7

N2 - The itemset is a basic and common form of data. By mining patterns in itemsets, people can discover implicit regularities and gain new insights into their business. In some real applications, e.g., network alarm association, the itemsets usually have two characteristics: (1) the observed samples come from different entities, with inherent structural relationships implied in their static properties; (2) the samples are scarce, which may lead to incomplete pattern extraction. This paper considers how to efficiently find a concise set of patterns on this kind of data. First, we use a graph to express the entities and their interconnections, and propagate every sample to every node with a weight determined by a pre-defined combination of kernel functions based on the similarities of the nodes and patterns. Next, the weight values are naturally imported into the MDL-based filtering process, yielding a differentiated pattern set for each node. Experiments show that the solution outperforms the global solution (treating all nodes as one) and the isolated solution (removing all edges) on simulated and real data, and its effectiveness and scalability are further verified in an application to large-scale network operation and maintenance.

AB - The itemset is a basic and common form of data. By mining patterns in itemsets, people can discover implicit regularities and gain new insights into their business. In some real applications, e.g., network alarm association, the itemsets usually have two characteristics: (1) the observed samples come from different entities, with inherent structural relationships implied in their static properties; (2) the samples are scarce, which may lead to incomplete pattern extraction. This paper considers how to efficiently find a concise set of patterns on this kind of data. First, we use a graph to express the entities and their interconnections, and propagate every sample to every node with a weight determined by a pre-defined combination of kernel functions based on the similarities of the nodes and patterns. Next, the weight values are naturally imported into the MDL-based filtering process, yielding a differentiated pattern set for each node. Experiments show that the solution outperforms the global solution (treating all nodes as one) and the isolated solution (removing all edges) on simulated and real data, and its effectiveness and scalability are further verified in an application to large-scale network operation and maintenance.

KW - Diffusion kernel

KW - Graph

KW - MDL

KW - Maximal entropy random walk

KW - Pattern mining

UR - http://www.scopus.com/inward/record.url?scp=85061161450&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2018.03.084

DO - 10.1016/j.neucom.2018.03.084

M3 - Article

AN - SCOPUS:85061161450

SN - 0925-2312

VL - 336

SP - 27

EP - 35

JO - Neurocomputing

JF - Neurocomputing

ER -
