Abstract
The itemset is a basic and common form of data. By discovering the implicit regularities in itemsets through pattern mining, people can obtain new insights into their business. In some real applications, e.g., network alarm association, the itemsets usually have two characteristics: (1) the observed samples come from different entities, with inherent structural relationships implied in their static properties; (2) the samples are scarce, which may lead to incomplete pattern extraction. This paper considers how to efficiently find a concise set of patterns in this kind of data. First, we use a graph to represent the entities and their interconnections, and propagate every sample to every node with a weight determined by a pre-defined combination of kernel functions based on the similarities of the nodes and patterns. Next, the weight values are naturally incorporated into the MDL-based filtering process, yielding a differentiated pattern set for each node. Experiments show that the solution outperforms the global solution (treating all nodes as one) and the isolated solution (removing all edges) on simulated and real data, and its effectiveness and scalability are further verified in an application to large-scale network operation and maintenance.
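The sample-propagation step can be illustrated with a minimal sketch. It assumes a single diffusion kernel K = exp(-beta * L) on the graph Laplacian L as the weighting function; the paper itself uses a pre-defined combination of kernels over node and pattern similarities, followed by MDL-based filtering, neither of which is reproduced here. The function names (`diffusion_kernel`, `weighted_support`), the value of `beta`, the toy path graph, and the toy samples are all hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_kernel(adj, beta=0.5, terms=30):
    """Approximate K = exp(-beta * L) with a truncated Taylor series.

    adj is a symmetric adjacency matrix; L = D - A is the graph Laplacian.
    beta and terms are illustrative choices, not values from the paper.
    """
    n = len(adj)
    lap = [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
           for i in range(n)]
    m = [[-beta * lap[i][j] for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds m^k / k!
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def weighted_support(pattern, samples, kernel, node):
    """Sum the kernel weights of all samples that contain the pattern.

    Each sample is (source_node, itemset); a sample observed at source_node
    contributes kernel[node][source_node] to the support seen at node.
    """
    return sum(kernel[node][src] for src, items in samples if pattern <= items)


# Toy path graph 0 - 1 - 2 (hypothetical entities).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]

# Toy samples: (node where observed, itemset).
samples = [(0, {"a", "b"}), (0, {"a", "b", "c"}), (2, {"c"})]

kernel = diffusion_kernel(adj)
supports = [weighted_support({"a", "b"}, samples, kernel, node)
            for node in range(3)]
print(supports)
```

In this toy setting, the pattern {"a", "b"} is observed only at node 0, so its weighted support is largest at node 0 and decays with graph distance. With an identity kernel the sketch reduces to the isolated setting (no edges), and with a constant kernel to the global setting (all nodes treated as one).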
Original language | English |
---|---|
Pages (from-to) | 27-35 |
Number of pages | 9 |
Journal | Neurocomputing |
Volume | 336 |
DOIs | |
Publication status | Published - 7 Apr 2019 |
Keywords
- Diffusion kernel
- Graph
- MDL
- Maximal entropy random walk
- Pattern mining