Mining top-k high average-utility itemsets based on breadth-first search

Xuan Liu; Genlang Chen; Fangyu Wu; Shiting Wen; Wanli Zuo

doi:10.1007/s10489-023-05076-4

Xuan Liu, Genlang Chen, Fangyu Wu*, Shiting Wen, Wanli Zuo

*Corresponding author for this work

Department of Intelligent Science

Research output: Contribution to journal › Article › peer-review

Abstract

High average-utility itemset mining is a subfield of data mining that has extensive practical applications. However, it is difficult for users to determine a proper minimum threshold because they cannot accurately predict the number of patterns mined at a given threshold. To address this issue, top-k high average-utility itemset mining has been proposed, where k is the number of high average-utility itemsets to be mined. In this paper, we design an effective algorithm (named ETAUIM) for finding top-k high average-utility itemsets. ETAUIM employs a breadth-first search strategy to efficiently explore the search space, and it utilizes a tighter upper bound instead of the average-utility upper bound to limit the search space. Additionally, ETAUIM removes irrelevant items during the mining process and utilizes an early abandoning strategy to terminate unnecessary join operations in advance. To evaluate the proposed algorithm, extensive experiments were conducted on six sparse datasets and two dense datasets. Four state-of-the-art algorithms were used for comparison. The experimental results show that ETAUIM has excellent performance and scalability. Moreover, ETAUIM always performs better for sparse datasets.
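To make the problem statement in the abstract concrete, the following is a minimal Python sketch of top-k high average-utility itemset mining with a level-wise (breadth-first) search. It is not ETAUIM: it prunes with the classic average-utility upper bound (the sum of the maximum item utility of each supporting transaction) rather than the tighter bound the paper proposes, and it omits the item-removal and early-abandoning strategies. The toy database `DB` and the function names `average_utility`, `auub`, and `top_k_haui` are assumptions introduced for illustration only.

```python
from heapq import heappush, heapreplace

# Hypothetical toy database: each transaction maps item -> utility in that transaction.
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
    {"a": 5, "d": 4},
]


def average_utility(itemset, db):
    """Sum, over transactions containing the itemset, of (utility of its items) / |itemset|."""
    total = 0.0
    for t in db:
        if all(i in t for i in itemset):
            total += sum(t[i] for i in itemset) / len(itemset)
    return total


def auub(itemset, db):
    """Classic average-utility upper bound: for every transaction containing
    the itemset, add that transaction's largest single-item utility."""
    return sum(max(t.values()) for t in db if all(i in t for i in itemset))


def top_k_haui(db, k):
    """Return the k itemsets with the highest average utility, best first."""
    items = sorted({i for t in db for i in t})
    heap = []          # min-heap of (average utility, itemset), at most k entries
    threshold = 0.0    # k-th best average utility found so far (0 until heap is full)
    level = [(i,) for i in items]
    while level:
        survivors = []
        for x in level:
            bound = auub(x, db)
            # The bound is anti-monotone, so x and all of its supersets can be skipped.
            if bound <= 0 or bound < threshold:
                continue
            survivors.append(x)
            au = average_utility(x, db)
            if len(heap) < k:
                heappush(heap, (au, x))
            elif au > heap[0][0]:
                heapreplace(heap, (au, x))
            if len(heap) == k:
                threshold = heap[0][0]  # raise the internal minimum threshold
        # Breadth-first step: extend each survivor by one lexicographically larger item.
        level = [x + (i,) for x in survivors for i in items if i > x[-1]]
    return sorted(heap, reverse=True)


print(top_k_haui(DB, 3))
```

On this toy database, `top_k_haui(DB, 3)` keeps the itemsets `('a',)`, `('d',)`, and `('a', 'c')`. The internal threshold starts at zero and rises as better itemsets are found, which is how a top-k miner replaces the user-supplied minimum threshold.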

Original language	English
Pages (from-to)	29319-29337
Number of pages	19
Journal	Applied Intelligence
Volume	53
Issue number	23
DOIs	https://doi.org/10.1007/s10489-023-05076-4
Publication status	Published - Dec 2023

Keywords

Breadth-first search
Data mining
High average-utility itemset
Top-k high average-utility itemsets

Cite this

@article{8c46becf723e448c8938d704052dab70,

title = "Mining top-k high average-utility itemsets based on breadth-first search",

abstract = "High average-utility itemset mining is a subfield of data mining that has extensive practical applications. However, it is difficult for users to determine a proper minimum threshold because they cannot accurately predict the number of patterns mined at a given threshold. To address this issue, top-k high average-utility itemset mining has been proposed where k is the number of high average-utility itemsets to be mined. In this paper, we design an effective algorithm (named ETAUIM) for finding top-k high average-utility itemsets. ETAUIM employs a breadth-first search strategy to efficiently explore the search space, and it utilizes a tighter upper bound instead of the average-utility upper bound to limit the search space. Additionally, ETAUIM removes irrelevant items during the mining process and utilizes an early abandoning strategy to terminate unnecessary join operations in advance. To evaluate the proposed algorithm, extensive experiments were conducted on six sparse datasets and two dense datasets. Four state-of-the-art algorithms were used for comparison. The experimental results show that ETAUIM has excellent performance and scalability. Moreover, ETAUIM always performs better for sparse datasets.",

keywords = "Breadth-first search, Data mining, High average-utility itemset, Top-k high average-utility itemsets",

author = "Xuan Liu and Genlang Chen and Fangyu Wu and Shiting Wen and Wanli Zuo",

note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.",

year = "2023",

month = dec,

doi = "10.1007/s10489-023-05076-4",

language = "English",

volume = "53",

pages = "29319--29337",

journal = "Applied Intelligence",

issn = "0924-669X",

number = "23",

}

TY - JOUR

T1 - Mining top-k high average-utility itemsets based on breadth-first search

AU - Liu, Xuan

AU - Chen, Genlang

AU - Wu, Fangyu

AU - Wen, Shiting

AU - Zuo, Wanli

PY - 2023/12

Y1 - 2023/12

N2 - High average-utility itemset mining is a subfield of data mining that has extensive practical applications. However, it is difficult for users to determine a proper minimum threshold because they cannot accurately predict the number of patterns mined at a given threshold. To address this issue, top-k high average-utility itemset mining has been proposed where k is the number of high average-utility itemsets to be mined. In this paper, we design an effective algorithm (named ETAUIM) for finding top-k high average-utility itemsets. ETAUIM employs a breadth-first search strategy to efficiently explore the search space, and it utilizes a tighter upper bound instead of the average-utility upper bound to limit the search space. Additionally, ETAUIM removes irrelevant items during the mining process and utilizes an early abandoning strategy to terminate unnecessary join operations in advance. To evaluate the proposed algorithm, extensive experiments were conducted on six sparse datasets and two dense datasets. Four state-of-the-art algorithms were used for comparison. The experimental results show that ETAUIM has excellent performance and scalability. Moreover, ETAUIM always performs better for sparse datasets.

KW - Breadth-first search

KW - Data mining

KW - High average-utility itemset

KW - Top-k high average-utility itemsets

UR - http://www.scopus.com/inward/record.url?scp=85174973120&partnerID=8YFLogxK

U2 - 10.1007/s10489-023-05076-4

DO - 10.1007/s10489-023-05076-4

M3 - Article

AN - SCOPUS:85174973120

SN - 0924-669X

VL - 53

SP - 29319

EP - 29337

JO - Applied Intelligence

JF - Applied Intelligence

IS - 23

ER -
