TY - JOUR
T1 - IRDA
T2 - Incremental Reinforcement Learning for Dynamic Resource Allocation
AU - Wang, Jia
AU - Cao, Jiannong
AU - Wang, Senzhang
AU - Yao, Zhongyu
AU - Li, Wengen
N1 - Funding Information:
This work was supported by HK RGC Collaborative Research Fund (CRF)-Group Research Grant (RGC No. C6030-18G), HK RGC Collaborative Research Fund (CRF)-Group Research Grant (RGC No. C5026-18G), Innovation and Technology Fund (ITC No. ITP/024/18LP), and NSF of Jiangsu Province (Grant No. BK20171420).
Publisher Copyright:
© 2015 IEEE.
PY - 2022/6/1
Y1 - 2022/6/1
AB - Resource allocation problems often manifest as online decision-making tasks where the proper allocation strategy depends on an understanding of the allocation environment and the resource workload. Most existing resource allocation methods are based on meticulously designed heuristics that ignore the patterns of incoming tasks, so the dynamics of incoming tasks cannot be properly handled. To address this problem, we mine the task patterns from the large volume of historical allocation data and propose a reinforcement learning model termed IRDA to learn the allocation strategy in an incremental way. We observe that historical allocation data is usually generated from daily repeated operations and is not independent and identically distributed. Training on only part of this dataset can already make the allocation strategy converge, leaving much of the remaining data unused. To improve the learning efficiency, we partition the whole historical allocation dataset into multi-batch datasets, which forces the agent to continuously 'explore' and learn on distinct state spaces. IRDA reuses the strategy learned from the previous batch dataset and adapts it to learning on the next batch dataset, so as to incrementally learn from multi-batch datasets and improve the allocation strategy. We apply the proposed method to baggage carousel allocation at Hong Kong International Airport (HKIA). The experimental results show that IRDA is capable of incrementally learning from multi-batch datasets and improves baggage carousel resource utilization by around 51.86 percent compared to the current baggage carousel allocation system at HKIA.
KW - Resource allocation
KW - airport resource management
KW - baggage handling
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85130493139&partnerID=8YFLogxK
U2 - 10.1109/TBDATA.2020.2988273
DO - 10.1109/TBDATA.2020.2988273
M3 - Article
AN - SCOPUS:85130493139
SN - 2332-7790
VL - 8
SP - 770
EP - 783
JO - IEEE Transactions on Big Data
JF - IEEE Transactions on Big Data
IS - 3
ER -