IRDA: Incremental Reinforcement Learning for Dynamic Resource Allocation

Jia Wang; Jiannong Cao; Senzhang Wang; Zhongyu Yao; Wengen Li

doi:10.1109/TBDATA.2020.2988273

Corresponding author: Jia Wang

Hong Kong Polytechnic University

Research output: Contribution to journal (Article, peer-reviewed)

10 Citations (Scopus)

Abstract

Resource allocation problems often manifest as online decision-making tasks in which the proper allocation strategy depends on an understanding of the allocation environment and the resource workload. Most existing resource allocation methods rely on meticulously designed heuristics that ignore the patterns of incoming tasks, so the dynamics of incoming tasks cannot be handled properly. To address this problem, we mine task patterns from the large volume of historical allocation data and propose a reinforcement learning model, termed IRDA, that learns the allocation strategy incrementally. We observe that historical allocation data is usually generated by daily repeated operations and is therefore not independent and identically distributed. Training on only part of this dataset can already make the allocation strategy converge, so much of the remaining data is wasted. To improve learning efficiency, we partition the whole historical allocation dataset into multiple batches, which forces the agent to continuously 'explore' and learn on distinct state spaces. IRDA reuses the strategy learned on the previous batch and adapts it while learning on the next batch, so that it learns incrementally from the multiple batches and improves the allocation strategy. We apply the proposed method to baggage carousel allocation at Hong Kong International Airport (HKIA). The experimental results show that IRDA can learn incrementally from multi-batch datasets and improves baggage carousel resource utilization by around 51.86 percent compared with the current baggage carousel allocation system at HKIA.
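The incremental scheme described in the abstract, partitioning chronologically ordered history into batches and warm-starting learning on each batch from the policy learned on the previous one, can be illustrated with a minimal sketch. This is not the IRDA model: it assumes a toy tabular Q-learning agent, a synthetic three-carousel environment, and a crowding penalty as the reward, and every name in it (`NUM_CAROUSELS`, `make_day`, `partition`, `run_day`, `incremental_train`) is hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy setting (not HKIA's real layout, nor the paper's state or reward).
NUM_CAROUSELS = 3
CAP = 2  # per-carousel occupancy is clipped to 0..CAP to keep the state space tabular


def make_day(rng, mean_stay, n_flights=20):
    """One synthetic 'day': claim durations (in time steps) for each arriving flight."""
    return [max(1, int(rng.expovariate(1 / mean_stay))) for _ in range(n_flights)]


def partition(days, n_batches):
    """Split chronologically ordered days into contiguous batches (no shuffling)."""
    size = -(-len(days) // n_batches)  # ceiling division
    return [days[i:i + size] for i in range(0, len(days), size)]


def state_of(active):
    return tuple(min(len(flights), CAP) for flights in active)


def run_day(q, day, rng, alpha=0.1, gamma=0.9, epsilon=0.1, learn=True):
    """Assign each arriving flight to a carousel; optionally apply Q-learning updates."""
    active = [[] for _ in range(NUM_CAROUSELS)]  # remaining claim time per flight
    total = 0.0
    for i, stay in enumerate(day):
        s = state_of(active)
        if learn and rng.random() < epsilon:
            a = rng.randrange(NUM_CAROUSELS)  # explore
        else:
            a = max(range(NUM_CAROUSELS), key=lambda c: q[s][c])  # exploit
        reward = -len(active[a])  # penalise sharing a carousel with flights still unloading
        active[a].append(stay)
        # Advance one time step; flights whose claim time runs out free their carousel.
        active = [[t - 1 for t in flights if t > 1] for flights in active]
        s_next = state_of(active)
        if learn:
            done = i == len(day) - 1
            target = reward if done else reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
        total += reward
    return total


def incremental_train(batches, epochs_per_batch=5, seed=0):
    rng = random.Random(seed)
    # The Q-table is created once and carried over, so each batch starts from
    # the policy learned on the previous batches instead of from scratch.
    q = defaultdict(lambda: [0.0] * NUM_CAROUSELS)
    for batch in batches:
        for _ in range(epochs_per_batch):
            for day in batch:
                run_day(q, day, rng)
    return q


# Usage: 40 synthetic days whose mean claim time drifts upward, so the data is not i.i.d.
data_rng = random.Random(42)
history = [make_day(data_rng, mean_stay=1 + d // 10) for d in range(40)]
batches = partition(history, 4)
q = incremental_train(batches)
greedy_cost = sum(run_day(q, day, random.Random(1), learn=False) for day in history)
print(f"{len(q)} states in Q-table, total greedy reward {greedy_cost}")
```

The key design choice is that `q` is created once in `incremental_train` and carried across batches; moving its creation inside the batch loop would turn this into independent per-batch training and discard what earlier batches taught the agent.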

Original language	English
Pages (from-to)	770-783
Number of pages	14
Journal	IEEE Transactions on Big Data
Volume	8
Issue number	3
DOIs	https://doi.org/10.1109/TBDATA.2020.2988273
Publication status	Published - 1 Jun 2022
Externally published	Yes

Keywords

Resource allocation
airport resource management
baggage handling
reinforcement learning

Access to Document

10.1109/TBDATA.2020.2988273

Cite this

@article{e351cc9f495e4dc8a345c9d5e2057b2e,
  title = "{IRDA}: Incremental Reinforcement Learning for Dynamic Resource Allocation",
  abstract = "Resource allocation problems often manifest as online decision-making tasks in which the proper allocation strategy depends on an understanding of the allocation environment and the resource workload. Most existing resource allocation methods rely on meticulously designed heuristics that ignore the patterns of incoming tasks, so the dynamics of incoming tasks cannot be handled properly. To address this problem, we mine task patterns from the large volume of historical allocation data and propose a reinforcement learning model, termed IRDA, that learns the allocation strategy incrementally. We observe that historical allocation data is usually generated by daily repeated operations and is therefore not independent and identically distributed. Training on only part of this dataset can already make the allocation strategy converge, so much of the remaining data is wasted. To improve learning efficiency, we partition the whole historical allocation dataset into multiple batches, which forces the agent to continuously 'explore' and learn on distinct state spaces. IRDA reuses the strategy learned on the previous batch and adapts it while learning on the next batch, so that it learns incrementally from the multiple batches and improves the allocation strategy. We apply the proposed method to baggage carousel allocation at Hong Kong International Airport (HKIA). The experimental results show that IRDA can learn incrementally from multi-batch datasets and improves baggage carousel resource utilization by around 51.86 percent compared with the current baggage carousel allocation system at HKIA.",
  keywords = "Resource allocation, airport resource management, baggage handling, reinforcement learning",
  author = "Jia Wang and Jiannong Cao and Senzhang Wang and Zhongyu Yao and Wengen Li",
  note = "Funding Information: This work was supported by HK RGC Collaborative Research Fund (CRF)-Group Research Grant (RGC No. C6030-18G), HK RGC Collaborative Research Fund (CRF)-Group Research Grant (RGC No. C5026-18G), Innovation and Technology Fund (ITC No. ITP/024/18LP), and NSF of Jiangsu Province (Grant No. BK20171420). Publisher Copyright: {\textcopyright} 2015 IEEE.",
  year = "2022",
  month = jun,
  day = "1",
  doi = "10.1109/TBDATA.2020.2988273",
  language = "English",
  volume = "8",
  pages = "770--783",
  journal = "IEEE Transactions on Big Data",
  issn = "2332-7790",
  number = "3",
}

TY  - JOUR
T1  - IRDA: Incremental Reinforcement Learning for Dynamic Resource Allocation
AU  - Wang, Jia
AU  - Cao, Jiannong
AU  - Wang, Senzhang
AU  - Yao, Zhongyu
AU  - Li, Wengen
N1  - Funding Information: This work was supported by HK RGC Collaborative Research Fund (CRF)-Group Research Grant (RGC No. C6030-18G), HK RGC Collaborative Research Fund (CRF)-Group Research Grant (RGC No. C5026-18G), Innovation and Technology Fund (ITC No. ITP/024/18LP), and NSF of Jiangsu Province (Grant No. BK20171420). Publisher Copyright: © 2015 IEEE.
PY  - 2022/6/1
Y1  - 2022/6/1
N2  - Resource allocation problems often manifest as online decision-making tasks in which the proper allocation strategy depends on an understanding of the allocation environment and the resource workload. Most existing resource allocation methods rely on meticulously designed heuristics that ignore the patterns of incoming tasks, so the dynamics of incoming tasks cannot be handled properly. To address this problem, we mine task patterns from the large volume of historical allocation data and propose a reinforcement learning model, termed IRDA, that learns the allocation strategy incrementally. We observe that historical allocation data is usually generated by daily repeated operations and is therefore not independent and identically distributed. Training on only part of this dataset can already make the allocation strategy converge, so much of the remaining data is wasted. To improve learning efficiency, we partition the whole historical allocation dataset into multiple batches, which forces the agent to continuously 'explore' and learn on distinct state spaces. IRDA reuses the strategy learned on the previous batch and adapts it while learning on the next batch, so that it learns incrementally from the multiple batches and improves the allocation strategy. We apply the proposed method to baggage carousel allocation at Hong Kong International Airport (HKIA). The experimental results show that IRDA can learn incrementally from multi-batch datasets and improves baggage carousel resource utilization by around 51.86 percent compared with the current baggage carousel allocation system at HKIA.
AB  - Resource allocation problems often manifest as online decision-making tasks in which the proper allocation strategy depends on an understanding of the allocation environment and the resource workload. Most existing resource allocation methods rely on meticulously designed heuristics that ignore the patterns of incoming tasks, so the dynamics of incoming tasks cannot be handled properly. To address this problem, we mine task patterns from the large volume of historical allocation data and propose a reinforcement learning model, termed IRDA, that learns the allocation strategy incrementally. We observe that historical allocation data is usually generated by daily repeated operations and is therefore not independent and identically distributed. Training on only part of this dataset can already make the allocation strategy converge, so much of the remaining data is wasted. To improve learning efficiency, we partition the whole historical allocation dataset into multiple batches, which forces the agent to continuously 'explore' and learn on distinct state spaces. IRDA reuses the strategy learned on the previous batch and adapts it while learning on the next batch, so that it learns incrementally from the multiple batches and improves the allocation strategy. We apply the proposed method to baggage carousel allocation at Hong Kong International Airport (HKIA). The experimental results show that IRDA can learn incrementally from multi-batch datasets and improves baggage carousel resource utilization by around 51.86 percent compared with the current baggage carousel allocation system at HKIA.
KW  - Resource allocation
KW  - airport resource management
KW  - baggage handling
KW  - reinforcement learning
UR  - http://www.scopus.com/inward/record.url?scp=85130493139&partnerID=8YFLogxK
U2  - 10.1109/TBDATA.2020.2988273
DO  - 10.1109/TBDATA.2020.2988273
M3  - Article
AN  - SCOPUS:85130493139
SN  - 2332-7790
VL  - 8
SP  - 770
EP  - 783
JO  - IEEE Transactions on Big Data
JF  - IEEE Transactions on Big Data
IS  - 3
ER  - 
