Guided policy search for sequential multitask learning

Fangzhou Xiong; Biao Sun; Xu Yang; Hong Qiao; Kaizhu Huang; Amir Hussain; Zhiyong Liu

doi:10.1109/TSMC.2018.2800040

Corresponding author: Zhiyong Liu

School of Advanced Technology

Research output: Contribution to journal › Article › peer-review

35 Citations (Scopus)

Abstract

Policy search in reinforcement learning (RL) is a practical approach for interacting directly with environments in parameter spaces, but it often faces the problems of local optima and real-time sample collection. A promising algorithm, known as guided policy search (GPS), is capable of handling the challenge of training samples using trajectory-centric methods. It can also provide asymptotic local convergence guarantees. However, in its current form, the GPS algorithm cannot operate in sequential multitask learning scenarios. This is due to its batch-style training requirement, where all training samples are collectively provided at the start of the learning process. This hinders the algorithm's adaptation to real-time applications, where training samples or tasks can arrive randomly. In this paper, the GPS approach is reformulated by adapting a recently proposed lifelong-learning method, elastic weight consolidation. Specifically, Fisher information is incorporated to impart knowledge from previously learned tasks. The proposed algorithm, termed sequential multitask learning-GPS, is able to operate in sequential multitask learning settings and ensures continuous policy learning without catastrophic forgetting. Pendulum and robotic manipulation experiments demonstrate the new algorithm's efficacy in learning control policies for handling sequentially arriving training samples, delivering comparable performance to the traditional, batch-based GPS algorithm. In conclusion, the proposed algorithm is posited as a new benchmark for the real-time RL and robotics research community.
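As a rough illustration of the elastic weight consolidation idea mentioned in the abstract, the sketch below shows the commonly stated EWC penalty: a quadratic term that anchors each parameter to its value after the previous task, weighted by a diagonal Fisher information estimate. This is a minimal sketch under assumptions, not the paper's implementation. The function names (`diagonal_fisher`, `ewc_penalty`, `total_loss`), the weight `lam`, and the use of mean squared gradients as the Fisher estimate are all illustrative choices that do not come from this record.

```python
from typing import List, Sequence


def diagonal_fisher(per_sample_grads: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: estimate the diagonal Fisher information as the
    mean squared log-likelihood gradient over samples from the previous task."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(
    theta: Sequence[float],
    theta_prev: Sequence[float],
    fisher: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Quadratic EWC penalty: 0.5 * lam * sum_i F_i * (theta_i - theta_prev_i)^2.
    `lam` is an assumed trade-off weight between old and new tasks."""
    return 0.5 * lam * sum(
        f * (t - p) ** 2 for f, t, p in zip(fisher, theta, theta_prev)
    )


def total_loss(
    new_task_loss: float,
    theta: Sequence[float],
    theta_prev: Sequence[float],
    fisher: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Loss on the newly arrived task plus the penalty that discourages
    moving parameters the previous task was sensitive to."""
    return new_task_loss + ewc_penalty(theta, theta_prev, fisher, lam)


if __name__ == "__main__":
    # Toy numbers only; they are not taken from the paper's experiments.
    grads = [[1.0, 0.0], [3.0, 2.0]]
    fisher = diagonal_fisher(grads)
    print(fisher)
    print(total_loss(1.0, [1.0, 2.0], [0.0, 0.0], fisher, lam=1.0))
```

Parameters with a large Fisher value are held close to their earlier values, while parameters with a small Fisher value remain free to adapt to the new task. How the paper combines such a term with the GPS policy update is not described in this record.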

Original language	English
Article number	8294227
Pages (from-to)	216-226
Number of pages	11
Journal	IEEE Transactions on Systems, Man, and Cybernetics: Systems
Volume	49
Issue number	1
DOIs	https://doi.org/10.1109/TSMC.2018.2800040
Publication status	Published - Jan 2019

Keywords

Elastic weight consolidation (EWC)
guided policy search (GPS)
reinforcement learning (RL)
sequential multitask learning

Cite this

@article{f053aa65cb7b4ea68cc695c840a07b99,

title = "Guided policy search for sequential multitask learning",

abstract = "Policy search in reinforcement learning (RL) is a practical approach for interacting directly with environments in parameter spaces, but it often faces the problems of local optima and real-time sample collection. A promising algorithm, known as guided policy search (GPS), is capable of handling the challenge of training samples using trajectory-centric methods. It can also provide asymptotic local convergence guarantees. However, in its current form, the GPS algorithm cannot operate in sequential multitask learning scenarios. This is due to its batch-style training requirement, where all training samples are collectively provided at the start of the learning process. This hinders the algorithm's adaptation to real-time applications, where training samples or tasks can arrive randomly. In this paper, the GPS approach is reformulated by adapting a recently proposed lifelong-learning method, elastic weight consolidation. Specifically, Fisher information is incorporated to impart knowledge from previously learned tasks. The proposed algorithm, termed sequential multitask learning-GPS, is able to operate in sequential multitask learning settings and ensures continuous policy learning without catastrophic forgetting. Pendulum and robotic manipulation experiments demonstrate the new algorithm's efficacy in learning control policies for handling sequentially arriving training samples, delivering comparable performance to the traditional, batch-based GPS algorithm. In conclusion, the proposed algorithm is posited as a new benchmark for the real-time RL and robotics research community.",

keywords = "Elastic weight consolidation (EWC), guided policy search (GPS), reinforcement learning (RL), sequential multitask learning",

author = "Fangzhou Xiong and Biao Sun and Xu Yang and Hong Qiao and Kaizhu Huang and Amir Hussain and Zhiyong Liu",

note = "Funding Information: Manuscript received November 25, 2017; accepted January 11, 2018. Date of publication February 19, 2018; date of current version December 14, 2018. This work was supported in part by the NSFC under Grant U1613213, Grant 61375005, Grant 61503383, Grant 61210009, Grant 61627808, Grant 91648205, Grant 61702516, and Grant 61473236, in part by the National Key Research and Development Plan of China under Grant 2017YFB1300202 and Grant 2016YFC0300801, in part by the MOST under Grant 2015BAK35B00 and Grant 2015BAK35B01, in part by the Guangdong Science and Technology Department under Grant 2016B090910001, in part by the Suzhou Science and Technology Program under Grant SYG201712 and Grant SZS201613, in part by the Strategic Priority Research Program of the Chinese Academy of Science under Grant XDB02080003, in part by the Key Program Special Fund in XJTLU under Grant KSF-A-01, and in part by the U.K. Engineering and Physical Sciences Research Council under Grant EP/M026981/1. This paper was recommended by Associate Editor A. Hussain. (Corresponding author: Zhiyong Liu.) F. Xiong and X. Yang are with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Science, Beijing 100190, China, and also with the School of Computer and Control, University of Chinese Academy of Sciences, Beijing 100049, China. Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2019",

month = jan,

doi = "10.1109/TSMC.2018.2800040",

language = "English",

volume = "49",

pages = "216--226",

journal = "IEEE Transactions on Systems, Man, and Cybernetics: Systems",

issn = "2168-2216",

number = "1",

}

TY - JOUR

T1 - Guided policy search for sequential multitask learning

AU - Xiong, Fangzhou

AU - Sun, Biao

AU - Yang, Xu

AU - Qiao, Hong

AU - Huang, Kaizhu

AU - Hussain, Amir

AU - Liu, Zhiyong

N1 - Funding Information: Manuscript received November 25, 2017; accepted January 11, 2018. Date of publication February 19, 2018; date of current version December 14, 2018. This work was supported in part by the NSFC under Grant U1613213, Grant 61375005, Grant 61503383, Grant 61210009, Grant 61627808, Grant 91648205, Grant 61702516, and Grant 61473236, in part by the National Key Research and Development Plan of China under Grant 2017YFB1300202 and Grant 2016YFC0300801, in part by the MOST under Grant 2015BAK35B00 and Grant 2015BAK35B01, in part by the Guangdong Science and Technology Department under Grant 2016B090910001, in part by the Suzhou Science and Technology Program under Grant SYG201712 and Grant SZS201613, in part by the Strategic Priority Research Program of the Chinese Academy of Science under Grant XDB02080003, in part by the Key Program Special Fund in XJTLU under Grant KSF-A-01, and in part by the U.K. Engineering and Physical Sciences Research Council under Grant EP/M026981/1. This paper was recommended by Associate Editor A. Hussain. (Corresponding author: Zhiyong Liu.) F. Xiong and X. Yang are with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Science, Beijing 100190, China, and also with the School of Computer and Control, University of Chinese Academy of Sciences, Beijing 100049, China. Publisher Copyright: © 2013 IEEE.

PY - 2019/1

Y1 - 2019/1

N2 - Policy search in reinforcement learning (RL) is a practical approach for interacting directly with environments in parameter spaces, but it often faces the problems of local optima and real-time sample collection. A promising algorithm, known as guided policy search (GPS), is capable of handling the challenge of training samples using trajectory-centric methods. It can also provide asymptotic local convergence guarantees. However, in its current form, the GPS algorithm cannot operate in sequential multitask learning scenarios. This is due to its batch-style training requirement, where all training samples are collectively provided at the start of the learning process. This hinders the algorithm's adaptation to real-time applications, where training samples or tasks can arrive randomly. In this paper, the GPS approach is reformulated by adapting a recently proposed lifelong-learning method, elastic weight consolidation. Specifically, Fisher information is incorporated to impart knowledge from previously learned tasks. The proposed algorithm, termed sequential multitask learning-GPS, is able to operate in sequential multitask learning settings and ensures continuous policy learning without catastrophic forgetting. Pendulum and robotic manipulation experiments demonstrate the new algorithm's efficacy in learning control policies for handling sequentially arriving training samples, delivering comparable performance to the traditional, batch-based GPS algorithm. In conclusion, the proposed algorithm is posited as a new benchmark for the real-time RL and robotics research community.

AB - Policy search in reinforcement learning (RL) is a practical approach for interacting directly with environments in parameter spaces, but it often faces the problems of local optima and real-time sample collection. A promising algorithm, known as guided policy search (GPS), is capable of handling the challenge of training samples using trajectory-centric methods. It can also provide asymptotic local convergence guarantees. However, in its current form, the GPS algorithm cannot operate in sequential multitask learning scenarios. This is due to its batch-style training requirement, where all training samples are collectively provided at the start of the learning process. This hinders the algorithm's adaptation to real-time applications, where training samples or tasks can arrive randomly. In this paper, the GPS approach is reformulated by adapting a recently proposed lifelong-learning method, elastic weight consolidation. Specifically, Fisher information is incorporated to impart knowledge from previously learned tasks. The proposed algorithm, termed sequential multitask learning-GPS, is able to operate in sequential multitask learning settings and ensures continuous policy learning without catastrophic forgetting. Pendulum and robotic manipulation experiments demonstrate the new algorithm's efficacy in learning control policies for handling sequentially arriving training samples, delivering comparable performance to the traditional, batch-based GPS algorithm. In conclusion, the proposed algorithm is posited as a new benchmark for the real-time RL and robotics research community.

KW - Elastic weight consolidation (EWC)

KW - guided policy search (GPS)

KW - reinforcement learning (RL)

KW - sequential multitask learning

UR - http://www.scopus.com/inward/record.url?scp=85042198472&partnerID=8YFLogxK

U2 - 10.1109/TSMC.2018.2800040

DO - 10.1109/TSMC.2018.2800040

M3 - Article

AN - SCOPUS:85042198472

SN - 2168-2216

VL - 49

SP - 216

EP - 226

JO - IEEE Transactions on Systems, Man, and Cybernetics: Systems

JF - IEEE Transactions on Systems, Man, and Cybernetics: Systems

IS - 1

M1 - 8294227

ER -
