Convolutional fitted Q iteration for vision-based control problems

Dongbin Zhao, Yuanheng Zhu, Le Lv, Yaran Chen, Qichao Zhang

Research output: Chapter in Book or Report/Conference proceedingConference Proceedingpeer-review

6 Citations (Scopus)

Abstract

In this paper a deep reinforcement learning (DRL) method is proposed to solve the control problem which takes raw image pixels as input states. A convolutional neural network (CNN) is used to approximate Q functions, termed as Q-CNN. A pretrained network, which is the result of a classification challenge on a vast set of natural images, initializes the parameters of Q-CNN. Such initialization assigns Q-CNN with the features of image representation, so it is more concentrated on the control tasks. The weights are tuned under the scheme of fitted Q iteration (FQI), which is an offline reinforcement learning method with the stable convergence property. To demonstrate the performance, a modified Food-Poison problem is simulated. The agent determines its movements based on its forward view. In the end the algorithm successfully learns a satisfied policy which has better performance than the results of previous researches.

Original languageEnglish
Title of host publication2016 International Joint Conference on Neural Networks, IJCNN 2016
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages4539-4544
Number of pages6
ISBN (Electronic)9781509006199
DOIs
Publication statusPublished - 31 Oct 2016
Event2016 International Joint Conference on Neural Networks, IJCNN 2016 - Vancouver, Canada
Duration: 24 Jul 201629 Jul 2016

Publication series

NameProceedings of the International Joint Conference on Neural Networks
Volume2016-October

Conference

Conference2016 International Joint Conference on Neural Networks, IJCNN 2016
Country/TerritoryCanada
CityVancouver
Period24/07/1629/07/16

Keywords

  • Convolutional neural network
  • Deep reinforcement learning
  • Fitted Q iteration
  • Vision-based control

Fingerprint

Dive into the research topics of 'Convolutional fitted Q iteration for vision-based control problems'. Together they form a unique fingerprint.

Cite this