Policy Updating Methods of Q Learning for Two Player Bargaining Game

Jianing Xu, Bei Zhou, Nanlin Jin*

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

Reinforcement learning algorithms have been used to discover strategies in games studied by game theory. This study investigates whether Q learning, one of the classic reinforcement learning methods, can train bargaining players via self-play, a training paradigm used by AlphaGo, to maximize their profit. We also compare our empirical results with the known game-theoretic solutions and perform a comprehensive analysis of their differences. To this end, we propose two policy updating methods for the training process, namely alternate update and simultaneous update, which are tailored for two players who exchange offers and counter-offers in an alternating manner under a time constraint enforced by the discount factors. Our experimental results demonstrate that the value of the discount factor has a tangible impact on how far the bargaining outcomes deviate from the game-theoretic solutions.
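For intuition only, the following is a minimal Python sketch (not the authors' code) of how self-play Q learning with the two policy updating methods described in the abstract might look for a discretized alternating-offers bargaining game. The pie discretization, the round-indexed states, the fixed acceptance rule, and all hyperparameters (N, T, DELTA, ALPHA, EPS, EPISODES) are illustrative assumptions rather than details taken from the paper. For reference, in the infinite-horizon alternating-offers model the classic game-theoretic benchmark is Rubinstein's split, in which the first proposer keeps (1 − δ₂)/(1 − δ₁δ₂) of the pie; whether this is the exact benchmark the paper compares against is not stated in the abstract.

import random

N = 11                 # discrete offers: the proposer keeps a/(N-1) of the pie
T = 10                 # maximum number of bargaining rounds
DELTA = (0.9, 0.9)     # per-round discount factors for players 0 and 1
ALPHA, EPS = 0.1, 0.1  # learning rate and exploration rate
EPISODES = 50_000

# One Q-table per player: Q[p][t][a] estimates player p's discounted payoff,
# measured from round t, of proposing to keep share a/(N-1) for itself.
Q = [[[0.0] * N for _ in range(T)] for _ in range(2)]

def propose(p, t):
    """Epsilon-greedy choice of the proposer's own share."""
    if random.random() < EPS:
        return random.randrange(N)
    return max(range(N), key=Q[p][t].__getitem__)

def accepts(q, t, share_for_q):
    """Fixed, illustrative acceptance rule (not learned): responder q accepts
    if the offer beats the discounted value of its best proposal next round."""
    if t + 1 == T:
        return True  # last round: any share beats the zero disagreement payoff
    return share_for_q >= DELTA[q] * max(Q[q][t + 1])

def play_episode(learners):
    """One self-play episode; only players in `learners` update their Q-table.
    Alternate update passes one player id; simultaneous update passes both."""
    trajectory, deal_round, shares = [], None, (0.0, 0.0)
    for t in range(T):
        p = t % 2                          # players alternate as proposer
        a = propose(p, t)
        trajectory.append((p, t, a))
        mine = a / (N - 1)
        if accepts(1 - p, t, 1.0 - mine):
            deal_round = t
            shares = (mine, 1.0 - mine) if p == 0 else (1.0 - mine, mine)
            break                          # agreement; disagreement pays zero
    for p, t, a in trajectory:
        if p in learners:
            # Monte-Carlo style update toward the realised return from round t:
            # the agreed share, discounted by the delay until agreement.
            g = 0.0 if deal_round is None else DELTA[p] ** (deal_round - t) * shares[p]
            Q[p][t][a] += ALPHA * (g - Q[p][t][a])

for ep in range(EPISODES):
    play_episode({ep % 2})   # alternate update: players take turns learning
    # simultaneous update would instead be play_episode({0, 1}) each episode

Under alternate update, one player's policy is effectively frozen while the other learns against it, whereas simultaneous update lets both players adapt at once; this difference in how the two Q-tables co-evolve is what the paper's two updating methods isolate.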

Original language: English
Title of host publication: Proceedings - 2023 5th International Conference on Pattern Recognition and Intelligent Systems, PRIS 2023
Editors: Wenbing Zhao, Xinguo Yu
Publisher: Association for Computing Machinery
Pages: 51-58
Number of pages: 8
ISBN (Electronic): 9781450399968
Publication status: Published - 28 Jul 2023
Event: 5th International Conference on Pattern Recognition and Intelligent Systems, PRIS 2023 - Virtual, Online
Duration: 29 Jul 2023 → …

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 5th International Conference on Pattern Recognition and Intelligent Systems, PRIS 2023
City: Virtual, Online
Period: 29/07/23 → …

Keywords

  • bargaining game
  • Q learning
  • self-play
