Policy Updating Methods of Q Learning for Two Player Bargaining Game

Jianing Xu; Bei Zhou; Nanlin Jin

doi:10.1145/3609703.3609722

Jianing Xu, Bei Zhou, Nanlin Jin*

*Corresponding author for this work

Xi'an Jiaotong-Liverpool University

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

Reinforcement learning algorithms have been used to discover strategies in game theory. This study investigates whether Q learning, one of the classic reinforcement learning methods, is capable of training bargaining players via self-play, a training paradigm used by AlphaGo, to maximize their profit. We also compare our empirical results with the known theoretical solutions and perform a comprehensive analysis of their differences. To this end, we propose two policy updating methods for the training process, namely alternate update and simultaneous update, which are tailored for two players who propose offers and counter-offers in an alternating manner under a time constraint enforced by the discount factors. Our experimental results demonstrate that the values of the discount factor have a tangible impact on how far the bargaining outcomes deviate from the game-theoretic solutions.
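
As an illustration of the kind of game-theoretic benchmark the abstract refers to, the sketch below computes the subgame perfect equilibrium split of a unit pie in the classic infinite-horizon alternating-offers model (Rubinstein, 1982). This is an assumption: the abstract does not say which theoretical solution the authors use, and the function names `rubinstein_shares` and `deviation` are hypothetical, not taken from the paper.

```python
from fractions import Fraction


def rubinstein_shares(delta_1, delta_2):
    """Equilibrium shares (first proposer, responder) of a unit pie.

    Assumes the infinite-horizon Rubinstein alternating-offers model,
    where delta_1 and delta_2 are the discount factors of the first
    proposer and the responder. This model is an assumption here.
    """
    if not (0 <= delta_1 < 1 and 0 <= delta_2 < 1):
        raise ValueError("discount factors must lie in [0, 1)")
    proposer = (1 - delta_2) / (1 - delta_1 * delta_2)
    return proposer, 1 - proposer


def deviation(observed_share, delta_1, delta_2):
    """Absolute gap between a learned first-proposer share and the benchmark."""
    expected, _ = rubinstein_shares(delta_1, delta_2)
    return abs(observed_share - expected)


if __name__ == "__main__":
    # Exact arithmetic with Fraction avoids floating-point noise.
    print(rubinstein_shares(Fraction(1, 2), Fraction(1, 2)))
    print(deviation(Fraction(1, 2), Fraction(1, 2), Fraction(1, 2)))
```

With equal discount factors of 1/2, the first proposer's benchmark share is 2/3, so a learned even split would deviate from it by 1/6.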

Original language	English
Title of host publication	Proceedings - 2023 5th International Conference on Pattern Recognition and Intelligent Systems, PRIS 2023
Editors	Wenbing Zhao, Xinguo Yu
Publisher	Association for Computing Machinery
Pages	51-58
Number of pages	8
ISBN (Electronic)	9781450399968
DOIs	https://doi.org/10.1145/3609703.3609722
Publication status	Published - 28 Jul 2023
Event	5th International Conference on Pattern Recognition and Intelligent Systems, PRIS 2023 - Virtual, Online. Duration: 29 Jul 2023 → …

Publication series

Name	ACM International Conference Proceeding Series

Conference

Conference	5th International Conference on Pattern Recognition and Intelligent Systems, PRIS 2023
City	Virtual, Online
Period	29/07/23 → …

Keywords

bargaining game
Q learning
self-play

Access to Document

10.1145/3609703.3609722

Cite this

Xu, J., Zhou, B., & Jin, N. (2023). Policy Updating Methods of Q Learning for Two Player Bargaining Game. In W. Zhao, & X. Yu (Eds.), Proceedings - 2023 5th International Conference on Pattern Recognition and Intelligent Systems, PRIS 2023 (pp. 51-58). (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3609703.3609722

@inproceedings{3465f4e91ae14205960e0e2c0407ea2f,

title = "Policy Updating Methods of Q Learning for Two Player Bargaining Game",

abstract = "Reinforcement learning algorithms have been used to discover strategies in game theory. This study investigates whether Q learning, one of the classic reinforcement learning methods, is capable of training bargaining players via self-play, a training paradigm used by AlphaGo, to maximize their profit. We also compare our empirical results with the known theoretical solutions and perform a comprehensive analysis of their differences. To this end, we propose two policy updating methods for the training process, namely alternate update and simultaneous update, which are tailored for two players who propose offers and counter-offers in an alternating manner under a time constraint enforced by the discount factors. Our experimental results demonstrate that the values of the discount factor have a tangible impact on how far the bargaining outcomes deviate from the game-theoretic solutions.",

keywords = "bargaining game, Q learning, self-play",

author = "Jianing Xu and Bei Zhou and Nanlin Jin",

note = "Publisher Copyright: {\textcopyright} 2023 ACM.; 5th International Conference on Pattern Recognition and Intelligent Systems, PRIS 2023 ; Conference date: 29-07-2023",

year = "2023",

month = jul,

day = "28",

doi = "10.1145/3609703.3609722",

language = "English",

series = "ACM International Conference Proceeding Series",

publisher = "Association for Computing Machinery",

pages = "51--58",

editor = "Wenbing Zhao and Xinguo Yu",

booktitle = "Proceedings - 2023 5th International Conference on Pattern Recognition and Intelligent Systems, PRIS 2023",

}

Xu, J, Zhou, B & Jin, N 2023, Policy Updating Methods of Q Learning for Two Player Bargaining Game. in W Zhao & X Yu (eds), Proceedings - 2023 5th International Conference on Pattern Recognition and Intelligent Systems, PRIS 2023. ACM International Conference Proceeding Series, Association for Computing Machinery, pp. 51-58, 5th International Conference on Pattern Recognition and Intelligent Systems, PRIS 2023, Virtual, Online, 29/07/23. https://doi.org/10.1145/3609703.3609722

Policy Updating Methods of Q Learning for Two Player Bargaining Game. / Xu, Jianing; Zhou, Bei; Jin, Nanlin.
Proceedings - 2023 5th International Conference on Pattern Recognition and Intelligent Systems, PRIS 2023. ed. / Wenbing Zhao; Xinguo Yu. Association for Computing Machinery, 2023. p. 51-58 (ACM International Conference Proceeding Series).

TY - GEN

T1 - Policy Updating Methods of Q Learning for Two Player Bargaining Game

AU - Xu, Jianing

AU - Zhou, Bei

AU - Jin, Nanlin

PY - 2023/7/28

Y1 - 2023/7/28

N2 - Reinforcement learning algorithms have been used to discover strategies in game theory. This study investigates whether Q learning, one of the classic reinforcement learning methods, is capable of training bargaining players via self-play, a training paradigm used by AlphaGo, to maximize their profit. We also compare our empirical results with the known theoretical solutions and perform a comprehensive analysis of their differences. To this end, we propose two policy updating methods for the training process, namely alternate update and simultaneous update, which are tailored for two players who propose offers and counter-offers in an alternating manner under a time constraint enforced by the discount factors. Our experimental results demonstrate that the values of the discount factor have a tangible impact on how far the bargaining outcomes deviate from the game-theoretic solutions.

AB - Reinforcement learning algorithms have been used to discover strategies in game theory. This study investigates whether Q learning, one of the classic reinforcement learning methods, is capable of training bargaining players via self-play, a training paradigm used by AlphaGo, to maximize their profit. We also compare our empirical results with the known theoretical solutions and perform a comprehensive analysis of their differences. To this end, we propose two policy updating methods for the training process, namely alternate update and simultaneous update, which are tailored for two players who propose offers and counter-offers in an alternating manner under a time constraint enforced by the discount factors. Our experimental results demonstrate that the values of the discount factor have a tangible impact on how far the bargaining outcomes deviate from the game-theoretic solutions.

KW - bargaining game

KW - Q learning

KW - self-play

UR - http://www.scopus.com/inward/record.url?scp=85170073769&partnerID=8YFLogxK

U2 - 10.1145/3609703.3609722

DO - 10.1145/3609703.3609722

M3 - Conference Proceeding

AN - SCOPUS:85170073769

T3 - ACM International Conference Proceeding Series

SP - 51

EP - 58

BT - Proceedings - 2023 5th International Conference on Pattern Recognition and Intelligent Systems, PRIS 2023

A2 - Zhao, Wenbing

A2 - Yu, Xinguo

PB - Association for Computing Machinery

T2 - 5th International Conference on Pattern Recognition and Intelligent Systems, PRIS 2023

Y2 - 29 July 2023

ER -
