
A Comparative Study of Q-Learning Variants for the One-to-One TSPPD

  • Yiheng Fu
  • Zihan Zhang
  • Min Wen* (*corresponding author for this work)
  • Xi'an Jiaotong-Liverpool University

Research output: Chapter in Book or Report/Conference proceeding; Conference Proceeding; peer-review

Abstract

With the rise of platforms like Meituan and JD Delivery, efficient last-mile delivery involving both pickups and deliveries has become crucial. We study the one-to-one Traveling Salesman Problem with Pickup and Delivery (TSPPD). Our primary objective is to establish the feasibility and foundational methodology of lightweight Q-learning approaches for precedence-constrained routing problems. We propose a lightweight Q-learning-based reinforcement learning framework specifically designed for one-to-one TSPPD. To the best of our knowledge, this represents the first systematic application of lightweight tabular Q-learning algorithms to precedence-constrained pickup and delivery problems. We investigate Q-learning and Double Q-learning, and introduce a novel Alternating Q-learning strategy that switches between the two methods during training to balance exploration and stability. A relocation-based local search further refines solutions while preserving feasibility. Our Alternating Q-learning demonstrates consistent performance across 17 TSPPD instances, effectively balancing fast exploration with reliable policy evaluation. Experimental validation shows that our methods achieve competitive solution quality while providing inherent interpretability and computational efficiency. For small instances, our methods often match optimal solutions, validating correctness, while for larger instances, our framework provides practical solutions within reasonable computational budgets. This work opens new research directions in interpretable logistics optimization and demonstrates the potential of Q-learning techniques for complex routing problems.
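
The two tabular update rules named in the abstract can be sketched as follows. This is a minimal illustration of the textbook Q-learning and Double Q-learning updates, not the authors' implementation. The abstract does not say how or when the Alternating strategy switches between the two rules, so the fixed-period schedule in `alternating_update`, the `period` parameter, the hyperparameter values, and the state/action encoding are all assumptions made for illustration.

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9  # illustrative hyperparameters, not taken from the paper


def q_update(Q, s, a, r, s_next, next_actions):
    """Standard Q-learning: the same table selects and evaluates the next action."""
    best = max((Q[(s_next, b)] for b in next_actions), default=0.0)
    Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])


def double_q_update(QA, QB, s, a, r, s_next, next_actions, rng=random):
    """Double Q-learning: one table selects the next action, the other evaluates it."""
    if rng.random() < 0.5:  # each table is the one being updated about half the time
        QA, QB = QB, QA
    value = 0.0  # terminal states contribute no bootstrap value
    if next_actions:
        b_star = max(next_actions, key=lambda b: QA[(s_next, b)])
        value = QB[(s_next, b_star)]
    QA[(s, a)] += ALPHA * (r + GAMMA * value - QA[(s, a)])


def alternating_update(episode, QA, QB, s, a, r, s_next, next_actions, period=50):
    """Hypothetical schedule (an assumption): change rule every `period` episodes."""
    if (episode // period) % 2 == 0:
        q_update(QA, s, a, r, s_next, next_actions)
    else:
        double_q_update(QA, QB, s, a, r, s_next, next_actions)


if __name__ == "__main__":
    QA, QB = defaultdict(float), defaultdict(float)
    # One transition: from state "s0" visit node 1 at travel cost 5, then nodes 2 or 3 remain.
    alternating_update(0, QA, QB, "s0", 1, -5.0, "s1", [2, 3])
    print(QA[("s0", 1)])
```

In a TSPPD setting one would presumably use the negative travel distance as the reward `r` and restrict `next_actions` to feasible nodes, that is, unvisited pickups plus deliveries whose pickup has already been visited, so that the precedence constraint holds by construction.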

Original language: English
Title of host publication: Proceedings of 2025 3rd International Conference on Mathematics and Machine Learning, ICMML 2025
Publisher: Association for Computing Machinery, Inc
Pages: 299-305
Number of pages: 7
ISBN (Electronic): 9798400720932
DOIs
Publication status: Published - 5 Jan 2026
Event: 2025 3rd International Conference on Mathematics and Machine Learning, ICMML 2025 - Nanjing, China
Duration: 14 Nov 2025 to 16 Nov 2025

Publication series

Name: Proceedings of 2025 3rd International Conference on Mathematics and Machine Learning, ICMML 2025

Conference

Conference: 2025 3rd International Conference on Mathematics and Machine Learning, ICMML 2025
Country/Territory: China
City: Nanjing
Period: 14/11/25 to 16/11/25

Keywords

  • Alternating Q-learning
  • Double Q-learning
  • Last-mile delivery
  • Local search
  • Q-learning
  • Reinforcement learning
  • Traveling Salesman Problem with Pickup and Delivery (TSPPD)
