TY - GEN
T1 - A Comparative Study of Q-Learning Variants for the One-to-One TSPPD
AU - Fu, Yiheng
AU - Zhang, Zihan
AU - Wen, Min
N1 - Publisher Copyright:
© 2025 Author(s).
PY - 2026/1/5
Y1 - 2026/1/5
N2 - With the rise of platforms like Meituan and JD Delivery, efficient last-mile delivery involving both pickups and deliveries has become crucial. We study the one-to-one Traveling Salesman Problem with Pickup and Delivery (TSPPD).Our primary objective is to establish the feasibility and foundational methodology of lightweight Q-learning approaches for precedence-constrained routing problems. We propose a lightweight Q-learning-based reinforcement learning framework specifically designed for one-to-one TSPPD. To the best of our knowledge, this represents the first systematic application of lightweight tabular Q-learning algorithms to precedence-constrained pickup and delivery problems. We investigate Q-learning, Double Q-learning, and introduce a novel Alternating Q-learning strategy that switches between the two methods during training to balance exploration and stability. A relocation-based local search further refines solutions while preserving feasibility. Our Alternating Q-learning demonstrates consistent performance across 17 TSPPD instances, effectively balancing fast exploration with reliable policy evaluation. Experimental validation shows that our methods achieve competitive solution quality while providing inherent interpretability and computational efficiency. For small instances, our methods often match optimal solutions, validating correctness, while for larger instances, our framework provides practical solutions within reasonable computational budgets. This work opens new research directions in interpretable logistics optimization and demonstrates the potential of Q-learning techniques for complex routing problems.
AB - With the rise of platforms like Meituan and JD Delivery, efficient last-mile delivery involving both pickups and deliveries has become crucial. We study the one-to-one Traveling Salesman Problem with Pickup and Delivery (TSPPD).Our primary objective is to establish the feasibility and foundational methodology of lightweight Q-learning approaches for precedence-constrained routing problems. We propose a lightweight Q-learning-based reinforcement learning framework specifically designed for one-to-one TSPPD. To the best of our knowledge, this represents the first systematic application of lightweight tabular Q-learning algorithms to precedence-constrained pickup and delivery problems. We investigate Q-learning, Double Q-learning, and introduce a novel Alternating Q-learning strategy that switches between the two methods during training to balance exploration and stability. A relocation-based local search further refines solutions while preserving feasibility. Our Alternating Q-learning demonstrates consistent performance across 17 TSPPD instances, effectively balancing fast exploration with reliable policy evaluation. Experimental validation shows that our methods achieve competitive solution quality while providing inherent interpretability and computational efficiency. For small instances, our methods often match optimal solutions, validating correctness, while for larger instances, our framework provides practical solutions within reasonable computational budgets. This work opens new research directions in interpretable logistics optimization and demonstrates the potential of Q-learning techniques for complex routing problems.
KW - Alternating Q-learning
KW - Double Q-learning
KW - Last-mile delivery
KW - Local search
KW - Q-learning
KW - Reinforcement learning
KW - Traveling Salesman Problem with Pickup and Delivery (TSPPD)
UR - https://www.scopus.com/pages/publications/105027395626
U2 - 10.1145/3783779.3783829
DO - 10.1145/3783779.3783829
M3 - Conference Proceeding
AN - SCOPUS:105027395626
T3 - Proceedings of 2025 3rd International Conference on Mathematics and Machine Learning, ICMML 2025
SP - 299
EP - 305
BT - Proceedings of 2025 3rd International Conference on Mathematics and Machine Learning, ICMML 2025
PB - Association for Computing Machinery, Inc
T2 - 2025 3rd International Conference on Mathematics and Machine Learning, ICMML 2025
Y2 - 14 November 2025 through 16 November 2025
ER -