TY - GEN
T1 - A RAG-Assisted DRL Framework for Microservices Deployment in 6G Vehicular Networks
AU - Ayepah-Mensah, Daniel
AU - Ghebreziabiher, Amine Kidane
AU - Boateng, Gordon Owusu
AU - Mizouni, Rabeb
AU - Mourad, Azzam
AU - Otrok, Hadi
AU - Bentahar, Jamal
AU - Muhaidat, Sami
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Modern edge cloud platforms must efficiently deploy and route containerized microservice DAGs under strict latency and cost constraints, while adapting to rapidly changing workloads and infrastructure states. Deep Reinforcement Learning (DRL) schedulers adapt well to such dynamics but often lack semantic awareness of service intent and task dependencies, resulting in suboptimal decisions in unseen scenarios. To overcome these limitations, we introduce a Retrieval-Augmented Generation-assisted DRL (RAG-DRL) framework that integrates a lightweight DRL agent with a graph-based RAG module powered by a partially frozen LLM. A dynamic memory graph encodes contextual information such as node resources, network latencies, and SLA feedback. The LLM retrieves relevant historical deployments and current service intents to generate soft placement plans and reward estimates, which guide the DRL agent. These priors accelerate convergence, improve generalization across diverse conditions, and ensure real-time responsiveness. Evaluations on a realistic urban-scale edge cloud testbed confirm that RAG-DRL significantly reduces SLA violations, end-to-end latency, and resource imbalance, outperforming modern container-based schedulers. Our framework converges faster, maintains latency below 65 ms at scale, limits SLA violations to 12% under heavy load, and achieves 90% resource utilization with balanced distribution.
AB - Modern edge cloud platforms must efficiently deploy and route containerized microservice DAGs under strict latency and cost constraints, while adapting to rapidly changing workloads and infrastructure states. Deep Reinforcement Learning (DRL) schedulers adapt well to such dynamics but often lack semantic awareness of service intent and task dependencies, resulting in suboptimal decisions in unseen scenarios. To overcome these limitations, we introduce a Retrieval-Augmented Generation-assisted DRL (RAG-DRL) framework that integrates a lightweight DRL agent with a graph-based RAG module powered by a partially frozen LLM. A dynamic memory graph encodes contextual information such as node resources, network latencies, and SLA feedback. The LLM retrieves relevant historical deployments and current service intents to generate soft placement plans and reward estimates, which guide the DRL agent. These priors accelerate convergence, improve generalization across diverse conditions, and ensure real-time responsiveness. Evaluations on a realistic urban-scale edge cloud testbed confirm that RAG-DRL significantly reduces SLA violations, end-to-end latency, and resource imbalance, outperforming modern container-based schedulers. Our framework converges faster, maintains latency below 65 ms at scale, limits SLA violations to 12% under heavy load, and achieves 90% resource utilization with balanced distribution.
KW - Deep Reinforcement Learning
KW - Edge-Cloud Orchestration
KW - Large Language Models (LLMs)
KW - Microservice Deployment
KW - Retrieval-Augmented Generation (RAG)
UR - https://www.scopus.com/pages/publications/105029900367
U2 - 10.1109/WiMob66857.2025.11257559
DO - 10.1109/WiMob66857.2025.11257559
M3 - Conference Proceeding
AN - SCOPUS:105029900367
T3 - International Conference on Wireless and Mobile Computing, Networking and Communications
BT - 2025 21st International Conference on Wireless and Mobile Computing, Networking and Communications, WiMob 2025
PB - IEEE Computer Society
T2 - 21st International Conference on Wireless and Mobile Computing, Networking and Communications, WiMob 2025
Y2 - 20 October 2025 through 22 October 2025
ER -