TY - GEN
T1 - Leveraging Large Language Models for Challenge Solving in Capture-the-Flag
AU - Zou, Yuwen
AU - Hong, Yang
AU - Xu, Jingyi
AU - Liu, Lekun
AU - Fan, Wenjun
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Capture-the-Flag (CTF) competitions are a prominent method in cybersecurity for practical attack and defense exercises. Despite the rapid advancements in Large Language Models (LLMs), their potential for solving CTF challenges remains underexplored. In this paper, we first propose a flexible CTF platform designed to reflect real-world penetration testing scenarios, bridging the gap between theoretical learning and practical cybersecurity challenges. Our platform is highly customizable, freely deployable, and capable of generating scenarios that closely mirror real network vulnerabilities. More importantly, we introduce an automated LLM agent framework that tackles CTF challenges using an integrated toolchain and various plugins to enhance problem-solving efficiency. In addition, we propose a human-validated LLM agent framework to address the potential limitations of the fully automated LLM agent, providing a clearer evaluation of the LLMs' intrinsic capabilities. We evaluate the agent's performance using four LLMs: GPT-4o, GPT-4o mini, o1-preview, and o1-mini. Although the LLMs' performance on dynamic and complex penetration tests reveals certain limitations, which need further exploration, our experimental results demonstrate that LLMs can leverage their extensive knowledge bases to effectively solve CTF challenges.
AB - Capture-the-Flag (CTF) competitions are a prominent method in cybersecurity for practical attack and defense exercises. Despite the rapid advancements in Large Language Models (LLMs), their potential for solving CTF challenges remains underexplored. In this paper, we first propose a flexible CTF platform designed to reflect real-world penetration testing scenarios, bridging the gap between theoretical learning and practical cybersecurity challenges. Our platform is highly customizable, freely deployable, and capable of generating scenarios that closely mirror real network vulnerabilities. More importantly, we introduce an automated LLM agent framework that tackles CTF challenges using an integrated toolchain and various plugins to enhance problem-solving efficiency. In addition, we propose a human-validated LLM agent framework to address the potential limitations of the fully automated LLM agent, providing a clearer evaluation of the LLMs' intrinsic capabilities. We evaluate the agent's performance using four LLMs: GPT-4o, GPT-4o mini, o1-preview, and o1-mini. Although the LLMs' performance on dynamic and complex penetration tests reveals certain limitations, which need further exploration, our experimental results demonstrate that LLMs can leverage their extensive knowledge bases to effectively solve CTF challenges.
KW - Capture the Flag
KW - Large Language Models
KW - LLM Agent
KW - Penetration Testing
UR - http://www.scopus.com/inward/record.url?scp=105006593451&partnerID=8YFLogxK
U2 - 10.1109/TrustCom63139.2024.00213
DO - 10.1109/TrustCom63139.2024.00213
M3 - Conference Proceeding
AN - SCOPUS:105006593451
T3 - Proceedings of the IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom
SP - 1541
EP - 1550
BT - 23rd IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)
T2 - 23rd IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2024
Y2 - 17 December 2024 through 21 December 2024
ER -