TY - GEN
T1 - Leveraging Large Language Models for Challenge Solving in Capture-the-Flag
AU - Zou, Yuwen
AU - Hong, Yang
AU - Xu, Jingyi
AU - Liu, Lekun
AU - Fan, Wenjun
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Capture-the-Flag (CTF) competitions are a prominent method in cybersecurity for practical attack and defense exercises. Despite the rapid advancements in Large Language Models (LLMs), their potential for solving CTF challenges remains underexplored. In this paper, we first propose a flexible CTF platform designed to reflect real-world penetration testing scenarios, bridging the gap between theoretical learning and practical cybersecurity challenges. Our platform is highly customizable, freely deployable, and capable of generating scenarios that closely mirror real network vulnerabilities. More importantly, we introduce an automated LLM agent framework that tackles CTF challenges using an integrated toolchain and various plugins to enhance problem-solving efficiency. In addition, we propose a human-validated LLM agent framework to address the potential limitations of the fully automated LLM agent, providing a clearer evaluation of the LLMs' intrinsic capabilities. We evaluate the agent's performance using four LLMs: GPT-4o, GPT-4o mini, o1-preview, and o1-mini. Although the LLMs' performance on dynamic and complex penetration tests reveals certain limitations, which need further exploration, our experimental results demonstrate that LLMs can leverage their extensive knowledge bases to effectively solve CTF challenges.
AB - Capture-the-Flag (CTF) competitions are a prominent method in cybersecurity for practical attack and defense exercises. Despite the rapid advancements in Large Language Models (LLMs), their potential for solving CTF challenges remains underexplored. In this paper, we first propose a flexible CTF platform designed to reflect real-world penetration testing scenarios, bridging the gap between theoretical learning and practical cybersecurity challenges. Our platform is highly customizable, freely deployable, and capable of generating scenarios that closely mirror real network vulnerabilities. More importantly, we introduce an automated LLM agent framework that tackles CTF challenges using an integrated toolchain and various plugins to enhance problem-solving efficiency. In addition, we propose a human-validated LLM agent framework to address the potential limitations of the fully automated LLM agent, providing a clearer evaluation of the LLMs' intrinsic capabilities. We evaluate the agent's performance using four LLMs: GPT-4o, GPT-4o mini, o1-preview, and o1-mini. Although the LLMs' performance on dynamic and complex penetration tests reveals certain limitations, which need further exploration, our experimental results demonstrate that LLMs can leverage their extensive knowledge bases to effectively solve CTF challenges.
KW - Capture the Flag
KW - Large Language Models
KW - LLM Agent
KW - Penetration Testing
UR - http://www.scopus.com/inward/record.url?scp=105006593451&partnerID=8YFLogxK
U2 - 10.1109/TrustCom63139.2024.00213
DO - 10.1109/TrustCom63139.2024.00213
M3 - Conference Proceeding
AN - SCOPUS:105006593451
T3 - Proceedings of the IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom
SP - 1541
EP - 1550
BT - 23rd IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)
T2 - 23rd IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2024
Y2 - 17 December 2024 through 21 December 2024
ER -