Leveraging Large Language Models for Challenge Solving in Capture-the-Flag

Yuwen Zou, Yang Hong, Jingyi Xu, Lekun Liu, Wenjun Fan*

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding, Conference Proceeding, peer-review

1 Citation (Scopus)

Abstract

Capture-the-Flag (CTF) competitions are a prominent method in cybersecurity for practical attack and defense exercises. Despite the rapid advancements in Large Language Models (LLMs), their potential for solving CTF challenges remains underexplored. In this paper, we first propose a flexible CTF platform designed to reflect real-world penetration testing scenarios, bridging the gap between theoretical learning and practical cybersecurity challenges. Our platform is highly customizable, freely deployable, and capable of generating scenarios that closely mirror real network vulnerabilities. More importantly, we introduce an automated LLM agent framework that tackles CTF challenges using an integrated toolchain and various plugins to enhance problem-solving efficiency. In addition, we propose a human-validated LLM agent framework to address the potential limitations of the fully automated LLM agent, providing a clearer evaluation of the LLMs' intrinsic capabilities. We evaluate the agent's performance using four LLMs: GPT-4o, GPT-4o mini, o1-preview, and o1-mini. Although the LLMs' performance on dynamic and complex penetration tests reveals certain limitations, which need further exploration, our experimental results demonstrate that LLMs can leverage their extensive knowledge bases to effectively solve CTF challenges.

Original language: English
Title of host publication: 23rd IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)
Pages: 1541-1550
Number of pages: 10
Edition: 2024
DOIs
Publication status: Published - 2024
Event: 23rd IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2024 - Sanya, China
Duration: 17 Dec 2024 to 21 Dec 2024

Publication series

Name: Proceedings of the IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom
ISSN (Print): 2324-898X

Conference

Conference: 23rd IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2024
Country/Territory: China
City: Sanya
Period: 17/12/24 to 21/12/24

Keywords

  • Capture the Flag
  • Large Language Models
  • LLM Agent
  • Penetration Testing
