SecretHunter: A Large-scale Secret Scanner for Public Git Repositories

Elliott Wen, Jia Wang*, Jens Dietrich

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

4 Citations (Scopus)

Abstract

Collaborative software development platforms like GitHub have gained tremendous popularity. Unfortunately, many users have reportedly leaked authentication secrets (e.g., textual passwords and API keys) in public Git repositories, causing security incidents and financial loss. Recently, several tools were built to investigate secret leakage on GitHub. However, these tools can only discover and scan a limited portion of the files on GitHub due to platform API restrictions and bandwidth limitations. In this paper, we present SecretHunter, a real-time, large-scale, comprehensive secret scanner for GitHub. SecretHunter resolves the file discovery and retrieval difficulty via two major improvements to the Git cloning process. First, the system retrieves file metadata from repositories before cloning file contents. This early metadata access helps identify newly committed files and enables bandwidth optimizations such as filename filtering and object deduplication. Second, SecretHunter adopts a reinforcement learning model that analyzes file contents as they are being downloaded and infers whether the file is sensitive. If it is not, the download is aborted to conserve bandwidth. We conduct a one-month empirical study to evaluate SecretHunter. Our results show that SecretHunter discovers 57% more leaked secrets than state-of-the-art tools. SecretHunter also reduces bandwidth consumption in the object retrieval process by 85% and can be used in low-bandwidth settings (e.g., 4G connections).
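The metadata-first step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it only assumes that (object hash, path) pairs are already available before any file content is fetched (for example, from a blobless partial clone followed by `git ls-tree -r HEAD`, which is one possible source and not necessarily what the paper uses). The function name `select_blobs` and the pattern list are hypothetical, chosen for illustration.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Set, Tuple

# Hypothetical filename patterns that hint at secrets (assumption, not from the paper).
SENSITIVE_PATTERNS = ["*.env", "*.pem", "*.key", "*credentials*", "id_rsa*"]


def select_blobs(
    entries: Iterable[Tuple[str, str]], seen: Set[str]
) -> List[Tuple[str, str]]:
    """Pick which objects are worth downloading, using metadata only.

    entries: (object_hash, path) pairs obtained before fetching contents.
    seen:    hashes already scanned; updated in place so that identical
             content is never downloaded twice (object deduplication).
    """
    selected = []
    for blob_hash, path in entries:
        name = path.rsplit("/", 1)[-1].lower()
        # Filename filtering: skip files whose names do not look sensitive.
        if not any(fnmatchcase(name, pat) for pat in SENSITIVE_PATTERNS):
            continue
        # Object deduplication: the same hash means the same content.
        if blob_hash in seen:
            continue
        seen.add(blob_hash)
        selected.append((blob_hash, path))
    return selected
```

Because Git addresses objects by content hash, a file copied across forks or directories shares one hash, so tracking seen hashes avoids repeated downloads without inspecting any file contents.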

Original language: English
Title of host publication: Proceedings - 2022 IEEE 21st International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 123-130
Number of pages: 8
ISBN (Electronic): 9781665494250
Publication status: Published - 2022
Event: 21st IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022 - Virtual, Online, China
Duration: 9 Dec 2022 → 11 Dec 2022

Publication series

Name: Proceedings - 2022 IEEE 21st International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022

Conference

Conference: 21st IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022
Country/Territory: China
City: Virtual, Online
Period: 9/12/22 → 11/12/22

