TY - GEN
T1 - SecretHunter
T2 - 21st IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022
AU - Wen, Elliott
AU - Wang, Jia
AU - Dietrich, Jens
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Collaborative software development platforms like GitHub have gained tremendous popularity. Unfortunately, many users have reportedly leaked authentication secrets (e.g., textual passwords and API keys) in public Git repositories and caused security incidents and finical loss. Recently, several tools were built to investigate the secret leakage in GitHub. However, these tools could only discover and scan a limited portion of files in GitHub due to platform API restrictions and band-width limitations. In this paper, we present SecretHunter, a real-time large-scale comprehensive secret scanner for GitHub. SecretHunter resolves the file discovery and retrieval difficulty via two major improvements to the Git cloning process. Firstly, our system will retrieve file metadata from repositories before cloning file contents. The early metadata access can help identify newly committed files and enable many bandwidth optimizations such as filename filtering and object deduplication. Secondly, SecretHunter adopts a reinforcement learning model to analyze file contents being downloaded and infer whether the file is sensitive. If not, the download process can be aborted to conserve bandwidth. We conduct a one-month empirical study to evaluate SecretHunter. Our results show that SecretHunter discovers 57% more leaked secrets than state-of-the-art tools. SecretHunter also reduces 85% bandwidth consumption in the object retrieval process and can be used in low-bandwidth settings (e.g., 4G connections).
AB - Collaborative software development platforms like GitHub have gained tremendous popularity. Unfortunately, many users have reportedly leaked authentication secrets (e.g., textual passwords and API keys) in public Git repositories and caused security incidents and finical loss. Recently, several tools were built to investigate the secret leakage in GitHub. However, these tools could only discover and scan a limited portion of files in GitHub due to platform API restrictions and band-width limitations. In this paper, we present SecretHunter, a real-time large-scale comprehensive secret scanner for GitHub. SecretHunter resolves the file discovery and retrieval difficulty via two major improvements to the Git cloning process. Firstly, our system will retrieve file metadata from repositories before cloning file contents. The early metadata access can help identify newly committed files and enable many bandwidth optimizations such as filename filtering and object deduplication. Secondly, SecretHunter adopts a reinforcement learning model to analyze file contents being downloaded and infer whether the file is sensitive. If not, the download process can be aborted to conserve bandwidth. We conduct a one-month empirical study to evaluate SecretHunter. Our results show that SecretHunter discovers 57% more leaked secrets than state-of-the-art tools. SecretHunter also reduces 85% bandwidth consumption in the object retrieval process and can be used in low-bandwidth settings (e.g., 4G connections).
KW - n/a
UR - http://www.scopus.com/inward/record.url?scp=85151711041&partnerID=8YFLogxK
U2 - 10.1109/TrustCom56396.2022.00028
DO - 10.1109/TrustCom56396.2022.00028
M3 - Conference Proceeding
AN - SCOPUS:85151711041
T3 - Proceedings - 2022 IEEE 21st International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022
SP - 123
EP - 130
BT - Proceedings - 2022 IEEE 21st International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 December 2022 through 11 December 2022
ER -