Longitudinal Analyses of SAST Tools: A CodeQL Case Study
Abstract
Open-source software (OSS) pipelines rely on automated static analysis tools to prevent the introduction of vulnerabilities in code. However, there is limited understanding of the efficacy of these tools across the OSS ecosystem over time. In this paper, we introduce a novel method to evaluate static application security testing (SAST) tools through longitudinal measurements and perform the largest academic study of CodeQL -- the most prevalent static analysis tool from GitHub -- on OSS codebase...
Description / Details
Open-source software (OSS) pipelines rely on automated static analysis tools to prevent the introduction of vulnerabilities in code. However, there is limited understanding of the efficacy of these tools across the OSS ecosystem over time. In this paper, we introduce a novel method to evaluate static application security testing (SAST) tools through longitudinal measurements and perform the largest academic study of CodeQL -- the most prevalent static analysis tool from GitHub -- on OSS codebases. We apply our apparatus on 114 versions of CodeQL over time on 3993 CVEs from 1622 repositories to measure key properties of the tool, culminating in more than 20 billion lines of code analyzed. First, we measure its effectiveness, i.e., its ability to detect vulnerabilities before they are fixed. Then, we determine whether these detections were actionable through two measures of the distance between findings and vulnerability location either over the entire codebase or within the vulnerable file. Finally, we study the stability of CodeQL by examining how vulnerability detections hold across versions and the evolution of CodeQL on the accuracy-precision trade-off. We find that CodeQL identifies a total of 171 CVEs, and that for 83 of them, a CodeQL version prior to the fix could detect it. Such detections are in general actionable if findings are triaged across files, as for 50% of the 171 detections, more than 50% of findings in the vulnerable file are located in the vulnerable location. Finally, we show that CVE detections are not monotonic across versions as 21 CVEs were no longer detected following a version change and 17 that were never redetected. Our study shows that using SAST tools is a matter of best practice as they prevent numerous vulnerabilities from being introduced, but that developers should be aware of changes that may leave blind spots in detections upon updates of the tool.
Source: arXiv:2605.07900v1 - http://arxiv.org/abs/2605.07900v1 PDF: https://arxiv.org/pdf/2605.07900v1 Original Link: http://arxiv.org/abs/2605.07900v1
Please sign in to join the discussion.
No comments yet. Be the first to share your thoughts!
May 11, 2026
Computer Science
Cybersecurity
0