Research Paper (Researchia:202603.19042)

Toward Scalable Automated Repository-Level Datasets for Software Vulnerability Detection

Amine Lbath

Submitted: March 19, 2026
Subjects: AI; Artificial Intelligence

Description / Details

Software vulnerabilities continue to grow in volume and remain difficult to detect in practice. Although learning-based vulnerability detection has progressed, existing benchmarks are largely function-centric and fail to capture realistic, executable, interprocedural settings. Recent repo-level security benchmarks demonstrate the importance of realistic environments, but their manual curation limits scale. This doctoral research proposes an automated benchmark generator that injects realistic vulnerabilities into real-world repositories and synthesizes reproducible proof-of-vulnerability (PoV) exploits, enabling precisely labeled datasets for training and evaluating repo-level vulnerability detection agents. We further investigate an adversarial co-evolution loop between injection and detection agents to improve robustness under realistic constraints.


Source: arXiv:2603.17974v1 (http://arxiv.org/abs/2603.17974v1)
PDF: https://arxiv.org/pdf/2603.17974v1
