Research Paper · Researchia:202604.23013

Coverage, Not Averages: Semantic Stratification for Trustworthy Retrieval Evaluation

Andrew Klearman

Submitted: April 23, 2026
Subjects: AI; Artificial Intelligence

Description / Details

Retrieval quality is the primary bottleneck for accuracy and robustness in retrieval-augmented generation (RAG). Current evaluation relies on heuristically constructed query sets, which introduce a hidden intrinsic bias. We formalize retrieval evaluation as a statistical estimation problem, showing that metric reliability is fundamentally limited by the evaluation-set construction. We further introduce semantic stratification, which grounds evaluation in corpus structure by organizing documents into an interpretable global space of entity-based clusters and systematically generating queries for missing strata. This yields (1) formal semantic coverage guarantees across retrieval regimes and (2) interpretable visibility into retrieval failure modes. Experiments across multiple benchmarks and retrieval methods validate our framework. The results expose systematic coverage gaps, identify structural signals that explain variance in retrieval performance, and show that stratified evaluation yields more stable and transparent assessments while supporting more trustworthy decision-making than aggregate metrics.
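To make the contrast between aggregate and stratified reporting concrete, here is a minimal Python sketch. It is not the paper's implementation: the function name `stratified_report`, the stratum labels, and the toy scores are all hypothetical, and it assumes each evaluated query has already been assigned to one stratum (for example, an entity-based cluster) and given a per-query score such as recall@k. It reports only an empirical coverage ratio, not the formal guarantees the abstract refers to.

```python
from collections import defaultdict
from statistics import mean


def stratified_report(results, all_strata):
    """Summarize per-query scores by stratum (illustrative only).

    results: list of (stratum_id, score) pairs, one per evaluated query.
    all_strata: every stratum id present in the corpus (assumed known).
    """
    by_stratum = defaultdict(list)
    for stratum, score in results:
        by_stratum[stratum].append(score)

    all_strata = set(all_strata)
    covered = all_strata & set(by_stratum)
    # Strata with no evaluation query at all: the coverage gaps.
    missing = sorted(all_strata - covered)

    per_stratum = {s: mean(by_stratum[s]) for s in sorted(covered)}
    return {
        # Plain average over queries; dominated by over-sampled strata.
        "aggregate": mean(score for _, score in results),
        "per_stratum": per_stratum,
        # Average of stratum means; each covered stratum counts once.
        "macro_average": mean(per_stratum.values()),
        # Fraction of corpus strata that have at least one query.
        "coverage": len(covered) / len(all_strata),
        "missing_strata": missing,
    }


# Hypothetical toy data: three queries hit one stratum, one hits another,
# and a third stratum has no queries.
results = [("finance", 1.0), ("finance", 1.0), ("finance", 1.0), ("medicine", 0.0)]
report = stratified_report(results, ["finance", "medicine", "law"])
print(report)
```

In this toy case the aggregate score is 0.75 even though one covered stratum scores 0.0 and a third has no queries, which is the kind of gap a single average hides.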


Source: arXiv:2604.20763v1 (http://arxiv.org/abs/2604.20763v1)
PDF: https://arxiv.org/pdf/2604.20763v1
