Back to Explorer
Research PaperResearchia:202603.20026[Data Science > Statistics]

PPI is the Difference Estimator: Recognizing the Survey Sampling Roots of Prediction-Powered Inference

Reagan Mozer

Abstract

Prediction-powered inference (PPI) is a rapidly growing framework for combining machine learning predictions with a small set of gold-standard labels to conduct valid statistical inference. In this article, I argue that the core estimators underlying PPI are equivalent to well-established estimators from the survey sampling literature dating back to the 1970s. Specifically, the PPI estimator for a population mean is algebraically equivalent to the difference estimator of Cassel et al. (1976), and PPI plus corresponds to the generalized regression (GREG) estimator of Sarndal et al. (2003). Recognizing this equivalence, I consider what part of PPI is inherited from a long-standing literature in statistics, what part is genuinely new, and where inferential claims require care. After introducing the two frameworks and establishing their equivalence, I break down where PPI diverges from model-assisted estimation, including differences in the mode of inference, the role of the unlabeled data pool, and the consequences of differential prediction error for subgroup estimands such as the average treatment effect. I then identify what each framework offers the other: PPI researchers can draw on the survey sampling literature's well-developed theory of calibration, optimal allocation, and design-based diagnostics, while survey sampling researchers can benefit from PPI's extensions to non-standard estimands and its accessible software ecosystem. The article closes with a call for integration between these two communities, motivated by the growing use of large language models as measurement instruments in applied research.


Source: arXiv:2603.19160v1 - http://arxiv.org/abs/2603.19160v1 PDF: https://arxiv.org/pdf/2603.19160v1 Original Link: http://arxiv.org/abs/2603.19160v1

Submission:3/20/2026
Comments:0 comments
Subjects:Statistics; Data Science
Original Source:
View Original PDF
arXiv: This paper is hosted on arXiv, an open-access repository
Was this helpful?

Discussion (0)

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

PPI is the Difference Estimator: Recognizing the Survey Sampling Roots of Prediction-Powered Inference | Researchia