Research Paper | Researchia:202604.09034 | [Data Science > Statistics]

The Theory and Practice of Highly Scalable Gaussian Process Regression with Nearest Neighbours

Robert Allison

Abstract

Gaussian process (GP) regression is a widely used non-parametric modeling tool, but its cubic complexity in the training size limits its use on massive data sets. A practical remedy is to predict using only the nearest neighbours of each test point, as in Nearest Neighbour Gaussian Process (NNGP) regression for geospatial problems and the related scalable GPnn method for more general machine-learning applications. Despite their strong empirical performance, the large-$n$ theory of NNGP/GPnn remains incomplete. We develop a theoretical framework for NNGP and GPnn regression. Under mild regularity assumptions, we derive almost sure pointwise limits for three key predictive criteria: mean squared error (MSE), calibration coefficient (CAL), and negative log-likelihood (NLL). We then study the $L_2$-risk, prove universal consistency, and show that the risk attains Stone's minimax rate $n^{-2\alpha/(2p+d)}$, where $\alpha$ and $p$ capture regularity of the regression problem. We also prove uniform convergence of MSE over compact hyper-parameter sets and show that its derivatives with respect to lengthscale, kernel scale, and noise variance vanish asymptotically, with explicit rates. This explains the observed robustness of GPnn to hyper-parameter tuning. These results provide a rigorous statistical foundation for NNGP/GPnn as a highly scalable and principled alternative to full GP models.
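
The nearest-neighbour prediction step described in the abstract can be illustrated with a minimal Python/NumPy sketch. This is not the paper's implementation. It assumes a zero-mean prior, a squared-exponential kernel, and a brute-force neighbour search. The function names (`rbf_kernel`, `gpnn_predict`, `predictive_metrics`) and the default neighbour count `m=32` are hypothetical. The calibration formula, the mean of squared error divided by predictive variance, is an assumed definition, and averaging the three criteria over a test set is likewise an assumption made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale, kernel_scale):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return kernel_scale * np.exp(-0.5 * sq_dists / lengthscale**2)


def gpnn_predict(x_train, y_train, x_test, m=32,
                 lengthscale=1.0, kernel_scale=1.0, noise_var=0.1):
    """Predict each test point from its m nearest training points only.

    Each prediction solves an m x m system, so the cost per test point is
    O(m^3) rather than cubic in the full training size.
    """
    means = np.empty(len(x_test))
    variances = np.empty(len(x_test))
    for i, x_star in enumerate(x_test):
        # Brute-force search (assumption); a spatial index would be used at scale.
        sq_dists = np.sum((x_train - x_star) ** 2, axis=1)
        idx = np.argpartition(sq_dists, m - 1)[:m]
        x_nn, y_nn = x_train[idx], y_train[idx]

        # Noisy kernel matrix of the neighbours and cross-covariance to x_star.
        k_nn = rbf_kernel(x_nn, x_nn, lengthscale, kernel_scale)
        k_nn += noise_var * np.eye(m)
        k_star = rbf_kernel(x_nn, x_star[None, :], lengthscale, kernel_scale)[:, 0]

        # Solve once for both right-hand sides: y_nn (mean) and k_star (variance).
        sol = np.linalg.solve(k_nn, np.column_stack([y_nn, k_star]))
        means[i] = k_star @ sol[:, 0]
        # Predictive variance of the noisy response at x_star.
        variances[i] = kernel_scale + noise_var - k_star @ sol[:, 1]
    return means, variances


def predictive_metrics(y_test, means, variances):
    """Test-set averages of MSE, calibration (assumed definition), and Gaussian NLL."""
    sq_err = (y_test - means) ** 2
    mse = np.mean(sq_err)
    cal = np.mean(sq_err / variances)  # close to 1 when variances are well calibrated
    nll = np.mean(0.5 * (np.log(2.0 * np.pi * variances) + sq_err / variances))
    return mse, cal, nll
```

Calling `gpnn_predict(x_train, y_train, x_test, m=20)` on arrays of shape `(n, d)`, `(n,)`, and `(n_test, d)` returns one predictive mean and variance per test point, which `predictive_metrics` then summarises. Varying `lengthscale`, `kernel_scale`, and `noise_var` in such a sketch is one informal way to probe the hyper-parameter robustness the abstract refers to.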


Source: arXiv:2604.07267v1 - http://arxiv.org/abs/2604.07267v1 PDF: https://arxiv.org/pdf/2604.07267v1

Submission: 4/9/2026
Subjects: Statistics; Data Science