Research Paper, Researchia:202605.26012

Conditional KRR: Injecting Unpenalized Features into Kernel Methods with Applications to Kernel Thresholding

Rustem Takhanov


Submitted: May 26, 2026
Subjects: AI; Artificial Intelligence

Description / Details

Conditionally positive definite (CPD) kernels are defined with respect to a function class $\mathcal{F}$. It is well known that such a kernel $K$ is associated with its native space (defined analogously to an RKHS), which in turn gives rise to a learning method -- called conditional kernel ridge regression (conditional KRR) due to its analogy with KRR -- where the estimated regression function is penalized by the square of its native space norm. This method is of interest because it can be viewed as classical linear regression, with features specified by $\mathcal{F}$, followed by the application of standard KRR to the residual (unexplained) component of the target variable. Methods of this type have recently attracted increasing attention. We study the statistical properties of this method by reducing its behavior to that of KRR with another fixed kernel, called the residual kernel. Our main theoretical result shows that such a reduction is indeed possible, at the cost of an additional term in the expected test risk, bounded by $\mathcal{O}(1/\sqrt{N})$, where $N$ is the sample size and the hidden constant depends on the class $\mathcal{F}$ and the input distribution. This reduction enables us to analyze conditional KRR in the case where $K$ is positive definite and $\mathcal{F}$ is given by the first $k$ principal eigenfunctions in the Mercer decomposition of $K$. We also consider the setting where $\mathcal{F}$ consists of $k$ random features from a random feature representation of $K$. It turns out that these two settings are closely related. Both our theoretical analysis and experiments confirm that conditional KRR outperforms standard KRR in these cases whenever the $\mathcal{F}$-component of the regression function is more pronounced than the residual part.
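
To make the idea of unpenalized features concrete, here is a minimal NumPy sketch. It is not taken from the paper. It assumes the textbook block linear system used for smoothing with CPD kernels, in which the kernel coefficients `alpha` are penalized and constrained to be orthogonal to the features, while the feature coefficients `beta` are left unpenalized. The Gaussian kernel, the choice of $\mathcal{F}$ as constant plus linear features, the scaling of `lam`, and all function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def linear_features(X):
    # Hypothetical choice of the class F: a constant plus the raw coordinates.
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_conditional_krr(X, y, lam=1e-2, bandwidth=1.0):
    # Solve the block system (assumed formulation, not quoted from the paper):
    #   (K + lam * I) alpha + P beta = y
    #   P^T alpha                    = 0
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    P = linear_features(X)
    m = P.shape[1]
    A = np.block([[K + lam * np.eye(n), P],
                  [P.T, np.zeros((m, m))]])
    rhs = np.concatenate([y, np.zeros(m)])
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n:]  # alpha (penalized part), beta (unpenalized part)

def predict(X_train, alpha, beta, X_new, bandwidth=1.0):
    # Prediction = kernel expansion on the residual part + linear part in F.
    return (gaussian_kernel(X_new, X_train, bandwidth) @ alpha
            + linear_features(X_new) @ beta)

# Toy check: the target lies entirely in the span of the features in F.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 2))
y = 1.0 + X @ np.array([2.0, -3.0])
alpha, beta = fit_conditional_krr(X, y)
```

In this toy check the target has no residual component, so the solver should return `alpha` numerically equal to zero and `beta` equal to the true linear coefficients. The ridge penalty does not shrink the $\mathcal{F}$-component, which is the behavior that distinguishes this scheme from standard KRR on the same data.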


Source: arXiv:2605.26067v1 - http://arxiv.org/abs/2605.26067v1
PDF: https://arxiv.org/pdf/2605.26067v1
