A Closed-Form Persistence-Landmark Pipeline for Certified Point-Cloud and Graph Classification
Abstract
We introduce PLACE (Persistence-Landmark Analytic Classification Engine), a closed-form pipeline for classifying point clouds and graphs through their persistent-homology signatures. Three quantitative guarantees, namely a margin-based excess-risk rate, a closed-form descriptor-selection rule, and a per-prediction certificate, are derived from training labels alone, with no learned weights and no held-out calibration. The embedding sums Mitra-Virk single-point coordinate functions over a sparse landmark grid; closed-form weights maximize a structural distortion constant (a Lipschitz lower bound that holds under a non-interference condition). (i) A margin bound, driven by class-mean separation and the embedding radius, matched by a sample-starved minimax lower bound. (ii) The Mahalanobis margin under a Ledoit-Wolf-shrunk covariance is the strongest closed-form descriptor selector on a heterogeneous 64-descriptor chemical-graph pool (mean Spearman correlation across 10 benchmarks, positive on 9 of 10); the isotropic surrogate admits a closed-form selection-consistency rate on homogeneous (14-15-descriptor) protein/social pools. (iii) A training-time-decided certificate with no per-prediction overhead, available in non-asymptotic (Pinelis) and asymptotic (Gaussian plug-in) forms. Empirically, PLACE is the strongest diagram-based method on Orbit5k and matches the strongest topology-based baseline within statistical noise on MUTAG and COX2. The remaining gaps fall into two diagnosable regimes: descriptor blindness on NCI1/NCI109, and pool-coverage limits elsewhere. Both certificate radii exceed the firing threshold on every benchmark at our training-set sizes, dominated by the scaling of the multivariate-norm bound; the per-prediction certificate is constructive but not yet operational at these sizes.
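As a rough illustration of the landmark embedding described in the abstract, the sketch below sums a single-point coordinate function over a sparse grid of landmarks, one embedding coordinate per landmark. The tent-shaped coordinate function, the parameter tau, and uniform weights are placeholder assumptions; the paper's actual Mitra-Virk coordinate functions and closed-form distortion-maximizing weights are not reproduced here.

```python
import numpy as np

def landmark_embedding(diagram, landmarks, weights, tau=0.5):
    """Hypothetical sketch of a persistence-landmark embedding.

    diagram   : iterable of (birth, death) points from a persistence diagram
    landmarks : list of (birth, death) grid points
    weights   : one weight per landmark (the paper derives these in
                closed form; here they are just supplied by the caller)
    tau       : placeholder support radius of the tent coordinate function
    """
    emb = np.zeros(len(landmarks))
    for j, (lb, ld) in enumerate(landmarks):
        for (b, d) in diagram:
            # Tent function centered at the landmark, clipped at zero:
            # contributes only when the diagram point is within tau of
            # the landmark in the sup-norm.
            emb[j] += weights[j] * max(0.0, tau - max(abs(b - lb), abs(d - ld)))
    return emb
```

A classifier can then act on the resulting fixed-length vector; a diagram point only contributes to landmarks near it, which is what makes a sparse grid sufficient.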
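The descriptor-selection rule in (ii) can be sketched as a Mahalanobis distance between class means under a Ledoit-Wolf-shrunk covariance: each candidate descriptor is scored by this margin and the largest wins. This is a minimal two-class illustration assuming a pooled within-class covariance; the paper's exact pooling and selection rule may differ.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

def mahalanobis_margin(X, y):
    """Mahalanobis distance between the two class means under a
    Ledoit-Wolf-shrunk pooled within-class covariance (illustrative)."""
    classes = np.unique(y)
    assert len(classes) == 2, "sketch assumes binary labels"
    mu0 = X[y == classes[0]].mean(axis=0)
    mu1 = X[y == classes[1]].mean(axis=0)
    # Center each class, then pool the centered samples before shrinkage.
    Xc = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in classes])
    cov = LedoitWolf().fit(Xc).covariance_
    diff = mu1 - mu0
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```

Given a pool of candidate descriptors (each mapping the data to a feature matrix), the selector is simply `argmax` of this margin over the pool, computed from training labels alone.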
Source: arXiv:2605.02836v1 (http://arxiv.org/abs/2605.02836v1) | PDF: https://arxiv.org/pdf/2605.02836v1
May 5, 2026
Data Science
Machine Learning